Migrate to Pydantic v2 - Githubissues

pederhan commented 1 year ago

Pydantic v2 brings with it much higher performance (5-50x times faster than Pydantic v1), as well as improved APIs for validation and (de)serialization.

This PR bumps the minimum Pydantic version of ZAC to 2.0.0. Code changes have been made to accommodate the new APIs introduced in 2.0.0. Most of the code changes were made using the migration tool Bump Pydantic as outlined in the Pydantic V2 Migration Guide.

The use of BaseSettings has been substituted with BaseModel since we didn't use any of the features offered by BaseSettings. Furthermore, the BaseSettings class has been moved to the pydantic-settings package, which we shouldn't include if we don't need to.

pederhan commented 1 year ago

Just discovered an issue with merging hosts using Pydantic V2.

89ae85d9031f87cc8900b24cc77e34c4585efe89 (origin/master)

Done with merge in 9.91 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 9.62 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 9.06 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 10.11 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 10.75 seconds. Equal hosts: 10067, replaced hosts: 0
Done with merge in 10.67 seconds. Equal hosts: 10067, replaced hosts: 0

f653bc3b40204f0d0669a7aead96a011fe54351f (pederhan/pydantic-v2)

Done with merge in 13.49 seconds. Equal hosts: 9742, replaced hosts: 324
Done with merge in 12.28 seconds. Equal hosts: 9737, replaced hosts: 330
Done with merge in 11.94 seconds. Equal hosts: 9744, replaced hosts: 324
Done with merge in 12.71 seconds. Equal hosts: 9744, replaced hosts: 324
Done with merge in 12.67 seconds. Equal hosts: 9744, replaced hosts: 324
Done with merge in 12.76 seconds. Equal hosts: 9744, replaced hosts: 324

Seems potentially like an issue with SourceMergerProcess and the comparison between the model created from the merged source model and the model in the hosts table:

https://github.com/unioslo/zabbix-auto-config/blob/f653bc3b40204f0d0669a7aead96a011fe54351f/zabbix_auto_config/processing.py#L436-L446

Where in Pydantic V1 the models would match, but in V2 they don't for whatever reason. Need to attach a debugger and investigate which models are affected. The number of models affected each merge iteration is very similar, so we should expect to see the same models affected each time. What do they have in common?

pederhan commented 1 year ago

This happens due to a host modifier assigning a new interface to the hosts in the form of a dict, not as an instance of models.Interface. In Pydantic V1, the equality check between the following two objects would be True:

from zabbix_auto_config.models import Host, Interface

host1 = Host(
    enabled=False,
    hostname='foo.example.com',
    interfaces=[],
    # ...
)
host1.append(
    {
        'details': {'version': 2, 'community': '{$SNMP_COMMUNITY}'},
        'endpoint': 'foo.example.com',
        'port': '161',
        'type': 2,
    }
)

host2 = Host(
    enabled=False,
    hostname='foo.example.com',
    interfaces=[
        {
            'details': {'version': 2, 'community': '{$SNMP_COMMUNITY}'},
            'endpoint': 'foo.example.com',
            'port': '161',
            'type': 2,
        }
    ],
)

assert host1 == host2

But in Pydantic V2, the list of dicts in host2 is not converted to a list of Interface objects during the comparison operation, thereby always returning False when comparing the two. Even calling Host.parse_obj(host2) will not yield an object where the list of dicts are converted to a list of Interface objects.

pederhan commented 1 year ago

After some more testing, it seems the only way to rebuild a model where submodels are re-validated in Pydantic V2 is to add revalidate_instances="always" to the model config:

class Host(BaseModel):
    # ...
    model_config = ConfigDict(validate_assignment=True, revalidate_instances="always")

Source: https://docs.pydantic.dev/latest/usage/model_config/#revalidate-instances

While that is a decent workaround, it ignores the fact that appending the wrong data type to a field is marked as an error by type checkers such as mypy, so it shouldn't even happen in the first place:

Without modifying the code to accommodate this erroneous use of the library, this PR would introduce a breaking change, but largely in the form of being less lenient with existing erroneous code. We are not introducing some arcane set of requirements or changing the "best practice" way of interacting with the models.

We could have a transition period to Pydantic V2 where do in fact re-validate each modification, but announce that we will eventually stop doing it? I'll add re-validation of Hosts for each modifier for now. But seeing as this problem is caught by static type checkers and we have to enable optional functionality (revalidate_instances) to even get validation to pass, it's a signal that this behavior should NOT be supported by us in the near future. This will introduce N_hosts * N_modifiers new validation calls, which can be quite a few... Need to test the performance of this change so we don't waste away all the Pydantic V2 performance gains.

pederhan commented 1 year ago

Need to test the performance of this chance so we don't waste away all the Pydantic V2 performance gains.

Statistic	V1 (seconds)	V2 Revalidation (seconds)	V2 No Revalidation (seconds)
Mean	7.56	4.51 (↓40.34%)	4.22 (↓44.18%)
Median	7.55	4.54 (↓39.87%)	4.23 (↓43.97%)
95th Percentile	7.60	5.04 (↓33.68%)	4.90 (↓35.53%)
99th Percentile	7.75	5.17 (↓33.29%)	5.21 (↓32.77%)
Range	0.34	0.97 (↑185.29%)	1.50 (↑341.18%)

Data

**V1** (See #63) | Statistic | Value (seconds) | |---------------------|----------------:| | Mean | 7.56 | | Median | 7.55 | | 95th Percentile | 7.60 | | 99th Percentile | 7.75 | | Range | 0.34 | ``` Done with merge in 7.75 seconds. Equal hosts: 10041 Done with merge in 7.60 seconds. Equal hosts: 10041 Done with merge in 7.51 seconds. Equal hosts: 10041 Done with merge in 7.41 seconds. Equal hosts: 10041 ``` **V2 Revalidation** | Statistic | Value (seconds) | |---------------------|----------------:| | Mean | 4.51 | | Median | 4.54 | | 95th Percentile | 5.04 | | 99th Percentile | 5.17 | | Range | 0.97 | ``` Done with merge in 4.76 seconds. Equal hosts: 10071 Done with merge in 5.04 seconds. Equal hosts: 10077 Done with merge in 4.20 seconds. Equal hosts: 10077 Done with merge in 4.62 seconds. Equal hosts: 10077 Done with merge in 4.39 seconds. Equal hosts: 10077 Done with merge in 4.32 seconds. Equal hosts: 10077 Done with merge in 4.64 seconds. Equal hosts: 10077 Done with merge in 4.76 seconds. Equal hosts: 10077 Done with merge in 4.54 seconds. Equal hosts: 10077 Done with merge in 4.38 seconds. Equal hosts: 10075 Done with merge in 4.57 seconds. Equal hosts: 10060 Done with merge in 4.24 seconds. Equal hosts: 10077 Done with merge in 4.60 seconds. Equal hosts: 10077 Done with merge in 4.21 seconds. Equal hosts: 10077 Done with merge in 4.31 seconds. Equal hosts: 10077 Done with merge in 4.47 seconds. Equal hosts: 10077 Done with merge in 4.75 seconds. Equal hosts: 10077 Done with merge in 5.17 seconds. Equal hosts: 10077 Done with merge in 4.22 seconds. Equal hosts: 10077 ``` **V2 No Revalidation (pederhan/zabbix-auto-config@408a9c3)** | Statistic | Value (seconds) | |---------------------|----------------:| | Mean | 4.22 | | Median | 4.23 | | 95th Percentile | 4.90 | | 99th Percentile | 5.21 | | Range | 1.50 | ``` Done with merge in 3.77 seconds. Equal hosts: 10069 Done with merge in 3.71 seconds. Equal hosts: 10074 Done with merge in 3.94 seconds. Equal hosts: 10074 Done with merge in 4.32 seconds. Equal hosts: 10074 Done with merge in 4.15 seconds. Equal hosts: 10074 Done with merge in 3.85 seconds. Equal hosts: 10074 Done with merge in 3.84 seconds. Equal hosts: 10074 Done with merge in 4.23 seconds. Equal hosts: 10074 Done with merge in 5.21 seconds. Equal hosts: 10074 Done with merge in 4.43 seconds. Equal hosts: 10069 Done with merge in 3.96 seconds. Equal hosts: 10075 Done with merge in 4.90 seconds. Equal hosts: 10075 Done with merge in 4.28 seconds. Equal hosts: 10075 Done with merge in 4.49 seconds. Equal hosts: 10075 Done with merge in 4.27 seconds. Equal hosts: 10075 Done with merge in 4.32 seconds. Equal hosts: 10075 Done with merge in 4.23 seconds. Equal hosts: 10075 Done with merge in 4.54 seconds. Equal hosts: 10075 Done with merge in 4.56 seconds. Equal hosts: 10074 Done with merge in 4.53 seconds. Equal hosts: 10063 Done with merge in 4.13 seconds. Equal hosts: 10078 Done with merge in 4.04 seconds. Equal hosts: 10078 Done with merge in 4.81 seconds. Equal hosts: 10078 Done with merge in 4.09 seconds. Equal hosts: 10078 Done with merge in 4.00 seconds. Equal hosts: 10078 ```

Conclusion

We see a ~40% speedup when doing revalidation with Pydantic V2 compared to the existing V1 implementation, while removing the revalidation puts this number up at ~44%. There aren't a lot of data points used to benchmark, so these are just rough estimates, but it gives us a good impression of the overall performance increase that Pydantic V2 provides as well as the cost of performing revalidation of hosts after each modification. The increase in range is likely due to external factors such as DB latency combined with few data points for the V1 benchmark.

In conclusion, the overhead incurred from revalidation is trivial when compared to other actions such as DB reads/writes. The repeat validation lets us maintain backwards compatibility with "improper" host modifiers while giving us more granular diagnostics in case a host modifier modifies a Host to be invalid, which we did not previously have.

unioslo / zabbix-auto-config

Migrate to Pydantic v2 #64

Conclusion