mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.
Apache License 2.0

Update pydantic to 2.7.1 #2059

Closed mr-tz closed 2 months ago

mr-tz commented 2 months ago

I think any pydantic major updates were disabled in https://github.com/mandiant/capa/pull/1673, so hopefully this enables them again.

uckelman-sf commented 2 months ago

Would you have a look at #2053 when dealing with upgrading pydantic? Presently, flare-capa is behind what flare-floss requires. If you upgrade to 2.7.1, you'll be upgrading past what flare-floss requires---which will continue to make the current versions of flare-capa and flare-floss incompatible with each other.

williballenthin commented 2 months ago

To @uckelman-sf's point, I wonder if/how we should relax dependency versions to make it easier to use capa as a library. Since we currently pin each dependency down to the patch version, we force users who integrate capa as a library to use exactly the same versions that we declare. This may not be compatible with all environments (e.g., a version not yet available in internal corporate environments, or a conflict with another library such as floss).

We might consider locking the deps for when we do a standalone build, and perhaps our tests too. But we could relax the deps in pyproject.toml for library users. This would just specify min/max supported versions (or more like "you should use the max, but anything down to the min probably works too". We would not guarantee to test anything but the max).

I've seen other projects use requirements.txt to lock the deps for a reproducible build (generated with pip freeze or similar), and then pyproject.toml has dep specs more like pydantic>2.0.0,<=2.7.1.
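As a rough illustration of the "pip freeze or similar" step, the sketch below lists every installed distribution as an exact `name==version` pin using only the standard library. The function name `freeze_lines` is hypothetical, and this is not how capa generates its lock file; it only shows the kind of output a lock step produces, in contrast to a range spec in pyproject.toml.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to requirements.txt to record the exact build environment.
    print("\n".join(freeze_lines()))
```

The exact pins would live only in the lock file used for standalone builds and tests, while pyproject.toml keeps the looser range that library users resolve against.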

I don't love the duplication of information and the extra complexity that we'd have to remember, so I think it's fine we didn't do this before. But now that users are requesting relaxed deps, we should consider it.

We'll also have to consider how dependabot works with this setup. Can it fix up two places?

Thoughts @mr-tz @mike-hunhoff @uckelman-sf ?

williballenthin commented 2 months ago

(that probably should have been posted in #2053, sorry)

uckelman-sf commented 2 months ago

> To @uckelman-sf's point, I wonder if/how we should relax dependency versions to make it easier to use capa as a library. Since we currently pin each dependency down to the patch version, we force users who integrate capa as a library to use exactly the same versions that we declare. This may not be compatible with all environments (e.g., a version not yet available in internal corporate environments, or a conflict with another library such as floss).

> We might consider locking the deps for when we do a standalone build, and perhaps our tests too. But we could relax the deps in pyproject.toml for library users. This would just specify min/max supported versions (or more like "you should use the max, but anything down to the min probably works too". We would not guarantee to test anything but the max).

There are, in my opinion, good arguments in favor of not setting upper bounds at all for libraries unless you know that newer versions don't work. (See https://iscinumpy.dev/post/bound-version-constraints/. The recommendation there is to do a patch release with an upper bound as soon as you have a known-bad newer version of a dependency.)

Python doesn't give you any escape hatch when you need a version of some package P which conflicts with a constraint specified by one of your dependencies---but it does let you further constrain the versions of package P when you discover that you need to for some reason.
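To make the asymmetry concrete, here is a minimal sketch of how constraints from several packages intersect. It uses a deliberately simplified version model (plain numeric releases only, not full PEP 440), and every version number and constraint in the usage below is hypothetical, chosen for illustration rather than taken from capa or floss.

```python
import operator

# Comparison operators supported by this simplified model (not full PEP 440).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def parse(version):
    """Turn '2.7.1' into (2, 7, 1); only plain numeric releases are handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>2.0.0,<=2.7.1'."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in sorted(OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                if not OPS[symbol](parse(version), parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def compatible(candidates, *specs):
    """Return the candidate versions that satisfy every spec at once."""
    return [v for v in candidates if all(satisfies(v, s) for s in specs)]


# Hypothetical releases of some package P.
releases = ["2.4.0", "2.6.0", "2.7.1"]

# A library that pins exactly leaves no room for another dependent's minimum.
print(compatible(releases, "==2.4.0", ">=2.6.0"))

# A lower bound only: both dependents can be satisfied together.
print(compatible(releases, ">=2.0.0", ">=2.6.0"))

# The end user can still narrow the range further when they need to.
print(compatible(releases, ">=2.0.0", ">=2.6.0", "<2.7.0"))
```

Adding a spec can only shrink the set of acceptable versions, never grow it, which is why an overly tight bound declared by a library cannot be undone downstream.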

I wonder if the simplest thing would be to package the application and the library separately. That way, the application could depend on the library, but also pin the exact versions of dependencies that you want in its pyproject.toml, and then the library could provide only lower bounds for its own dependencies in the library's pyproject.toml.

mr-tz commented 2 months ago

I think relaxing makes sense per the linked article... @uckelman-sf can you propose these changes unless @williballenthin disagrees?!

williballenthin commented 2 months ago

Great article, @uckelman-sf, very convincing.

For the initial cutover, should we set version minimums based on each dep's current major version number? E.g., pydantic==2.7.1 becomes pydantic>=2.0.0?
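A minimal sketch of that cutover rule, assuming the existing entries are simple `name==X.Y.Z` pins. The helper name `relax_pin` and the regular expression are hypothetical; entries that are not plain pins (ranges, extras, markers) are left untouched for manual review.

```python
import re

# Matches simple pins such as 'pydantic==2.7.1' and captures the name and major version.
PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*(\d+)(?:\.\d+)*\s*$")


def relax_pin(requirement):
    """Rewrite 'name==X.Y.Z' as 'name>=X.0.0'; leave anything else unchanged."""
    match = PIN.match(requirement)
    if match is None:
        return requirement
    name, major = match.groups()
    return f"{name}>={major}.0.0"


print(relax_pin("pydantic==2.7.1"))
```

For packages still on a 0.x series this rule yields `>=0.0.0`, which is effectively no lower bound, so those entries would likely need a hand-picked minimum instead.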

Also agree it would be nice to get @uckelman-sf credit if he's open to contributing a PR with the pyproject changes.

williballenthin commented 2 months ago

> I wonder if the simplest thing would be to package the application and the library separately. That way, the application could depend on the library, but also pin the exact versions of dependencies that you want in its pyproject.toml, and then the library could provide only lower bounds for its own dependencies in the library's pyproject.toml.

I'm a little hesitant to split up the Python package at this point, especially if we can get by with putting the app lock specification in requirements.txt or similar. Let's investigate this in another thread and update CI with whatever we discover.

uckelman-sf commented 2 months ago

I will look into making a PR; maybe later today if I can, but if not then next week sometime. Thanks!