Upgrade Antlr to 4.11 and vendor the runtime

jlopezpena commented 1 year ago

Bumping antlr binary and antlr4-python3-runtime in order to unlock doing the same thing in hydra, following this discussion

Jasha10 commented 1 year ago

(CI lint failures are unrelated to this PR)

omry commented 1 year ago

@Jasha10, this is a breaking change (that would also require a parallel fix to Hydra.

News fragment.
I think it's really best if the effort is focused on fixing the root issue instead of bumping the version like this. In the mean time users with specific antlr version needs can build their own version of OmegaConf.

astrojuanlu commented 1 year ago

In the mean time users with specific antlr version needs can build their own version of OmegaConf.

How should users (like myself) do that, in a way that it works with various dependencies declaring omegaconf in their metadata?

(To keep the comment on-topic: from the peanut gallery, looks like this is effort well spent on a short term solution while folks address the root cause)

jlopezpena commented 1 year ago

@omry what is the root issue that needs to be fixed, and is there any way I can help with that?

I submitted the corresponding PR to hydra (in draft state as it requires a new version of omegaconf).

jlopezpena commented 1 year ago

How should users (like myself) do that, in a way that it works with various dependencies declaring omegaconf in their metadata?

@astrojuanlu You can do that by having your own dependency repository (kind of a mirrored version of PyPi) where you make specific builds of packages catered to your needs. Then when you declare dependencies you specify your mirror as the primary place to find dependencies and PyPi as a fallback, or explicitly declare dependencies to be pulled from your mirror.

Eg with poetry you can add something like this in your pyproject.toml:

[[tool.poetry.source]]
name = "PyPI"
priority = "default"

[[tool.poetry.source]]
name = "your_source"
priority = "supplemental"
url = URL_TO_YOUR_MIRROR

and when you declare a dependency instead of doing omegaconf = 2.3.0 you do something like omegaconf = {version = "2.3.0", source = "your_source"}

The problem with this approach is if you are working on a library that needs to be used by other people in their project, because then you are forcing THEM to also do the same thing (or you to vendor the modified package source for omegaconf and include it in the code you distribute)

astrojuanlu commented 1 year ago

The problem with this approach is if you are working on a library that needs to be used by other people in their project, because then you are forcing THEM to also do the same thing

Yeah that's what I had in mind, ideally we should be able to do this in a way that transitive dependencies don't have to cooperate. Otherwise I don't see how users can cope with the current situation effectively.

shchur commented 1 year ago

Hello, just wanted to clarify what the status of this PR is. Does the comment by @omry mean that this PR will be blocked and we will need to find another way to make omegaconf compatible with different antlr4-python3-runtime versions (without upgrading to antlr4-python3-runtime==4.11.*)?

If this is the case, I would be happy to help with finding the root cause and working on a PR to address it.

Jasha10 commented 1 year ago

I think it would be good to at least discuss @omry's points before merging.

@Jasha10, this is a breaking change

Yes, we are working on that: https://github.com/facebookresearch/hydra/pull/2733

News fragment.

Done! (commit https://github.com/omry/omegaconf/pull/1114/commits/bc4949b9a0995495a578c7712d24df764c4618f3)

I think it's really best if the effort is focused on fixing the root issue instead of bumping the version like this.

Any ideas for fixing the root issue? The antlr version is tightly coupled to the minor version of antlr4-python3-runtime.

In the mean time users with specific antlr version needs can build their own version of OmegaConf.

Here is a recipe for users who need to build their own version of OmegaConf+Hydra: https://github.com/facebookresearch/hydra/issues/2491#issuecomment-1344306280

My opinion is that we should go forward with this PR.

Not clear how to fix the root cause.
We've done PRs like this in the past (https://github.com/omry/omegaconf/pull/912 and https://github.com/facebookresearch/hydra/pull/2192)

odelalleau commented 1 year ago

I haven't looked at this in details, but would it be possible by any chance to make Hydra / OmegaConf compatible with multiple versions of antlr? (compiling the grammar during pip install instead of shipping the pre-compiled files)

Jasha10 commented 1 year ago

would it be possible by any chance to make Hydra / OmegaConf compatible with multiple versions of antlr? (compiling the grammar during pip install instead of shipping the pre-compiled files)

The current workflow for compiling the grammar invokes the following command:

java -jar build_helpers/bin/antlr-4.9.3-complete.jar -Dlanguage=Python3 -o omegaconf/grammar/gen -Xexact-output-dir -visitor omegaconf/grammar/grammar

This requires the antlr .jar to be present when the grammar is compiled.

If we compile the grammar during pip install then I think we would have to either (1) ship multiple antlr .jar files and select one of them at install time, or (2) expose an option for the user to provide their own antlr dependency.

Option (2) has been suggested before (here). I'm not sure specifically how we would expose such an option at pip install time -- maybe via an extras option?

natephysics commented 12 months ago

It just so happens that I've run across a dependency problem involving this exact issue as well. Seems like a tricky issue to solve.

omry commented 12 months ago

I think the best solution would be to "vendor" OmegaConf with a specific version of antlr, and have Hydra depend on it. That would require some rewriting some imports in the generated code and would essentially eliminate the external dependency on antlr.

I think I outlined it at some point as one of the options for a permanent fix for this issue.

jlopezpena commented 12 months ago

Could an option be to replace the anttrl4 java binary by antlr-tools?

omry commented 11 months ago

Could an option be to replace the anttrl4 java binary by antlr-tools?

I don't see how that would do anything. The antlr java binary is just a matter of convenience during the build of OmegaConf. The problem is that the antlr Python runtime only works with generated code that is of the exact same version used to generate the parser.

jlopezpena commented 11 months ago

I see, so the binary dependency is not runtime, but build-time only. I can think of two potential solutions to the bigger issue then.

One would be to build the grammars at install time. The downside of that is that the antlr binary would need to be downloaded at install time (probably through antlr-tools to avoid needing a system wide install of java). This would add size and time to install, as well as require an active internet collection.

The second approach would be to build multiple versions of the grammars at build time (through multiple versions of the antlr binary), and select the grammar at runtime based on the installed version of antlr4-python3-runtime . This would effectively decouple the build dependency from run time, at the cost of having additional grammar files distributed with the wheel and needing the (slightly) more complex logic for dispatching the correct version.

I believe the second option would be nicer from the user's point of view, with the tradeoff of having a more complex build process (which might or might not be acceptable for omegacong/hydra maintainers)

odelalleau commented 11 months ago

I think @omry's suggestion of shipping a specific version of antlr4-python3-runtime within OmegaConf may be the most robust approach.

omry commented 11 months ago

I see, so the binary dependency is not runtime, but build-time only.

Not quite. The antlr runtime is checking that it's working with code generated by a generator from the exact same version. This is a problematic design choice, as in reality antlr is very stable and they could be much laxer with this check - but I digress.

Other low-level libraries, like pip itself, solves the problem of dependency conflicts by optionally including their dependencies in under a different module. They call this vendoring (pip page about it). The idea is that by including your own version of the dependency, you isolate your dependencies from the dependencies of userland apps and eliminating these kinds of issues.

The complication is that Hydra should probably follow whatever OmegaConf is doing here.

jlopezpena commented 11 months ago

I understand the benefit of vendoring, the downside of that is that by vendoring a third party pakage you are left with the burden of incorporating security updates and whatnot yourself - which might or might not be a big ordeal depending on how often there are changes that need to be incorporated, and how complex updating the vendoring is (which might be as simple as running a git submodule update or a much more complex one). That's why in my own projects I tend to shy away from vendoring and normally opt for something similar to the second option I mentioned about (this is also the approach taken in many conda recipes that require binary linking against a specific version of numpy, resulting in a different "precompiled" binary for each python/numpy/architecture combination).

As mentioned, if there is anything I can do to help solve the issue let me know, the final decision on what the right solution is of couse falls onto the core developers, as they are the ones that will be left with the maintenance burden.

astrojuanlu commented 10 months ago

Is there a final decision on what the way forward should be? We are discovering more incompatibilities with other dependencies.

odelalleau commented 10 months ago

Is there a final decision on what the way forward should be? We are discovering more incompatibilities with other dependencies.

I think shipping our own version will be easier overall. We just need someone to actually implement it :)

astrojuanlu commented 10 months ago

I think shipping our own version will be easier overall.

I guess you mean @omry's vendoring idea from https://github.com/omry/omegaconf/pull/1114#issuecomment-1699656976

We just need someone to actually implement it :)

👍🏽 Thank you!

jlopezpena commented 10 months ago

I can change the PR to vendor antlr4-python3-runtime if that is the agreed approach, but I will not commit to maintain a vendored dependency.

For whatever is worth, I don't think vendoring is the right solution. It is an anti-pattern and introduces a significant burden to maintenance and a lot of friction for development. IMO it is only justified for packages like pip that need to be bootstrapped.

Jasha10 commented 10 months ago

For whatever is worth, I don't think vendoring is the right solution. It is an anti-pattern

@jlopezpena, thank you for sharing your opinion about this. It seems you are advocating for the second approach from your comment above: building "multiple versions of the grammars at build time (through multiple versions of the antlr binary), and select the grammar at runtime based on the installed version of antlr4-python3-runtime." I am concerned that this approach would make omegaconf's build and release process more difficult.

Currently, omegaconf's build & release process is very simple (here are the full instructions from the docstring of setup.py):

OmegaConf setup
    Instructions:
    # Build:
    rm -rf dist/ omegaconf.egg-info/
    python -m build
    # Upload:
    twine upload dist/*

I worry that the proposal to build multiple versions of the grammar would significantly complicate this process. Meanwhile, the vendoring appraoch could be as "simple" as copying the antlr4-python3-runtime source into omegaconf's source tree, allowing us to keep the same build procedure. Does this make sense to you?

by vendoring a third party pakage you are left with the burden of incorporating security updates and whatnot yourself - which might or might not be a big ordeal depending on how often there are changes that need to be incorporated.

Note that, with our current approach, we have not been updating our pin on the antlr4 runtime very frequently. Likewise, I do not anticipate that we would update a vendored version of the antlr4 runtime frequently.

jlopezpena commented 10 months ago

Meanwhile, the vendoring appraoch could be as "simple" as copying the antlr4-python3-runtime source into omegaconf's source tree, allowing us to keep the same build procedure. Does this make sense to you?

I don't think it will be that simple. It will also require editing the source of all the vendored antlr4-python3-runtime files to fix the import locations, otherwise if a user has an existing antlr4-python3-runtime installed (which is but a certainty given the reason many people are asking to update this is due to a dependency clash with other packages that use antlr) any internal imports from the vendored package might end up pulling (some) code from the user's antlr4 instead of the vendored one.

If done "manually" then any potential update will require doing the same process all over; the "right" way to do this would be to set up a script that would download (a specific version of) the runtime, copy it over the desired target folder (vendor/antlr4 or whatever) and then patch all the vendored files replacing all the imports, eg every from antlr4.FOO import BAR by from omegaconf.vendor.antlr4.FOO import BAR, every import antlr4 to import omegaconf.vendor.antlr4 as antlr4, etc etc, and ideally making sure the antlr4 tests still run after the patching, and potentially adding some additional integration tests that verify that nothing goes wrong when there is a different runtime version in user space that doesn't agree with the vendored one. Usually I'd expect having to go over the results of this process and do some additional tweaking to make things work, depending on how convoluted the code is.

Nothing in this process is particularly challenging or ground-breaking, but it normally is a tedious and error prone process

jlopezpena commented 10 months ago

I created a helper script get_vendored that will do the following:

Clear the omegaconf/vendor folder (apart from init file, requirements file, and readme)
pip source-install the source for whatever is in omegaconf/vendor/vendor.txt inside the omegaconf/vendor folder
Remove everything that comes from the pip source install that is not the source itself (metadata and bin folder)
Recursively parse the downloaded source files applying some regexes to modify all imports:
- import bar -> import foo.vendor.bar
- from bar -> from foo.vendor.bar

Running this script from top level (eg python build_helpers/get_vendored.py will get the source for whatever antlr4 version is specified in vendor.txt, and apply the transformations mentioned above.

Still to be done:

[x] Modify the imports in omegaconf to use the vendored antlr4
[x] Modify the grammar generation class so that the generated grammars import the vendored runtime
[x] Test that omegaconf still works
[ ] Test that nothing breaks when there is a different antlr4 installed
[x] Decide if we want to commit the antlr4 source files to the repo or get them downloaded and modified through the script at build time

@Jasha10 what is your opinion on the last one? I normally prefer not committing automatically generated code, but running the script at build time will require pip as well as network access, which could make it hard to run in sandboxed environments, so I think we could push the sources as well (with a big warning in the README.txt` file that those files get automatically created and modified by a script and should not be hand-edited)

jlopezpena commented 10 months ago

Test suite passes for me locally with vendored antlr4 and the patch to the parser generation class:

To get CI to run properly we need to decide whether we do the vendoring as part of the build or as a manual step that gets committed.

jlopezpena commented 10 months ago

Did a tentative commit pushing the vendored files to be able to test CI. There are many checks failing, mostly due to import errors that seem to come from a mix of absolute and relative imports within the codebase of omegaconf, which doesn't play nicely with the vendoring. I have tried some things with mixed results but didn't find a proper solution. Won't be able to work further on this for a day or two, in case someone wants to take over in the meantime

odelalleau commented 10 months ago

Thanks @jlopezpena, unfortunately no time for me to work on OmegaConf this week.

To get CI to run properly we need to decide whether we do the vendoring as part of the build or as a manual step that gets committed.

I would just do a manual commit, seems simpler that way.

Jasha10 commented 10 months ago

Decide if we want to commit the antlr4 source files to the repo or get them downloaded and modified through the script at build time @Jasha10 what is your opinion on the last one?

Thanks again for all of the discussion and for working on the lead with implementation, @jlopezpena. I agree with @odelalleau about committing the antlr4 source files to the repo.

jlopezpena commented 10 months ago

I think this is pretty much done. There are a bunch of flake8 errors in tests, but those seem to have nothing to do with this PR, and should be fixed somewhere else. @odelalleau , this seems related to https://github.com/omry/omegaconf/pull/1125

To minimise conflicts, I made all the imports in the vendored library relative (with some changes to the regexes in the get_vendored script to make that work).

I also excluded all the vendored files from CI checks like coverage, linting, mypy, etc, as those checks are not enforced in antlr4 and fixing the sources in here would make them diverge from the original for no real benefit.

@Jasha10 a new review would be a good idea, as this is now too different from the originally approved PR

Jasha10 commented 10 months ago

There are a bunch of flake8 errors in tests

Could you please rebase this PR against a recent version of the master branch? I believe those flake8 errors are fixed by https://github.com/omry/omegaconf/pull/1109, which was merged into master after you opened this PR.

jlopezpena commented 10 months ago

All green after rebasing!

jlopezpena commented 10 months ago

For reviewing purposes I recommend ignoring everything under omegaconf/vendor/antlr4, and check the rest of the files. Deleting the antlr4 folder and running the build_helpers/get_vendored.py script should recreate everything on that folder with no changes from what is on this PR

jlopezpena commented 10 months ago

Is there a way to tell CodeQL to ignore the omegaconf/vendor folder?

Jasha10 commented 10 months ago

Not sure -- I don't know a lot about how to configure CodeQL. Anyway, I've gone through and manually dismissed the CodeQL alerts related to the vendor folder.

jlopezpena commented 10 months ago

I feel there isn't much else to do with this PR apart from merging, but there are some items that should be discussed with the view of moving this forward and unblock the dependency lockdown (DVC now includes hydra, so I am getting hit by this even in projects where I don't explicitly use hydra myself now).

The vendoring will take care of ensuring that the correct runtime will be used to load the grammars, but there are still some steps that would be either nice to have or required to move forward:

The antlr4 binary jar file is added to the repo manually. One needs to make sure that the binary file present agrees with the runtime version listed in vendor.txt in this PR. Ideally, the get_vendored script would also pull the jar binary file to ensure the binary and runtime versions cannot get out of sync. Some inspiration for that can be taken from antlr4-tools (probably not worth pulling the entire dependency)
In order to ensure binary and runtime consistency between omegaconf and hydra some further work needs to be done in both projects. hydra also needs to run the antlr4 binary at build time to create its parsers, and the runtime used needs to agree with the version used there AND the version used in omegaconf. A potential solution here is to include the antlr4 jar file in the omegaconf wheel (maybe not exposing it directly, but providing a script entrypoint for calling it), and let hydra use that script/binary instead of its own antlr4 binary, and then leverage omegaconf's vendored runtime instead of adding the dependency again.

For the second point, the proposed solution probably entails making omegaconf a hard-versioned dependency of hydra. Going forward this will mean that each time omegaconf gets an update that touches antlr4, a new version of hydra pinning the new release as a minimum version will need to be done as well. Because changes in antlr4 binary/runtime will be API breaking (-ish) for hydra, to ensure good semantic versioning omegaconf might want to adopt a policy requiring that any changes to the antlr binary/runtime must be released with a major version bump (i.e. the current PR should be released as omegaconf 3.0, not 2.4).

I believe hydra and omegaconf versions are heavily interlocked already, so hopefully this will not bring much of an additional maintenance burden (albeit whenever vendoring some pain is unavoidable).

PS: Adding the antlr binary to the omegaconf wheel will increase wheel size by about 10MB, and the binary will be unsuable unless a proper Java runtime is present. The increased wheel size might be an inconvenience for third-party projects that depend on hydra/omegaconf so perhaps it would be a good idea to make it into an optional dependency that third party libraries can opt-out of if desired (hydra would still need to pull the full wheel at build time)

odelalleau commented 10 months ago

Because changes in antlr4 binary/runtime will be API breaking (-ish) for hydra, to ensure good semantic versioning omegaconf might want to adopt a policy requiring that any changes to the antlr binary/runtime must be released with a major version bump (i.e. the current PR should be released as omegaconf 3.0, not 2.4).

I believe hydra and omegaconf versions are heavily interlocked already, so hopefully this will not bring much of an additional maintenance burden (albeit whenever vendoring some pain is unavoidable).

Yeah that makes sense, and it should be fine. Note however that:

Technically Hydra could ship its own vendored version of antlr runtime. But I don't think it's worth the extra trouble at first sight (we can still do it later if needed).
Hydra / OmegaConf have never followed semantic versioning: breaking changes are typically introduced in X.Y+1 versions.
I'm not sure bumping the antlr runtime version in OmegaConf should be considered a breaking change, even if it breaks Hydra: Hydra is a "special" case in the sense that it's tightly coupled with OmegaConf, but no other library should be relying on OmegaConf's internal antlr runtime version. As long as we properly set Hydra's dependencies, there's no strong reason to consider a new antlr runtime in OmegaConf "breaking" in terms of user facing release notes.

Jasha10 commented 9 months ago

what is the plan to address this in Hydra?

In Hydra we can change import antlr4 to from omegaconf.vendor import antlr4.

jlopezpena commented 9 months ago

what is the plan to address this in Hydra?

In Hydra we can change import antlr4 to from omegaconf.vendor import antlr4.

There is still a potential issue in that we need to make sure the binary antlr jar in hydra needs to be the same version as the vendored runtime in omegaconf (but AFAIK that is already the case anyway)

omry commented 9 months ago

what is the plan to address this in Hydra?

In Hydra we can change import antlr4 to from omegaconf.vendor import antlr4.

There is still a potential issue in that we need to make sure the binary antlr jar in hydra needs to be the same version as the vendored runtime in omegaconf (but AFAIK that is already the case anyway)

The build of Hydra would need to change similarly to the build of OmegaConf to rewrite the generated code as well.

jlopezpena commented 9 months ago

The build of Hydra would need to change similarly to the build of OmegaConf to rewrite the generated code as well.

I can either port the regexes that to the replacements to the grammars (the fix_import method I added) to hydra, or refactor the one in omegaconf and leverage that from hydra; since all the replacements are done programatically, implementation will be easy either way. I will wait to update the hydra PR until this one settles, to avoid having to work any potential changes across both repos.

What I don't have a good solution for is making sure the antlr4 java binary (which is used at build time, but not distributed in the wheel) stays the same for both packages. The only way I can think of enforcing that would be to download the jar file at build time, which comes with its whole set of complications and downsides.

But since this issue is already present anyway, and this PR is not making it any worse than it already is, I'd rather leave that for future improvements

omry commented 9 months ago

Sounds good. For simplicity, I think you can copy the code. No need to include the build code related code with OmegaConf for that purpose. The way to control OmegaConf dependency in Hydra (and by extension, it's dependencies) is by pretty strict version dependency of OmegaConf from Hydra.

https://github.com/facebookresearch/hydra/blob/main/requirements/requirements.txt#L1

mawildoer commented 9 months ago

Hey folks! Thanks for the upgrade and vendoring! I'm running into a similar issue to hyrda on a seperate project with antlr dependencies.

I've been mostly successfully using this branch today, however I've hit a few snags trying to install it.

Mainly, if I use anything other than an editable installation, I get this import error:

ModuleNotFoundError: No module named 'omegaconf.vendor.antlr4.error'

I've tried:

python setup.py build + pip install .
python setup.py sdist + pip install dist/omegaconf-2.4.0.dev1.tar.gz
python setup.py antlr + pip install -e . (with only this option succeeding)

I'm not sure I've successfully emulated what will happen during an actual CI build, but I have checked and confirms that the subdirectories of the omegaconf/vendor we not present in the install.

I didn't see them in the checks, but is there a package I could manually install from CI while still on this branch?

Thanks!

astrojuanlu commented 9 months ago

I don't know if it's related but note that direct setup.py invokation on the CLI is deprecated, see https://packaging.python.org/en/latest/discussions/setup-py-deprecated/#what-commands-should-be-used-instead

jlopezpena commented 9 months ago

~Ah, I think the setup.py might be using explicit package listing instead of auto-discovery. I will add the vendored folder to the package list, see if that helps~ Scratch that, I had already added "omegaconf.vendor" anf "omegaconf.vendor.antlr4" to the package list in setup.py, so they should be discoverable

jlopezpena commented 9 months ago

I can reproduce the problem. The files in vendor/antlr4 are getting copied over, but folders inside that are not. Will try to work on a fix later today or tomorrow. Thanks for flagging it up @mawildoer !

jlopezpena commented 9 months ago

@mawildoer the non-editable installs should work now, could you give it a check?

@Jasha10, @omry it looks like CI is only testing editable installs, perhaps a non-editable scenario should be added as well (in a separate PR)

mawildoer commented 9 months ago

I don't know if it's related but note that direct setup.py invokation on the CLI is deprecated, see https://packaging.python.org/en/latest/discussions/setup-py-deprecated/#what-commands-should-be-used-instead

TIL; those are the old-school alias for the -e we know and love! I've only ever used "speciality" setup.py commands once before, for something more analogous to the antlr case there (eg. doing something special). I just saw them this time in the setup.py and figured it was worth giving a whirl, since other things weren't working yet. Thanks for the comment!

@mawildoer the non-editable installs should work now, could you give it a check?

Yessir! Tested both pip install -e . and pip install .. Both working 🎉 w/ full IDE/pyright/pylint etc... finding them too.

Thanks for the quick response! I'll have to package and distribute your branch if we can't get this back to mainline soon 😅

Thanks for all the work!

Jasha10 commented 9 months ago

@jlopezpena I've just uploaded omegaconf dev release 2.4.0.dev1 to pypi based on this PR branch. This should be useful so there's something to pin against when working on the Hydra PR. https://pypi.org/project/omegaconf/2.4.0.dev1/

pip install omegaconf==2.4.0.dev1

jlopezpena commented 8 months ago

@jlopezpena I've just uploaded omegaconf dev release 2.4.0.dev1 to pypi based on this PR branch. This should be useful so there's something to pin against when working on the Hydra PR.

@Jasha10 excellent, thank you very much! I made a commit to the hydra PR including the regex import fixing for the generated parsers, will change it to use the 2.4.0.dev1 version of omegaconf and start testing on Monday.

omry / omegaconf

Upgrade Antlr to 4.11 and vendor the runtime #1114