openzim / overview

:balloon: Start here for current projects, how to get involved, and joining community calls. A resource for new and veteran members of the offline commmunity
2 stars 1 forks source link

Switch to repeatable Python setups #16

Closed rgaudin closed 1 year ago

rgaudin commented 1 year ago

We are now sufficiently affected by sub-dependencies issues for it to be necessary.

Scrapers and other Python projects should all be switched from good-old requirements.txt to repeatable, frozen environments. Harmonizing how we build/publish (from python perspective, not publish workflows!) should probably be looked at as well.

I'd suggest using pipenv/Pipfile/Pipfile.lock but looking at what's the current recommended way is mandatory.

Note: this should be tackled with #13 and #14

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.

rgaudin commented 1 year ago

I have eventually spent some time figuring this out.

TL;DR: https://github.com/openzim/overview/wiki/Python-Project-Bootstrap and https://github.com/rgaudin/bootstrap for a repo version ; if we agree on this workflow, we can move it to openzim/_python-bootstrap


The issue mentions repeatable setups and list tools. Of course, the issue is not a tooling issue but the result of years of lack of directive or openZIM convention.

Our main issue is that we explicitly, intentionally specified dependencies requirements as ranges, hoping that every developers down the tree would properly respect semantic versioning. Occasionally, this created an incident because some sub-sub-sub dependency got updated and something stopped working. Of course, this would happen when pushing some worry-less patch to one of our service and the CI would rebuild and we'd deploy and encounter a runtime error that needs to be fixed ASAP.

This example highlights various issues:

But this kind of problem would have not raised should we not allow updates to happen outside our control. And this is a direct developer responsibility.

So, to be very clear, we can continue to use a good old requirements.txt and be safe of those. We just need to write our dependencies properly.

But going over all our project's dependencies and fixing them is a tedious task that. We should thus use this opportunity to update our Python setup to modern standards (convention is more appropriate).


Answer to this ticket is thus a policy and a working example of what an openZIM python project should start from. Every project lives its life so this is just about the common expect-able blocks.

My criterias were:

I excluded pipenv because it can not work solely off pyproject.toml. It would thus need maintaining both the Pipfile and the pyproject.toml. I like its lock file but we can live without it. Remaining risks are:

I chose hatch:

Rest is pretty much self explanatory. @benoit74 @fledgeXu, we'd need your feedback on this. If you're OK with the general approach, we can discuss the details. If not, speak it up!

Ruff configuration (which rules we want to enforce) is one of those details. Can vary from project to project but we need a base.

As you are both starting new projects, those would use the new approach right away. Will help adjust it. I'll probably switch scraperlib as well and we'll use every dependency update opportunity to switch other projects.

I'd like to mention mypy (hint checking tool) which might be interesting down the road but I have no experience with it and I want us to be able to switch old repos to the new method easily.

kelson42 commented 1 year ago

@benoit74 @mgautierfr We are going to require this approach to all our Pythin project. If you have remarks, question or critics... this is the moment to speak out.

FledgeXu commented 1 year ago

I agree that we should exclude the pipenv. Here are some reasons:

  1. pipenv doesn't support the pyproject.toml natively. In nautilus-webui case, it makes updating dependencies is annoying, since I have to delete Pipfile.lock every time.
  2. pipenv's installation failed error output is unreadable and that makes debugging is terrible.

As for the package management, I don't have the any prefer and I think we should exclude the poetry too. Although poetry also uses the pyproject.toml, it has its own format AFAIK.

I have't tried Ruff before, so I have no comment about it. But I have experience with mypy, I personally used it as my auto-completion tool. It has better support for the pydantic than the pyright. In our case (FastAPI stack), it's a good choice.

benoit74 commented 1 year ago

I have only two comments regarding the general approach :

Congrats for everything else ! (I have some details that I do not understand, but this is a different topic) 👏

benoit74 commented 1 year ago

Oh, one last thing : I believe that we must decide how we will decide to update this 🚀

This is not a joke, because:

This does not have to be a complex thing, I would propose to simply open an issue in this openzim/overview project with details about the problem encountered, why we believe this is an issue, if possible what we propose to solve this, and then discuss it together.

mgautierfr commented 1 year ago

I would like to mention pip-tools which provide a "simple" way to manage dependencies and especially allow us to have a intentionally specified dependencies requirements as ranges and, in the same time, to lock the all dependencies to a specific version to ensure our deployment doesn't have unexpected dependency update.

It is not as full as hatch (which I don't know) but it is not its purpose neither.


How hatch behaves in regards of cython ? Poetry doesn't support it and you have, at best, to use a custom python script to run cython (https://yannlg.tech/blog/2022-07-23-cython-with-poetry).

rgaudin commented 1 year ago

I consider that we must have a type checking tool like mypyor pyright in the "openZIM standard"

I share this POV. We can be flexible with older code bases but should enforce type-hints on new projects.

I excluded it because I am not comfortable recommending anything at the moment. I have not used checking tools and while we've been using hints for a while now, it was only to our own human readability benefits. Maybe we should try it on a new project and add it to the convention once we're comfortable with something.

I would recommend to decide to really go multi-repo

Not discussing this here 😅 That's a different topic with different stakeholders and constraints. Please open another ticket with your arguments.

we must decide how we will decide to update this

I agree. If we go the repo way (openzim/_python-bootstrap for instance), then we can have tickets/PRs there and simply update the code in the repo. main would thus represent the latest recommendation. The least overhead the better IMO.

I would like to mention pip-tools

I had pip-tools in mind when writing that We could work around the lack of lock file. I know it can generate hash-based requirements and it works off pyproject.toml so it would be relatively painless to include when needed.

I'm not sure I see a use case for intentional ranges beside our libraries (which are by definition not deployed) other than CI tests though.

How hatch behaves in regards of cython ?

From what I've seen, a standardization of C-extension building is in progress and once/if this PEP gets accepted, other builder would support building extensions directly.

Regarding pylibzim (our only C-extension with no plan to expand), I suggest we harmonize the linling and and QA but keep the setuptools and setup.py because we have some complex stuff going on there and cython is still tied to setuptools.

The goal of all this is to simplify our work with no-brainer answers for common stuff. Doesn't prevent us from doing differently where we see fit.

kelson42 commented 1 year ago
  • It is not totally linked, but I would recommend to decide to really go multi-repo,

@benoit74 I would recommend to make your point in a dedicated ticket.

rgaudin commented 1 year ago

Closing this meta ticket after creating https://github.com/openzim/_python-bootstrap and some Wiki entries