qurator-spk / dinglehopper

An OCR evaluation tool
Apache License 2.0
59 stars 13 forks source link

Proposal to introduce version pinning and license checcking #57

Closed b2m closed 3 years ago

b2m commented 3 years ago

This pull requests introduces:

New CI features:

Pipelines showing different settings: https://app.circleci.com/pipelines/github/b2m/dinglehopper?branch=add-pip-licenses

Lessons learned:

As we now have a working example we can discuss how to proceed with this requirement in #54.

mikegerber commented 3 years ago

Is the version pinning strictly necessary for this?

mikegerber commented 3 years ago

As for Python 3.5: While poor librarians with ancient Python versions have my sympathies, I think we should just drop support for it as it's EOL for over half a year now.

b2m commented 3 years ago

Is the version pinning strictly necessary for this?

It depends...

Using version pinning we have reproducible builds and are sure that we checked the integrity and the licenses for this combination of libraries and versions. So we freeze this state. As a benefit we only have to run the license check when we have changes in versions or dependencies. (And it is easier to reproduce user reported bugs...)

If we do not use version pinning we just know, that the last time the CI build ran (successfully) everything was OK. But this might change any second when a new version of any library changes that we depend on (directly or indirectly).

So learning the hard way by ignoring version pinning before, my "best practice" perspective is that version pinning in combination with update alerts is absolutely recommendable.

But as with a lot of "best practices" they are not so much fun (might depend on the person =) and require extra efforts.

I was aiming to show a "best practice" example in this pull request and it is OK for me to remove the version pinning if you think it too much for this project.

mikegerber commented 3 years ago

So learning the hard way by ignoring version pinning before, my "best practice" perspective is that version pinning in combination with update alerts is absolutely recommendable.

What was the issue you had to learn the hard way? That would be interesting to know!

While I agree that this pinning is probably the 100% way of doing this, I would rather aim for the much simpler 99% way, also because we have the experience that practically every library has a good license and that python-Levensthein is a rare exception. I also do not expect an incompatible license change to sneak up on us and if it does I think we can detect it with a simple check.

b2m commented 3 years ago

What was the issue you had to learn the hard way? That would be interesting to know!

From the top of my mind here are some instances where not pinning proved to be problematic: (real life events):

And here are the problems I remember from using version pinning: