openownership / lib-cove-bods

Check that your data complies with the Beneficial Ownership Data Standard (BODS) using our install our data review library to analyse files via your command line interface
https://datareview.openownership.org/
Other
1 stars 0 forks source link

Make this repo installable as a self-contained python package #40

Closed stevenday closed 4 years ago

stevenday commented 4 years ago

We use the libcovebods command line tool on the OpenOwnership register as part of our test process, to make sure the register is producing valid BODS data.

Previously, I'd installed it locally following the instructions in the README - cloning the repo, pip installing, etc. I was happy with that because it's a dev dependency, not a runtime one, and the only python thing the register used.

Now though, we have other python deps which are needed at runtime and so I've added a requirements.txt etc to the project to install those. I thought it would be nice to centralise all the dependencies by bringing libcovebods in too. On adding this repo to my requirements.txt (since it's not on PyPi) I discovered that it wouldn't actually run, due to libcove and flattentool not being in setup.py's install_requires, and the bundled schema not being included in the package.

I've made changes to add those in, and tested that it now works for our purposes (i.e. that I can pip install the package from github and run libovebods without any extra steps). I've also run the tests locally to check I didn't break anything with moving the schema, I added some basic documentation on how to do that at the same time.

What I don't know, is whether this will break something upstream that uses it - I couldn't see anything from git blame that explained why these steps were missed out before, but I presume there's a good reason. All that's to say, hopefully this is useful, but if not, please let me know why :)

stevenday commented 4 years ago

@Bjwebb @odscjames - I just picked one of you at random for a review of this, feel free to reassign as you see fit.

odscjames commented 4 years ago

Hello,

So libcove wasn't actually in PyPi before - it only got added a few weeks back. That's one reason it wasn't done before, but also honestly how we handled req's was a bit messy and now we are tidying it up.

So in general, no objection to this or the specific changes you've made.

I'm just pondering what should happen to the requirements* files in the root folder; at the very least the 2 lines can come out of requirements.in as they are now handled by setup.py.

However the version locking provided by requirements.* is a bit odd in a library and maybe should be taken out completely. Or if not, we should run update_requirements.sh to update and test. Can we take a short time to consider that - is this a blocker for you?

stevenday commented 4 years ago

I'm just pondering what should happen to the requirements files in the root folder; at the very least the 2 lines can come out of requirements.in as they are now handled by setup.py. However the version locking provided by requirements. is a bit odd in a library and maybe should be taken out completely. Or if not, we should run update_requirements.sh to update and test.

Yes, I didn't touch requirements.* because I didn't totally understand how they were being used, but I agree it seems at odds to keep both. In ua-edr-extractor where I had to make a similar PR, I just removed the requirements.txt entirely in favour of setup.py. Perhaps we could use something like the approach suggested in this: https://stackoverflow.com/a/28842733 (extra_requires and/or test_requires in setup.py) to get all of the dependencies into setup.py without polluting production uses.

Can we take a short time to consider that - is this a blocker for you?

Absolutely, I've pinned the register to the latest commit in this branch at the moment - as I'm happy it works for the register's purposes - I don't need it to get merged before we can start using it, although please keep the branch around :)

stevenday commented 4 years ago

@odscjames - I see there's been some new work on the piptools branch which relates to this. I've also had to push some a couple of fixes today to keep up with changes to lib-cove. Would you mind updating me on what the current thinking is w.r.t. this PR?

odscjames commented 4 years ago

Hello,

So removing requirements.in and requirements_dev.in and instead putting the contents of both in the 2 different fields of setup.py is the right thing to do. It will mean we as developers work with exactly the same set of requirements data as users of the library, and we can catch any errors there.

The question then is do we freeze requirements in requirements.txt and requirements_dev.txt at all, or do we just remove those files?

Remove: This mirrors how the library is used in production, in that we as developers can't dictate the exact mix of dependencies it will be installed against. (Frozen files are more useful in apps than libraries - in apps you can dictate exactly what should be installed)

But: This means that Travis builds aren't reproducible, as the exact versions of dependencies may change over time. This can cause frustration, as a developer takes a branch to do some work, find all their tests fail for no reason that is to do with their work, and then has to go on a side quest to fix them. (But: they would have to have been fixed anyway at some point, so maybe it's fine to force people on side quests like this)

I don't think there is a "right" answer here.

We have talked about using ways of locking to make Travis test with reproducible dependencies AND tell us if there are upgrades - I was experimenting with this in the "piptools" branch you've found.

OCDS has just gone down the route of having no lock files and thus no reproducible Travis builds and developers dealing with problems as they arise.

So that's the one question to answer before we can be clear about this; it doesn't have to be a answer for all time of course; we could start by removing the lock files and re-evaluate later.

Also, we can start putting this on Pypi - it doesn't take much and will make your and other people's use of this easier.

stevenday commented 4 years ago

Thanks for the explanation @odscjames. I'm in favour of removing the lock files, because I think the issue is not so much reproducibility as consistency. You presumably could reproduce locally any specific travis build, failing or otherwise, if you really had to. The problem is more that what worked this week might not work next.

Given that this isn't under very much development, I think it's better we find those limits at all, rather than working under the false assumption that it's working when it could be broken for other users (like I found with lib-cove last week). As you say, if it turns out to be causing lots of hassle later on, we can revisit.

Would you be happy for me to make these changes in this PR and then you review them?

odscjames commented 4 years ago

I tried this in https://github.com/openownership/lib-cove-bods/pull/54

Bjwebb commented 4 years ago

Continued in https://github.com/openownership/lib-cove-bods/pull/54 which is now merged.