make_prg update PR series: 7. Misc changes

leoisl commented 1 year ago

This PR makes some unrelated, small misc changes:

Removed .dockerignore and Dockerfile as we will build a conda recipe so it is better to make it uniform the way we access make_prg through containers using just the biocontainer;
Updated the changelog;
Updated the readme;
Set version to 0.4.0 (before we were thinking on going 1.0.0, but I think this should now be in some versions, not this one);
Now not failing the whole execution if building one PRG was unsuccessful due to a sequence curation error;
Fixed minor issues to make make_prg compatible with python 3.7;
Removed nextflow script;
Added requirements file;
Updated sample example readme and script;
Added portable binary build scripts;
Updated setup.py;

Sorry for this big list of changes... the ideal would one PR for each, but most changes are trivial

mbhall88 commented 1 year ago

Okay I have to do some other work today. I have pushed a justfile for running common tasks - should be self explanatory.

One issue I ran into was with pygraphviz. There is a test that uses it and I was struggling to install it on my mac. Do we actually plot anything though?

Tests are failing on python 3.7 when comparing pickle objects and most the integration tests are failing on python 3.10 due to options being global I think. The correct solution for that seems to be not using a global... But we will still run into the pickle issue I suspect.

The main change I've made is switching from setup.py to pyproject.toml (via poetry). Not sure how that impacts building the binary but I suspect you might need poetry in the container you build from.

Not sure how familiar you are with it, but the setup is this

install poetry https://python-poetry.org/docs/#installation
create a conda env with the version of python you want to use for the project (and activate it)
change into the make prg directory and run just install
when you want to run make prg you prefix it with poetry run - like so poetry run make_prg [options]

I haven't formatted the project yet - I guess we do that when we are happy to merge so it doesn't impact the ability to review code changes.

I added all the dependencies that were explicitly imported in python files in the project - not everything that is listed in the requirements file. And pinned version in as relaxed a manner as I could.

leoisl commented 1 year ago

I have pushed a justfile for running common tasks - should be self explanatory.

Thanks, infrastructure is much cleaner now

One issue I ran into was with pygraphviz. There is a test that uses it and I was struggling to install it on my mac. Do we actually plot anything though?

It is required by networkx to draw graphs: https://github.com/iqbal-lab-org/make_prg/blob/7dda6096092016309aa4d7de26ffd52c47c44d33/make_prg/utils/recursive_tree_drawer.py#L53

I am now getting an error because networkx can't draw the graph because pygraphviz is not installed.

However, I actually think this functionality should be hidden. This is the parameter related to it:

  -g, --output-graphs   Outputs the recursive tree graphical representation
                        (for development use only)

This is actually just for me to debug the PRG builder recursive tree, as it is not simple to debug recursion with the common IDE environment. This tree helps a lot, and I will use it to debug for e.g. https://github.com/iqbal-lab-org/make_prg/issues/43 . However, for the user, it has no use, and it requires heavy dependencies: on ubuntu requires apt installing graphviz graphviz-dev. So I think we should:

Keep this feature in the make_prg code, because I will use it once in a while;
Remove its tests, because it is a simple big bang test, and I will use it just for debug purposes;
Remove the parameter associated to it, and this part of make_prg code is just invoked if an output env variable (e.g. make_prg_output_debug_graphs) is set;

Tests are failing on python 3.7 when comparing pickle objects

I am fixing this but it requires quite a good amount of change (need to make some recursive __eq__() and implement __hash__(). Might not be able to finish today.

most the integration tests are failing on python 3.10 due to options being global I think. The correct solution for that seems to be not using a global... But we will still run into the pickle issue I suspect.

Are we making fixing compatibility with 3.10?

And pinned version in as relaxed a manner as I could.

My tests don't run anymore, I get stuck just like in https://github.com/iqbal-lab-org/make_prg/issues/41

leoisl commented 1 year ago

Summary of changes I did today:

Downgraded scikit-learn to 0.24.2 otherwise I can't run tests (https://github.com/iqbal-lab-org/make_prg/issues/44);
Some refactoring (https://github.com/iqbal-lab-org/make_prg/pull/42/commits/3233aa696ddb67fa0129d9608f81c724c1635cef and https://github.com/iqbal-lab-org/make_prg/pull/42/commits/62126e66ea56430ebaee9efeb6e051247e3d81ba);
Tried to fix integration tests (https://github.com/iqbal-lab-org/make_prg/pull/42/commits/3acbdbf0a0a0c10bb2c6d8f246360780d195b81d and https://github.com/iqbal-lab-org/make_prg/pull/42/commits/0fa0a33da0b4b37744d6bf4d3b655506bdfd6c4b);
We just produce debug graphs if we set an env variable and also installs extra poetry deps (https://github.com/iqbal-lab-org/make_prg/pull/42/commits/988eb4a308dc72b75c1ab4da6048fbedcdacedba , https://github.com/iqbal-lab-org/make_prg/pull/42/commits/164812f1c80b9db9b1f3252726af7dbb82842767 and https://github.com/iqbal-lab-org/make_prg/pull/42/commits/e5f64a26541f750a14299d8ee8208a489f211195)

leoisl commented 1 year ago

TODO tomorrow:

[x] Add unit tests for the new code of https://github.com/iqbal-lab-org/make_prg/commit/3acbdbf0a0a0c10bb2c6d8f246360780d195b81d;
[x] Two tests not working on Michael's env: Test_From_MSA_Integration_Full_Builds.test___several___compressed_and_uncompressed_input_files and Test_From_MSA_Integration_Full_Builds.test___several_alignments

mbhall88 commented 1 year ago

So the changes I've made all relate to tooling/packaging.

We now support python versions 3.8+ and test for these on the CI (I've tested all four versions locally and they seem to work (apart from the two failing tests)). I'm still debugging the CI though - FML. EDIT I'm almost there, just need to install MAFFT manually. Basically the conda and poetry were'nt playing nice in the CI so I got rid of conda, so just need to install MAFFT in a non-conda way.

I have pinned scikit-learnt o ^1.0 again as I was only getting the test hang locally on version 3.7 and I got some weird CPython errors using 0.24.2 on python 3.10 on both mac and linux.

I've also removed the use of global variables and instead explicitly pass those variables around as needed (see https://github.com/iqbal-lab-org/make_prg/pull/42/commits/f03be8ec34ce32659243cf1d96f450b9f194570b)

You can also pass pytest options through just. So for verbose testing, you can do just test -v

leoisl commented 1 year ago

So in the previous commits I fixed the tests, all is passing now. However, scikit-learn = "~1.0" is still getting my tests hanging, so I downgraded to 0.24.2. After doing this in the CI branch, everything runs fine from python 3.7 to python3.11. On ubuntu runners, tests run in <10 seconds. On os x runners, it takes 1.5 minutes to run, still fine. However, it takes ~15m to install the project, which is weird but ok. In the end what we have is ubuntu runners completing CI in ~1m and os x runners in ~20 min. Then, in the scikit_1.0 branch, I just added two commits: 1) Removed the support for python 3.7; 2) upgraded scikit-learn to "~1.0". However, all ubuntu runners hanged on CI: https://github.com/leoisl/make_prg/actions/runs/3541011094 , I cancelled the workflow after waiting for 1 hour. The mac os runners ran just fine with similar runtime as using scikit-learn = "0.24.2", so this issue is specific to linux/ubuntu. So I think with this we should stick to scikit-learn = "0.24.2".

If you still want to try to make scikit-learn = "~1.0", please use the scikit_1.0 branch. If you manage to get ubuntu runners to finish CI there, then we can upgrade scikit-learn again

PS: in the CI branch, poetry.lock is not consistent with pyproject.toml. When synchronising, I get the expected incompatibility issues with python3.7. Removing support to python 3.7, now these files are consistent, and everything runs fine: https://github.com/leoisl/make_prg/actions/runs/3541812170 . So the above is still valid, just discard python 3.7 from it...

leoisl commented 1 year ago

Well, latest commits was just me fiddling with CI trying to upload the coverage to this PR, but I think I will postpone this and work on the things we need to finish this PR/release. Just two important points:

We exclude some files when computing coverage, explanation here: https://github.com/iqbal-lab-org/make_prg/pull/42/commits/a5f7429e98e7f563ece99bf91be863da48f78f57
CI fails if cov < 98%: https://github.com/iqbal-lab-org/make_prg/pull/42/commits/af2b175d6383d1307a2f45102a34fe1ba2b5ef6d

leoisl commented 1 year ago

Will finish last tests tomorrow, will bring coverage to 99%+

mbhall88 commented 1 year ago

Awesome work!! Happy with everything.

I added a coverage recipe to the just file that will open a html report in the browser. Also added a tag recipe which will gived you the git commands to run for tagging and pushing a release from git CLI.

I also removed the -vv default for pytest as it was killing me. I added -vv for the CI though, and you can obviously do it when running just test -vv

Also formatted and linted everything and added that back into the CI

And have added back an updated dockerfile and tested it builds okay locally. Once this PR is merged, it will auto build at https://quay.io/repository/iqballab/make_prg

leoisl commented 1 year ago

Thanks! I still need one or a couple of hours to finish all tests regarding the new and changed code, and to fix https://github.com/iqbal-lab-org/make_prg/issues/43 . Will add the latter as an integration test as well. If there are no more issues, we will have the release then by early next week.

leoisl commented 1 year ago

Recent commits summary:

Unit tests finished, coverage is now 99.73%;
Added pyinstaller to a poetry extra section (precompiled_binary). For this to work, I had to set python version from python = "^3.8" to python = ">=3.8,<=3.11". I hope this is fine @mbhall88 - we are losing python 3.12, but just realised that is not even released yet, still under dev;
Updated Dockerfile with a workaround to make pyinstaller work;
Simplified a lot the precompiled binary scripts: it basically uses the container built with the Dockerfile and just invokes pyinstaller;
Removed requirements.txt;

Please tell me if you find any issues @mbhall88

leoisl commented 1 year ago

Tomorrow I will fix the last outstanding issues:

Move update_shared_data to a singleton object;
https://github.com/iqbal-lab-org/make_prg/issues/43;

And then I think we are ready to merge and release

mbhall88 commented 1 year ago

All changes are fine by me!

leoisl commented 1 year ago

I am merging this and submitting new PRs for the outstanding issues, will be better for reviewing

iqbal-lab-org / make_prg

make_prg update PR series: 7. Misc changes #42