psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0
57 stars 34 forks source link

Python version in docker: circle-plots.py #323

Open ashotmarg opened 1 year ago

ashotmarg commented 1 year ago

Hi @psathyrella, I am getting a python issue when using the --plotdir in the docker container. Namely: circle-plots.py requires python3 (#!/usr/bin/env python3) but the docker has only python2.7. Do you think the circle-plots.py will work if I change the python version from 3 to 2.7 in the circle-plots.py script file? Thanks, Ashot

psathyrella commented 1 year ago

Whoops, sorry about that. It's possible that would work, although also possible circlify required python3. If that doesn't work, we should add python3 to the docker container, probably with apt install python3-dev. Let me know what happens.

ashotmarg commented 1 year ago

Thanks for the prompt reply! Yeah, I will try with 2.7 and let you know. But I guess in general it would be good to move to python3, i.e., change the anaconda base image, but not sure if the rest of the python2.7-dependent scripts will complain :)

psathyrella commented 1 year ago

Coincidentally, I think I'm finally starting the 2 to 3 conversion this week, and oh my goodness there will be complaining by probably mostly from me.

ashotmarg commented 1 year ago

Hope things go smoothly with the transition :)

ashotmarg commented 1 year ago

Hi again @psathyrella, apart from the python version issue it doesn't seem that the circlify (which is required for the circle-plots.py) is installed in the docker. When I tried to install it with a very old version of circlify==0.9.1, which should be compatible with python2.7 I got some versioning errors. Given that you are switching to the python3, I don't think it makes sense to troubleshoot this, and it's probably easier for the rest of us (apart from you :) ) to just switch to that new version, right?

psathyrella commented 1 year ago

Yeah that sounds good, I'll just have a look at this to make sure it's fixed after everything is on 3.

ashotmarg commented 11 months ago

Hi @psathyrella, I was wondering if you have any updates re the python 3 version of partis.

psathyrella commented 11 months ago

Part way done, you can view progress here. It's running all the tests fine (i.e. basics are ok), but I still have some more checks to do.

ashotmarg commented 9 months ago

Hi @psathyrella , I can see from the progress that you're still working on the new version. Just wondering if you have estimates when you will finish it? The very general command I want to run might even be ok with the current version, but I need to run it in the docker and I guess you will make it when you've finished everything, right ?

psathyrella commented 9 months ago

Whoops, was thinking of pinging you. I'm buried in a couple other things so I won't be able to finish the extra 2 to 3 checks I need to do til later in February, but yeah the current version should be fine for you. I just merged 2to3 to main, but unfortunately the docker file needs some modifications to work with python 3, and my first few tries aren't working (i can't run docker on my machine atm, i need to free up some space). On the off chance you're familiar with docker and you could suggest what I need to change here I can easily make the changes and push, otherwise it'll probably take me a few days (i've just swapped =2.7 for =3.10, which didn't work): https://github.com/psathyrella/partis/blob/main/Dockerfile

ashotmarg commented 9 months ago

Sounds good, hope those extra few things won't create much headache. With my limited docker experience I managed to have a working (hopefully) version of the dockerfile. It took some fight with docker/partis, mostly due to my (i) company firewall with private SSL certificate issues and (ii) importantly the arm64 architecture on my mac. In the end I used another machine. In any case this worked on my x86_64 Mac OS Darwin. I had to add the .txt, to the Dockerfile's filename to attach it here, hope this is useful. Dockerfile.txt

psathyrella commented 9 months ago

Oh my goodness, wow, thanks! Wow, you certainly had to make a lot of modifications. It'll unfortunately I think still take me a bit to figure out which of those modifications are specific to your environment vs applicable to the github/quay build, but it seems like if you've got that built, you can then use the resulting local docker image to run what you need?

ashotmarg commented 9 months ago

Yep, I haven't actually fully tested it, only saw that Partis run from the docker container itself, which was a good start :) So, assuming that (a) the docker tests run smoothly and (b) your last couple of remaining checks don't create issues, then I am good :) Thanks for the python 2 -> 3 transition.

psathyrella commented 9 months ago

Great. I should get to the docker file pretty soon, but it might take me a week or two to wrap up a couple other things first.

ashotmarg commented 9 months ago

Hi @psathyrella , a quick update from my end.

When I run a basic partition command on the example dataset (that you provided) I get an error. This singularity file is based on the new docker that I made. singularity exec ../code/partis.sif /partis/bin/partis partition --infname ../code/partis/test/example.fa --outfname outExample.yaml --parameter-dir outExampleDir --get-selection-metrics --extra-annotation-columns consensus_seq

... File "/partis/python/utils.py", line 7130, in write_yaml_output version_info = {'partis-yaml' : 0.1, 'partis-git' : '' if dont_write_git_info else get_version_info()} File "/partis/python/utils.py", line 7030, in get_version_info vinfo['commit'] = subprocess.check_output(['git', '--git-dir', git_dir, 'rev-parse', 'HEAD'], universal_newlines=True).strip() File "/opt/conda/lib/python3.10/subprocess.py", line 421, in check_output return run(popenargs, stdout=PIPE, timeout=timeout, check=True, File "/opt/conda/lib/python3.10/subprocess.py", line 503, in run with Popen(popenargs, **kwargs) as process: File "/opt/conda/lib/python3.10/subprocess.py", line 971, in init self._execute_child(args, executable, preexec_fn, close_fds, File "/opt/conda/lib/python3.10/subprocess.py", line 1863, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) FileNotFoundError: [Errno 2] No such file or directory: 'git'

However, when I run the same command with the --dont-write-git-info flag it runs fine. singularity exec ../code/partis.sif /partis/bin/partis partition --dont-write-git-info --infname ../code/partis/test/example.fa --outfname outExample.yaml --parameter-dir outExampleDir --get-selection-metrics --extra-annotation-columns consensus_seq I am guessing since --dont-write-git-info does the trick you probably know where the issue is coming from, which is why I didn't paste the whole error trace.

psathyrella commented 9 months ago

Looks like git isn't installed in the singularity container, but yeah, as you discovered git isn't very necessary. It's just how it writes the partis version to the output file, which is important for reproducibility but not at all necessary.

ashotmarg commented 9 months ago

Great, I will have a look at the docker soon. Since I mostly used your docker install commands, it might be our base images that are different in terms of whether git is already installed or not. In any case, the important part is that it's not affecting the outputs :)

ashotmarg commented 9 months ago

Hi @psathyrella , when running our data with new (python3) and old (python2) Partis docker containers, we get somewhat different clustering results. We are using partis partition command with the same --random-seed value. Would you expect this outcome even if one uses the same --random-seed ? If not, then we'll try again based on a fasta file that we are allowed to share for troubleshooting, and add the details here for reference.

psathyrella commented 9 months ago

Yes, it's expected to get different results with different versions. Making even relatively minor code changes can alter random number generator state, and python 2 to 3 made tons of huge code changes. The most common source of the changes are that each clustering iteration apportions the sequences/clusters randomly among processes, which of course depends on random generator state.

They shouldn't be incompatibly different, but rather shuffled around within the clustering uncertainty. So if you look at the progression of partitions in the output (i.e. as hierarchical agglomeration proceeds) the difference between versions would typically correspond roughly to moving up or down a partition or two, depending on the log likelihood separating them (this is the same difference you'd expect from changing the random seed). This doesn't mean, though, that they'll have similar cluster sizes -- one uncertain merge can make a huge difference to cluster size, for instance.

ashotmarg commented 9 months ago

Thanks for the details Duncan, I was naively assuming that giving the same seed might work even with different python versions, but as long as this behaviour is expected in compatible way, then all good :)

psathyrella commented 9 months ago

ok I think I've got the docker file working with the python 3 updates https://quay.io/repository/matsengrp/partis?tab=tags

ashotmarg commented 9 months ago

Great, thanks for the update, are you by any chance also planning to create a biocontainers package along with the docker?

psathyrella commented 8 months ago

Sure, could do. After reading through their docs I'm a little unclear what I'd do beyond adding some comment and label lines to the docker file -- do the biocontainers people add it to the registry after we make a github issue, or do I add it?

ashotmarg commented 8 months ago

Yeah, it is also confusing to me :). I guess the idea would be that partis could also be run from BioConda, for which one needs to build the conda package with a recipe (similar to the docker). If you want to stick with only docker (no conda) then I guess it would still be good to put it in biocontainers, in which case, it automatically should also be converted to a singularity image in the Galaxyproject as well: https://github.com/BioContainers/singularity-build-bot.

ashotmarg commented 8 months ago

Re the docker, yeah, I think you need to open a github issue and take it from there: https://biocontainers-edu.readthedocs.io/en/master/contributing.html