Open ashotmarg opened 1 year ago
Whoops, sorry about that. It's possible that would work, although also possible circlify required python3. If that doesn't work, we should add python3 to the docker container, probably with apt install python3-dev
. Let me know what happens.
Thanks for the prompt reply! Yeah, I will try with 2.7 and let you know. But I guess in general it would be good to move to python3, i.e., change the anaconda base image, but not sure if the rest of the python2.7-dependent scripts will complain :)
Coincidentally, I think I'm finally starting the 2 to 3 conversion this week, and oh my goodness there will be complaining by probably mostly from me.
Hope things go smoothly with the transition :)
Hi again @psathyrella, apart from the python version issue it doesn't seem that the circlify (which is required for the circle-plots.py) is installed in the docker. When I tried to install it with a very old version of circlify==0.9.1, which should be compatible with python2.7 I got some versioning errors. Given that you are switching to the python3, I don't think it makes sense to troubleshoot this, and it's probably easier for the rest of us (apart from you :) ) to just switch to that new version, right?
Yeah that sounds good, I'll just have a look at this to make sure it's fixed after everything is on 3.
Hi @psathyrella, I was wondering if you have any updates re the python 3 version of partis.
Part way done, you can view progress here. It's running all the tests fine (i.e. basics are ok), but I still have some more checks to do.
Hi @psathyrella , I can see from the progress that you're still working on the new version. Just wondering if you have estimates when you will finish it? The very general command I want to run might even be ok with the current version, but I need to run it in the docker and I guess you will make it when you've finished everything, right ?
Whoops, was thinking of pinging you. I'm buried in a couple other things so I won't be able to finish the extra 2 to 3 checks I need to do til later in February, but yeah the current version should be fine for you. I just merged 2to3 to main, but unfortunately the docker file needs some modifications to work with python 3, and my first few tries aren't working (i can't run docker on my machine atm, i need to free up some space). On the off chance you're familiar with docker and you could suggest what I need to change here I can easily make the changes and push, otherwise it'll probably take me a few days (i've just swapped =2.7 for =3.10, which didn't work): https://github.com/psathyrella/partis/blob/main/Dockerfile
Sounds good, hope those extra few things won't create much headache. With my limited docker experience I managed to have a working (hopefully) version of the dockerfile. It took some fight with docker/partis, mostly due to my (i) company firewall with private SSL certificate issues and (ii) importantly the arm64 architecture on my mac. In the end I used another machine. In any case this worked on my x86_64 Mac OS Darwin. I had to add the .txt, to the Dockerfile's filename to attach it here, hope this is useful. Dockerfile.txt
Oh my goodness, wow, thanks! Wow, you certainly had to make a lot of modifications. It'll unfortunately I think still take me a bit to figure out which of those modifications are specific to your environment vs applicable to the github/quay build, but it seems like if you've got that built, you can then use the resulting local docker image to run what you need?
Yep, I haven't actually fully tested it, only saw that Partis run from the docker container itself, which was a good start :) So, assuming that (a) the docker tests run smoothly and (b) your last couple of remaining checks don't create issues, then I am good :) Thanks for the python 2 -> 3 transition.
Great. I should get to the docker file pretty soon, but it might take me a week or two to wrap up a couple other things first.
Hi @psathyrella , a quick update from my end.
When I run a basic partition command on the example dataset (that you provided) I get an error. This singularity file is based on the new docker that I made.
singularity exec ../code/partis.sif /partis/bin/partis partition --infname ../code/partis/test/example.fa --outfname outExample.yaml --parameter-dir outExampleDir --get-selection-metrics --extra-annotation-columns consensus_seq
... File "/partis/python/utils.py", line 7130, in write_yaml_output version_info = {'partis-yaml' : 0.1, 'partis-git' : '' if dont_write_git_info else get_version_info()} File "/partis/python/utils.py", line 7030, in get_version_info vinfo['commit'] = subprocess.check_output(['git', '--git-dir', git_dir, 'rev-parse', 'HEAD'], universal_newlines=True).strip() File "/opt/conda/lib/python3.10/subprocess.py", line 421, in check_output return run(popenargs, stdout=PIPE, timeout=timeout, check=True, File "/opt/conda/lib/python3.10/subprocess.py", line 503, in run with Popen(popenargs, **kwargs) as process: File "/opt/conda/lib/python3.10/subprocess.py", line 971, in init self._execute_child(args, executable, preexec_fn, close_fds, File "/opt/conda/lib/python3.10/subprocess.py", line 1863, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) FileNotFoundError: [Errno 2] No such file or directory: 'git'
However, when I run the same command with the --dont-write-git-info
flag it runs fine.
singularity exec ../code/partis.sif /partis/bin/partis partition --dont-write-git-info --infname ../code/partis/test/example.fa --outfname outExample.yaml --parameter-dir outExampleDir --get-selection-metrics --extra-annotation-columns consensus_seq
I am guessing since --dont-write-git-info
does the trick you probably know where the issue is coming from, which is why I didn't paste the whole error trace.
Looks like git isn't installed in the singularity container, but yeah, as you discovered git isn't very necessary. It's just how it writes the partis version to the output file, which is important for reproducibility but not at all necessary.
Great, I will have a look at the docker soon. Since I mostly used your docker install commands, it might be our base images that are different in terms of whether git is already installed or not. In any case, the important part is that it's not affecting the outputs :)
Hi @psathyrella , when running our data with new (python3) and old (python2) Partis docker containers, we get somewhat different clustering results. We are using partis partition command with the same --random-seed value. Would you expect this outcome even if one uses the same --random-seed ? If not, then we'll try again based on a fasta file that we are allowed to share for troubleshooting, and add the details here for reference.
Yes, it's expected to get different results with different versions. Making even relatively minor code changes can alter random number generator state, and python 2 to 3 made tons of huge code changes. The most common source of the changes are that each clustering iteration apportions the sequences/clusters randomly among processes, which of course depends on random generator state.
They shouldn't be incompatibly different, but rather shuffled around within the clustering uncertainty. So if you look at the progression of partitions in the output (i.e. as hierarchical agglomeration proceeds) the difference between versions would typically correspond roughly to moving up or down a partition or two, depending on the log likelihood separating them (this is the same difference you'd expect from changing the random seed). This doesn't mean, though, that they'll have similar cluster sizes -- one uncertain merge can make a huge difference to cluster size, for instance.
Thanks for the details Duncan, I was naively assuming that giving the same seed might work even with different python versions, but as long as this behaviour is expected in compatible way, then all good :)
ok I think I've got the docker file working with the python 3 updates https://quay.io/repository/matsengrp/partis?tab=tags
Great, thanks for the update, are you by any chance also planning to create a biocontainers package along with the docker?
Sure, could do. After reading through their docs I'm a little unclear what I'd do beyond adding some comment and label lines to the docker file -- do the biocontainers people add it to the registry after we make a github issue, or do I add it?
Yeah, it is also confusing to me :). I guess the idea would be that partis could also be run from BioConda, for which one needs to build the conda package with a recipe (similar to the docker). If you want to stick with only docker (no conda) then I guess it would still be good to put it in biocontainers, in which case, it automatically should also be converted to a singularity image in the Galaxyproject as well: https://github.com/BioContainers/singularity-build-bot.
Re the docker, yeah, I think you need to open a github issue and take it from there: https://biocontainers-edu.readthedocs.io/en/master/contributing.html
Hi @psathyrella, I am getting a python issue when using the --plotdir in the docker container. Namely: circle-plots.py requires python3 (#!/usr/bin/env python3) but the docker has only python2.7. Do you think the circle-plots.py will work if I change the python version from 3 to 2.7 in the circle-plots.py script file? Thanks, Ashot