nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License
321 stars 85 forks source link

Can predict.py use my output folder for workspace instead of the directory of my input genome file? #722

Open Mark-A-Taylor opened 2 years ago

Mark-A-Taylor commented 2 years ago

Folks,

Thank you VERY MUCH for this project. It has been incredibly valuable to us. I am using the tool in pipelines on a cloud platform (Illumina Connected Analytics) that is strict on file folder permissions. I only have write permissions on the '-o output' folder for a 'predict' command. However it looks like this tool is trying to create a folder in the directory it scrapes from the filepath of my input genome. This fails in my environment because I only have read permissions on the folder containing my input genome file. Conceptually I think it is 'non-intuitive' for this tool to assume that a user has write permission on the directory in the file path of his/her input genome file. I think it would be more logical to either use the '-o output' folder for workspace or accept another input argument specifying a temp folder.

Gory deets: I see predict.py:main() is calling library.py:checkMasklowMem() which creates a folder:

tmpdir = os.path.join(os.path.dirname(genome),'mask_'+str(uuid.uuid4()) os.makedirs(tmpdir)

The intention appears benign because this folder is later removed in the same function via a call to library.py:SaveRemove().

This is what I see in the dockerhub 'latest' tags as of today for both nextgenusfs/funannotate & nextgenusfs/funannotate-slim. I confirmed this was my issue by hacking the python code in my local instance of the docker container and rerunning the pipeline. Please consider enhancing this tool to work for my use case.

Thank you. Please shout if I can provide any additional details (or if I am misunderstanding something).

nextgenusfs commented 2 years ago

Shouldn't be hard to move that tmp folder elsewhere, but I'm away from computer for the next 4 days. Can you just symlink your genome to current working directory and then run it from there?

Mark-A-Taylor commented 2 years ago

Clever idea - but - unfortunatley the ICA (Illumina Connected Analytics) platform does not reveal that level of control to users. It is a GUI interface driving CWL configurations on the back end... I'll inquire with our ILMN tech support team to get their thoughts.

FWIW, it looks like the coding change on your end might be fairly isolated if you are open to it. Grepping inside the docker image (/venv/lib/python3.8/site-packages/funannotate/*) seems to show the only client of library.py:checkMasklowMem() is predict.py:main(). predict.py:main() has access to the output folder as 'args.out' which could be added as an input to library.py:checkMasklowMem() to allow it to redirect that tmp folder into a folder where we know the user has write perms. I would love to stay on your latest branch rather than maintain my bootleg hack.

Enjoy your break from the keyboard! Thanks again for sharing this great tool with the world.

Mark-A-Taylor commented 2 years ago

JP: Any further thoughts on this topic? I do think your code base benefits from an enhancement to remove the assumption that the user has write permission on the folder containing his/her input genome file. I am new to github but willing to try to help execute the change and do any testing you recc if you are open to it.

Meanwhile, FWIW, I have found a way to use sed to make do the edits in a local Dockerfile that builds around your image (below). This gives me a docker image I can use in ICA without the hassle of 'docker run', file edits inside the container and then a 'docker commit':

start with the blessed funannotate docker image

FROM nextgenusfs/funannotate

USER root

FIX the bug where Funannotate writes to folder of input file (not allowed on ICA platform)

RUN /bin/sed -i 's/checkMasklowMem(/checkMasklowMem( args.out, /g' /venv/lib/python3.8/site-packages/funannotate/predict.py RUN /bin/sed -i 's/def checkMasklowMem(/def checkMasklowMem( outputFolder, /g' /venv/lib/python3.8/site-packages/funannotate/library.py RUN /bin/sed -i 's/os.path.join(os.path.dirname(genome)/os.path.join(outputFolder/g' /venv/lib/python3.8/site-packages/funannotate/library.py

nextgenusfs commented 2 years ago

Okay, see if that works for you. There might be other errors that violate the "all writes must be in the output directory" -- so let me know if you find more.

Mark-A-Taylor commented 2 years ago

The code changes look great but the predict command is now failing for me when building off of the new image dockerhub tag 'nextgenusfs/funannotate:latest'. I can illustrate the failure using the 'funannotate test -t all' command inside the container. STDERR barks 'ERROR: augustus --proteinprofile test failed, likely a compilation error...'. I have attached output files for:

Shout if i can do anything else to help. It is not a crisis for me - I am able to revert back to dockerhub tag 'nextgenusfs/funannotate:v1.8.9' and my workaround above. Thanks again for working on it.

versions_output.txt test_output.txt .

nextgenusfs commented 2 years ago

I'm actively trying to fix another bug (that augustus compilation issue) on the latest docker image, it won't be stable for a little while, apologies.

Mark-A-Taylor commented 2 years ago

JP: Thanks for the heads up. FAVOR to ask if you have time while you are tweaking the dockerhub build image.. could you add perl packages MCE::Mutex and Math::Utils to make it easier for us to layer in Genemark? e.g /venv/bin/cpanm MCE::Mutex and /venv/bin/cpanm Math::Utils Thanks again and shout if I am useful as a guinnea pig for testing.

nextgenusfs commented 2 years ago

do those have conda packages? those must be new dependencies I guess as I have others in here for GeneMark. I just haven't used it in a long time so have not kept up to date with the perl dependencies.

nextgenusfs commented 2 years ago

Is this MCE::Mutex? https://anaconda.org/bioconda/perl-mce

Mark-A-Taylor commented 2 years ago

sorry but I am not sure. I am trying to get my head around genemark for the first time now. Those two packages (MCE::Mutex and Math::Utils) are checked as part of their 'check_install.bash' script. I'll dig into it more asap and pass on any 'insights' (if i have any).

nextgenusfs commented 2 years ago

Okay, well I just pushed those two perl modules in last commit, the docker image should have them when it finishes building in ~1 hour or so.

Mark-A-Taylor commented 2 years ago

Thanks!! Looks better - dockerhub image 'nextgenusfs/funannotate:latest' is crunching through 'funannotate test -t all' now. I'll let you know if I hit any issues or see anything fishy. Thanks again!

Mark-A-Taylor commented 2 years ago

heads up: it looks like the unit testing fails for BOTH the old and new dockerhub images: 'nextgenusfs/funannotate:v1.8.9' & 'nextgenusfs/funannotate:latest', resp, when grinding through 'funannotate test -t all' at the 'compare' test. Cut-n-paste from terminal window output is below.

Running funannotate compare unit testing Downloading: https://osf.io/7s9xh/download?version=1 Bytes: 1020999 CMD: funannotate compare -i Genome_one.gbk Genome_two.gbk Genome_three.gbk -o compare --cpus 8 --outgroup botrytis_cinerea.dikarya [May 26 09:02 AM]: OS: Debian GNU/Linux 10, 16 cores, ~ 33 GB RAM. Python: 3.8.13 [May 26 09:02 AM]: Running 1.8.10 [May 26 09:02 AM]: Now parsing 3 genomes [May 26 09:02 AM]: working on Genome one [May 26 09:02 AM]: working on Genome two [May 26 09:02 AM]: working on Genome three [May 26 09:02 AM]: No secondary metabolite annotations found [May 26 09:02 AM]: Summarizing PFAM domain results [May 26 09:02 AM]: Summarizing InterProScan results [May 26 09:02 AM]: Loading InterPro descriptions [May 26 09:02 AM]: Summarizing MEROPS protease results [May 26 09:02 AM]: found 4 MEROPS familes Traceback (most recent call last): File "/venv/bin/funannotate", line 8, in sys.exit(main()) File "/venv/lib/python3.8/site-packages/funannotate/funannotate.py", line 711, in main mod.main(arguments) File "/venv/lib/python3.8/site-packages/funannotate/compare.py", line 540, in main lib.drawHeatmap(meropsplot, 'BuPu', os.path.join( File "/venv/lib/python3.8/site-packages/funannotate/library.py", line 7921, in drawHeatmap cbar_ax = fig.add_axes(shrink=0.4) File "/venv/lib/python3.8/site-packages/matplotlib/figure.py", line 619, in add_axes raise TypeError( TypeError: add_axes() missing 1 required positional argument: 'rect' ERROR: funannotate compare test failed - check logfiles

nextgenusfs commented 2 years ago

Thanks, yes I know about that. That script is very very old and I should probably label it "unsupported", it needs a complete rewrite which I don't have time to do (support is all on my personal time).

Mark-A-Taylor commented 2 years ago

more useless information for you.... i mispoke: the compare test still works in the 'nextgenusfs/funannotate:v1.8.9' which uses matplotlib 3.4.2 (asot 3.5.2). matplotlib chirps that deprecation is coming:

"/venv/lib/python3.8/site-packages/funannotate/library.py:7921: MatplotlibDeprecationWarning: Calling add_axes() without argument is deprecated since 3.3 and will be removed two minor releases later. You may want to use add_subplot() instead. cbar_ax = fig.add_axes(shrink=0.4)"