rotary-genomics / rotary

Assembly/annotation workflow for Nanopore-based microbial genome data containing circular DNA elements
BSD 3-Clause "New" or "Revised" License

GTDB-Tk installation error #87

Closed. jmtsuji closed this issue 9 months ago.

jmtsuji commented 9 months ago

Problem description

The first time rotary is run on a system, GTDB-Tk works fine. However, when rotary is re-run, if a re-install of the GTDB-Tk conda env is needed (e.g., due to a change in the env YAML file), GTDB-Tk will fail to run due to the GTDB root dir not being set.

Version: Working with the current develop branch (around commit 06e2be9)
System: Linux (Ubuntu 22.04)

Possible cause

Rule setup_gdtb sets the GTDB root dir (GTDBTK_DATA_PATH) in the GTDB-Tk conda env. After this rule is run, it saves a checkpoint file in the rotary DB dir (in checkpoints/GTDB_[VERSION_GTDB_COMPLETE]_setup). Because the checkpoint file is saved in the rotary DB dir and not in the individual run dir, the setup rule is never run again for that DB dir, even if a new GTDB-Tk conda env is installed. Thus, new GTDB-Tk conda envs will not have the GTDB root dir set.
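To illustrate the suspected cause, here is a toy Python model of the checkpoint behaviour described above. It is not rotary's actual Snakemake code: the function names, the simplified checkpoint file name, and the use of a plain dict to stand in for a conda env are all assumptions made for illustration. In the real workflow the skip would be decided by Snakemake seeing that the output file already exists, not by an explicit check inside the rule.

```python
from pathlib import Path
import tempfile


def setup_gtdb(db_dir: Path, conda_env: dict) -> bool:
    """Toy stand-in for the setup rule (hypothetical, not rotary's code).

    Sets GTDBTK_DATA_PATH in the given env unless the checkpoint file
    already exists in the shared DB dir. Returns True if setup ran.
    """
    # Simplified, hypothetical checkpoint name
    checkpoint = db_dir / "checkpoints" / "GTDB_setup"
    if checkpoint.exists():
        return False  # setup is skipped, so the env is left untouched
    conda_env["GTDBTK_DATA_PATH"] = str(db_dir)
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    checkpoint.touch()
    return True


def simulate_reinstall() -> tuple:
    """Run setup for an original env, then for a re-installed env."""
    with tempfile.TemporaryDirectory() as tmp:
        db_dir = Path(tmp)
        first_env = {}  # env created on the first run
        first_ran = setup_gtdb(db_dir, first_env)
        new_env = {}  # fresh env after a re-install; same DB dir
        second_ran = setup_gtdb(db_dir, new_env)
        return (
            first_ran,
            "GTDBTK_DATA_PATH" in first_env,
            second_ran,
            "GTDBTK_DATA_PATH" in new_env,
        )
```

In this model, the second call is skipped because the checkpoint lives in the shared DB dir, so the re-installed env never gets GTDBTK_DATA_PATH.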

Possible solutions

This problem should be pretty easy to address. Here are two possible solutions:

The only case where I could see one of these working better than the other is if a different program was trying to use the same GTDB-Tk conda env with a different GTDB dir. In that case, it would be better to set GTDBTK_DATA_PATH just before GTDB-Tk is run (solution 2 above) to avoid accidentally using the wrong GTDB. However, I imagine this kind of conflict would be very rare. It would be ideal if GTDBTK_DATA_PATH could be set as a flag of GTDB-Tk, but I don't see an option to do this.
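As a rough sketch of the "set GTDBTK_DATA_PATH just before GTDB-Tk is run" idea, the variable can be passed in the environment of the GTDB-Tk process itself rather than stored in the conda env. The helper names below (gtdbtk_env, run_with_gtdb) are hypothetical and are not taken from rotary or from PR #88; the only GTDB-Tk detail assumed is that it reads the GTDBTK_DATA_PATH environment variable, as described above.

```python
import os
import subprocess


def gtdbtk_env(gtdb_root, base_env=None):
    """Return a copy of the environment with GTDBTK_DATA_PATH set.

    base_env defaults to the current process environment; the original
    mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GTDBTK_DATA_PATH"] = str(gtdb_root)
    return env


def run_with_gtdb(command, gtdb_root):
    """Run a command (a list of arguments) with GTDBTK_DATA_PATH set
    only for that one process, capturing its output as text."""
    return subprocess.run(
        command,
        env=gtdbtk_env(gtdb_root),
        check=True,
        capture_output=True,
        text=True,
    )
```

Because the variable is scoped to a single process, a freshly installed conda env needs no one-time setup step, and another program sharing the same env with a different GTDB dir would not be affected.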

@LeeBergstrand What's your favoured approach? Thanks!

jmtsuji commented 9 months ago

@LeeBergstrand I just made a PR (#88) for proposed fix 2 in the issue above. After thinking about it, I think this fix is better than fix 1, because:

Thoughts? Feel free to review my PR or to close it if you think a different solution would be better. Thanks!

LeeBergstrand commented 9 months ago

@jmtsuji I agree with your assessment. It simplifies the code.