Closed elizabethmcd closed 6 years ago
I think I found the issue. I'm guessing you were using BuildGroups.py without the --use_MP option?
I'm testing the fix right now and will push a new version hopefully later today.
I could be wrong, but I'm pretty sure I used the --use_MP option. I have another colleague who's been testing the pipeline, and I know he used the --use_MP option; he also gets some empty directories from just running BuildGroups.py.
I think there are two different issues going on. Right now, if "--use_MP" is not flagged, the function that builds HMMs is never called. If it is flagged, it does build the HMMs, but some directories will be empty if you use the "--clean" option.
Is there anything in the "all_groups.hmm" file for either you or your colleague?
That file is empty as well. I don't think we used the --clean option either.
I pushed a new version to GitHub and PyPI with a bugfix for BuildGroups.py so upgrade to v0.2.2 and try running again.
Please use the --verbose arg and attach the output.
You will also have to make an empty file called prop_strainlist.txt in your output directory before running IdentifyOrthologs.py if you don't run PropagateGroups.py; otherwise that part should work OK.
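For anyone following along, the empty-file workaround above can be sketched in Python. This is only an illustration: the helper name `ensure_empty_prop_strainlist` is hypothetical and not part of PyParanoid, and it assumes the output directory already exists.

```python
from pathlib import Path


def ensure_empty_prop_strainlist(outdir):
    """Create prop_strainlist.txt in outdir if it is not already there.

    Hypothetical helper, not part of PyParanoid. Assumes outdir exists.
    """
    path = Path(outdir) / "prop_strainlist.txt"
    # touch() creates an empty file; an existing file keeps its contents.
    path.touch(exist_ok=True)
    return path
```

Calling it with the same output directory that was passed to BuildGroups.py should be enough before running IdentifyOrthologs.py.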
I'll leave this issue open but I may not have time to follow up until later in the week
Hi Ryan,
I updated to v0.2.2 and reran the BuildOrthologs.py script. I am still getting empty directories. I've attached the output file, pypar-output.txt, that I got with the --verbose flag. I'm planning on running the PropagateGroups.py script with a few new genome files that I have, so hopefully I can keep going with the analysis. I'll let you know if I have any problems with that.
Hi Elizabeth,
Thanks for attaching the output. It looks like everything goes well until the final chunk of the pipeline, which depends on the executables for several programs being installed.
Can you double-check that all of the executables required work correctly and are accessible in your $PATH? They would be:
cd-hit muscle hmmemit hmmbuild
If you used Homebrew, be particularly careful about cd-hit. It only works on some versions of Mac OS X. Just try to run cd-hit from bash and see if you get an error message.
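A quick way to check the four names listed above is to ask Python whether each one resolves on the current PATH. This is a minimal sketch using the standard library's `shutil.which`; the function name `find_missing` is made up for illustration. It only confirms that a name can be found, not that the program runs correctly, so running cd-hit by hand is still worthwhile.

```python
import shutil

# Executable names taken from the list in this thread.
REQUIRED = ("cd-hit", "muscle", "hmmemit", "hmmbuild")


def find_missing(programs=REQUIRED):
    """Return the names that cannot be resolved on the current PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found.")
```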
Ah, I think that's it. I'm on a Linux machine and cd-hit was put in a weird place. I'll try running it again.
Problem solved! There was a problem with cd-hit and my path, but it was a bit more interesting than it just not being in my path. Apparently, when you install cd-hit with apt-get, a hard link to the cd-hit executable is created as cdhit, without the dash. Everything is placed in my path, but since PyParanoid calls cd-hit with a dash, it couldn't find it. I suspect that other Linux users who install things with apt-get might hit this issue, and it takes a while to figure out (at least it did for me). The fix was just creating a soft link named cd-hit, with a dash, and everything works perfectly. I ran IdentifyOrthologs.py without propagating new groups and just renamed my strainlist as prop_strainlist, and everything looks good there as well.
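The soft-link fix described above could look roughly like this in Python. It is a sketch under assumptions: `link_with_dash` is a hypothetical helper, and `bin_dir` stands for any writable directory that is already on your PATH (which directory to use is your choice, not something the thread specifies).

```python
import shutil
from pathlib import Path


def link_with_dash(target, bin_dir, name="cd-hit"):
    """Create bin_dir/name as a symlink to target unless it already exists.

    Hypothetical helper. bin_dir is assumed to be writable and on PATH.
    """
    link = Path(bin_dir) / name
    # Check is_symlink() too, so a dangling link is not overwritten blindly.
    if not (link.is_symlink() or link.exists()):
        link.symlink_to(target)
    return link


def find_dashless_cd_hit():
    """Return the path of the dashless cdhit on PATH, or None if absent."""
    return shutil.which("cdhit")
```

A typical use would be to pass the result of `find_dashless_cd_hit()` as `target`, after checking that it is not `None`.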
Awesome! Thanks for your detailed description and patience - I'll add more robust checking and error reporting for non-python dependencies in a future update.
Thanks for the awesome tool!
I have 70 Deltaproteobacteria genomes that I've run the BuildGroups.py script on, and I think I have that working. However, some of the directories are empty after running this script, such as the aligned and hmms directories in the master outfolder. Are these empty until the PropagateGroups.py script is run? I was trying to go straight from BuildGroups.py to pulling out orthologs with IdentifyOrthologs.py, but that depends on the hmms directory, which is currently empty.
Thanks!