jamiemcg / BUSCO_phylogenomics

BUSCO Phylogenomics | Utility script to construct species phylogenies using BUSCO proteins
MIT License

Error : No such file or directory exist. #3

Closed aberaslop closed 3 years ago

aberaslop commented 3 years ago

Hi,

Thank you so much for this useful tool!

I have run BUSCO on a set of proteomes using hypocreales as the lineage, and I extracted the run_hypocreales_odb10 folder from each result folder. Since that folder had the same name for every run, I renamed each one to run_nameofthespecies and placed them all in the same folder. So, as per the Readme.md, I have:

BUSCO_phylogenomics_dataset/
   run_species1/
   run_species2/
   run_species3/
   ...etc
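
For anyone reproducing this layout, the renaming step described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes each BUSCO output folder is named after its species and directly contains run_hypocreales_odb10. The function collect_run_dirs and its parameters are hypothetical names for illustration, not part of BUSCO or BUSCO_phylogenomics.

```python
import shutil
from pathlib import Path


def collect_run_dirs(busco_root, dataset_dir, lineage_dir="run_hypocreales_odb10"):
    """Copy <busco_root>/<species>/<lineage_dir> to <dataset_dir>/run_<species>.

    Assumed (hypothetical) layout: one sub-folder per species inside busco_root.
    Returns the names of the run folders that were created.
    """
    busco_root, dataset_dir = Path(busco_root), Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for species_dir in sorted(p for p in busco_root.iterdir() if p.is_dir()):
        source = species_dir / lineage_dir
        if not source.is_dir():
            # No lineage folder here, so this BUSCO run is missing or incomplete.
            print(f"skipping {species_dir.name}: no {lineage_dir} folder")
            continue
        target = dataset_dir / f"run_{species_dir.name}"
        if target.exists():
            # Do not overwrite a run folder that was already collected.
            continue
        shutil.copytree(source, target)
        copied.append(target.name)
    return copied
```

Copying rather than moving leaves the original BUSCO output untouched, so the step can be repeated if something goes wrong.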

I ran:

python BUSCO_phylogenomics.py -d ../BUSCO_phylogenomic_dataset/ -o ../BUSCO_phylogenomic_dataset/output --supermatrix --threads 20 -l hypocreales_odb10

I am getting the following error:

Traceback (most recent call last):
  File "BUSCO_phylogenomics.py", line 369, in <module>
    main()
  File "BUSCO_phylogenomics.py", line 122, in main
    os.chdir("run_" + lineage)
FileNotFoundError: [Errno 2] No such file or directory: 'run_hypocreales_odb10'

I have also tried running it with -l hypocreales, in which case I get:

FileNotFoundError: [Errno 2] No such file or directory: 'run_hypocreales'

I think this is related to issue #2. Any help will be very much appreciated.

jamiemcg commented 3 years ago

Hi @aberaslop

This script was originally written for BUSCO v3, but it looks like there may have been issues with v4 because the output directory structure changed. I have pushed an update to BUSCO_phylogenomics.py; the lineage parameter is no longer used.

This relies on having the correct directory structure. For example I have a directory that contains the BUSCO results for each genome of interest:

busco_results/
   run_Fusarium_oxysporum/
      busco_sequences/
         single_copy_busco_sequences/
   run_Fusarium_pseudograminearum/
      busco_sequences/
         single_copy_busco_sequences/
   ......
   ......
   ......

And run the script:

python BUSCO_phylogenomics.py -d busco_results -t 8 -o busco_phylogenomics --supermatrix

This should work, but you may have to move the directories around a bit to get them into the correct structure.

aberaslop commented 3 years ago

Hi @jamiemcg, thank you so much for your answer! I have made sure that the file/folder structure matches your example:

busco_results/
   run_Fox1/
      busco_sequences/
         single_copy_busco_sequences/
   run_Fox2/
   ...
   ...
   ...

I also installed the latest release of BUSCO_phylogenomics and ran the script as you suggested:

python BUSCO_phylogenomics.py -d ../busco_results -t 8 -o busco_phylogenomics_results --supermatrix

The program starts running, I think, but it fails with a new error:

FileNotFoundError: [Errno 2] No such file or directory: 'busco_sequences'

I have copied the whole terminal output in the following text file: output.txt

I am pretty sure that all folders have a busco_sequences subfolder. Is it possible that the problem comes from my using BUSCO v5? Thanks again!

jamiemcg commented 3 years ago

Thanks for sending the terminal output @aberaslop .

I'm not sure how BUSCO v5 has changed, but I think it has the same directory structure, so it should be fine.

From the log, it seems to be working fine for the first 10 samples (i.e. it reports which BUSCOs were found) but crashes when it gets to run_Fusoxvas1. Is it possible that this sample may be missing files or not formatted correctly?
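
A quick way to check this before running the script is to scan every run folder for the expected subfolders. The sketch below assumes the directory structure shown earlier in this thread (run_*/busco_sequences/single_copy_busco_sequences); find_incomplete_runs is a hypothetical helper written for illustration, not part of BUSCO_phylogenomics.

```python
from pathlib import Path


def find_incomplete_runs(results_dir):
    """Return the names of run_* folders that lack
    busco_sequences/single_copy_busco_sequences."""
    required = Path("busco_sequences") / "single_copy_busco_sequences"
    incomplete = []
    for run_dir in sorted(Path(results_dir).glob("run_*")):
        if run_dir.is_dir() and not (run_dir / required).is_dir():
            incomplete.append(run_dir.name)
    return incomplete


# Example usage (the path is an assumption taken from the example above):
# for name in find_incomplete_runs("busco_results"):
#     print(f"{name} is missing busco_sequences/single_copy_busco_sequences")
```

An empty list means every sample has the folders the script expects; any name it returns is a sample worth re-running through BUSCO.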

aberaslop commented 3 years ago

Dear @jamiemcg,

you were totally right: that particular species was missing the busco_sequences folder. I do not know how that happened, as I ran BUSCO on all files at the same time with a for-loop script...
In any case, that is fixed now. I ran

python BUSCO_phylogenomics.py -d ../busco_results -t 8 -o busco_phylogenomics_results --supermatrix

and everything seems to be working fine, except that the IQTREE step is taking very long (more than 24 hours). When I canceled the job, I got a message saying that the tree had been built successfully and could be found in SUPERMATRIX.aln.tree, but no such file had been created.

I then tried the script in supertree mode:

python BUSCO_phylogenomics.py -d ../busco_results/ --supertree -t 16 -o output_supertree

This also seems to work fine until the tree step, which is taking very long (16 hours so far).

In your experience, is this normal, or has the analysis failed somewhere?

Thanks again!

aberaslop commented 3 years ago

Hi @jamiemcg, the analysis finally finished! It just took a long time, but everything is looking good. Thank you so much for helping me solve the different issues!!

jamiemcg commented 3 years ago

Great!

If the supermatrix alignment is very long, it is not surprising that IQTREE takes a long time. In future, you could take the supermatrix.aln file and try other steps manually, such as removing phylogenetically uninformative sites, choosing the bootstrap or model selection methods yourself, or using approximate ML methods such as FastTree.
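
As a rough illustration of the first suggestion, the sketch below keeps only parsimony-informative columns, i.e. columns with at least two different residues that each occur in at least two sequences. It assumes the alignment is in FASTA format and treats '-', '?' and 'X' as missing data; the function names are hypothetical, and dedicated trimming tools may be a better choice for real analyses.

```python
from collections import Counter


def read_fasta(path):
    """Read a FASTA file into a dict of {sequence name: sequence}."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def is_parsimony_informative(column, missing="-?X"):
    """True if at least two residues each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c.upper() not in missing)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def keep_informative_sites(alignment):
    """Return a copy of the alignment containing only informative columns."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences are not aligned (lengths differ)")
    kept = [col for col in zip(*seqs) if is_parsimony_informative(col)]
    # Transpose the kept columns back into one string per sequence.
    filtered = ["".join(chars) for chars in zip(*kept)] if kept else [""] * len(names)
    return dict(zip(names, filtered))


def write_fasta(alignment, path):
    """Write {name: sequence} to a FASTA file, one sequence per line."""
    with open(path, "w") as handle:
        for name, seq in alignment.items():
            handle.write(f">{name}\n{seq}\n")
```

The filtered alignment can then be written with write_fasta and passed to a tree-building program in place of the full supermatrix.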

Glad to hear it worked for you.

aberaslop commented 3 years ago

Those are great tips. I will try them all. Thanks!