Closed by aberaslop 3 years ago
Hi @aberaslop
This script was originally written for BUSCO v3, but it looks like there were issues with v4 because the output directory structure changed. I pushed an update to the script BUSCO_phylogenomics.py; the lineage parameter is no longer used.
This relies on having the correct directory structure. For example I have a directory that contains the BUSCO results for each genome of interest:
busco_results/
    run_Fusarium_oxysporum/
        busco_sequences/
            single_copy_busco_sequences/
    run_Fusarium_pseudograminearum/
        busco_sequences/
            single_copy_busco_sequences/
    ......
    ......
    ......
And run the script:
python BUSCO_phylogenomics.py -d busco_results -t 8 -o busco_phylogenomics --supermatrix
This should work, but you may have to move the directories around a bit to get them into the correct structure.
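Moving the directories into place can be scripted. The following is only a minimal sketch and is not part of BUSCO_phylogenomics: the function name `collect_busco_runs` and its three arguments are hypothetical, and it assumes each genome has its own BUSCO output folder containing a single per-lineage run folder (passed in as `lineage_dir`). It needs Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def collect_busco_runs(busco_outputs, dest, lineage_dir):
    """Copy <sample>/<lineage_dir> to <dest>/run_<sample> for every sample.

    busco_outputs: directory holding one BUSCO output folder per genome (assumed layout)
    dest:          directory to fill, e.g. busco_results
    lineage_dir:   name of the per-lineage run folder inside each output folder
    Returns the names of the samples that were copied.
    """
    busco_outputs, dest = Path(busco_outputs), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for sample in sorted(p for p in busco_outputs.iterdir() if p.is_dir()):
        run_dir = sample / lineage_dir
        if not run_dir.is_dir():
            # Report the sample instead of failing, so gaps are visible early.
            print(f"skipping {sample.name}: no {lineage_dir} folder")
            continue
        # The copy keeps the original BUSCO output untouched.
        shutil.copytree(run_dir, dest / f"run_{sample.name}", dirs_exist_ok=True)
        copied.append(sample.name)
    return copied
```

Calling `collect_busco_runs("busco_outputs", "busco_results", "run_<lineage>")` with the real folder names would produce the `busco_results/run_<sample>/...` layout shown above. Copying rather than moving is a deliberate choice here so that a mistake does not damage the BUSCO results.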
Hi @jamiemcg
thank you so much for your answer!
I have made sure that the file/folder structure matched your example,
busco_results/
    run_Fox1/
        busco_sequences/
            single_copy_busco_sequences/
    run_Fox2/
    ...
    ...
    ...
I also installed the latest release of BUSCO_phylogenomics and ran the script as you suggested: python BUSCO_phylogenomics.py -d ../busco_results -t 8 -o busco_phylogenomics_results --supermatrix
The program starts running, I think, but it fails with a new error: FileNotFoundError: [Errno 2] No such file or directory: 'busco_sequences'
I have copied the whole terminal output in the following text file: output.txt
I am pretty sure that all folders have a busco_sequences subfolder. Is it possible that the problem comes from my using BUSCO v5? Thanks again!
Thanks for sending the terminal output @aberaslop .
I'm not sure how BUSCO v5 has changed, but I think it has the same directory structure, so it should be fine.
From the log, it seems to be working fine for the first 10 samples (i.e. it reports which BUSCOs were found) but crashes when it gets to run_Fusoxvas1. Is it possible that this sample is missing files or is not formatted correctly?
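A quick way to check for this before launching the pipeline is to scan every run folder for the expected subfolders. This is a minimal sketch under the directory layout described earlier in this thread; `find_incomplete_runs` is a hypothetical helper, not a function of BUSCO_phylogenomics.

```python
from pathlib import Path


def find_incomplete_runs(busco_results):
    """Return the names of run_* folders that lack
    busco_sequences/single_copy_busco_sequences."""
    missing = []
    for run_dir in sorted(Path(busco_results).glob("run_*")):
        if not run_dir.is_dir():
            continue  # ignore stray files that happen to match run_*
        expected = run_dir / "busco_sequences" / "single_copy_busco_sequences"
        if not expected.is_dir():
            missing.append(run_dir.name)
    return missing
```

For example, `print(find_incomplete_runs("../busco_results"))` would list any sample whose folder needs to be regenerated; an empty list means every run folder has the expected structure.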
Dear @jamiemcg,
you were totally right: that particular species was missing the busco_sequences folder. I do not know how that happened, as I ran BUSCO with a for-loop script on all files at the same time...
In any case, that is fixed now.
I ran python BUSCO_phylogenomics.py -d ../busco_results -t 8 -o busco_phylogenomics_results --supermatrix
and everything seems to be working fine, except that the IQTREE step is taking a very long time (more than 24 hours). When I canceled the job, I got a message saying that the tree had been built successfully and that I could find it in SUPERMATRIX.aln.tree, but no such file existed.
I then tried the other mode: python BUSCO_phylogenomics.py -d ../busco_results/ --supertree -t 16 -o output_supertree , which also seems to work fine until the tree step, which is taking a very long time (16 hours so far).
In your experience, is this normal, or has the analysis failed somewhere?
Thanks again!
Hi @jamiemcg, the analysis finally finished! It just took a long time, but everything is looking good. Thank you so much for helping me solve the different issues!!
Great!
If the supermatrix alignment is very long, it is not surprising that IQTREE takes a long time. In future, you could take the supermatrix.aln file and try other steps manually, such as removing phylogenetically uninformative sites, manually choosing bootstrap/model selection methods, or using approximate ML methods such as FastTree, etc.
Glad to hear it worked for you.
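As an illustration of the first suggestion, here is a minimal sketch that keeps only parsimony-informative columns, i.e. columns in which at least two different residues each occur in at least two sequences. It assumes the supermatrix is an aligned FASTA file with equal-length sequences; the function names are hypothetical, and treating `-`, `?` and `X` as missing data is an assumption. Dedicated alignment-trimming tools will be more thorough.

```python
from collections import Counter


def read_fasta(path):
    """Read a FASTA file into a dict of {name: sequence}."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def is_informative(column, ignore="-?X"):
    """True if at least two residues each appear in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def keep_informative_sites(records):
    """Return a new alignment containing only parsimony-informative columns."""
    names = list(records)
    columns = zip(*(records[n] for n in names))
    kept = [col for col in columns if is_informative("".join(col))]
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def write_fasta(records, path):
    """Write {name: sequence} to a FASTA file, one sequence per line."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n{seq}\n")
```

Typical use would be `write_fasta(keep_informative_sites(read_fasta("supermatrix.aln")), "supermatrix.informative.aln")`, where the output file name is arbitrary, followed by running the tree-building program of your choice on the reduced alignment.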
Those are great tips. I will try them all. Thanks!
Hi,
Thank you so much for this useful tool!
I have run BUSCO on a bunch of protein sets using hypocreales as the lineage, and I have extracted the run_hypocreales_odb10 folder from each result folder. Since it was named identically for each separate run, I renamed each one as run_nameofthespecies and placed them all in the same folder. So I have, as per the Readme.md:
BUSCO_phylogenomics_dataset
    run_species1
    run_species2
    run_species3
    ...etc
I have run: python BUSCO_phylogenomics.py -d ../BUSCO_phylogenomic_dataset/ -o ../BUSCO_phylogenomic_dataset/output --supermatrix --threads 20 -l hypocreales_odb10
I am getting the following error:
Traceback (most recent call last):
  File "BUSCO_phylogenomics.py", line 369, in <module>
    main()
  File "BUSCO_phylogenomics.py", line 122, in main
    os.chdir("run_" + lineage)
FileNotFoundError: [Errno 2] No such file or directory: 'run_hypocreales_odb10'
I have also tried running it with -l hypocreales, in which case I get: FileNotFoundError: [Errno 2] No such file or directory: 'run_hypocreales'
I think this is related to issue #2. Any help will be very much appreciated.