jasonsahl / LS-BSR

Large scale Blast Score Ratio (BSR) analysis
GNU General Public License v3.0

FileNotFoundError: Errno 2 No such file or directory: 'duplicate_ids.txt' #30

Closed yzhzu closed 3 years ago

yzhzu commented 3 years ago

Dear all,

Today I tried to use LS-BSR to analyze several bacterial genomes, but the run failed with an error. Can someone help me? Thank you very much! The command and its output were:

    python ~/biosoft/LS-BSR/ls_bsr.py -d ../KP_Zeng -i 0.8 -f T -p 40 -c cd-hit -b blastp -t T -e T

    LOG: 2021/07/02 17:37:31 - Testing paths of dependencies
    /home/anaconda3/envs/pgcgap/bin/blastp
    citation: Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402
    /home/anaconda3/envs/pgcgap/bin/prodigal
    citation: Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, and Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119
    /home/anaconda3/envs/pgcgap/bin/cd-hit
    citation: Li, W., Godzik, A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nuceltodie sequences. Bioinformatics 22(13):1658-1659
    LOG: 2021/07/02 17:37:31 - predicting genes with Prodigal
    LOG: 2021/07/02 17:38:23 - Prodigal done
    LOG: 2021/07/02 17:38:23 - Converting genbank files
    LOG: 2021/07/02 17:40:11 - clustering with cd-hit at an ID of 0.8, length percentage of 0.9, using 40 processors
    Duplicate header IDs: 626_3 626_4 738_22 ...
    duplicate headers identified, renaming..
    LOG: 2021/07/02 17:41:25 - starting blastp
    LOG: 2021/07/02 17:46:13 - BLAST done
    LOG: 2021/07/02 17:46:13 - Duplicate searching turned off
    LOG: 2021/07/02 17:46:15 - starting matrix building
    LOG: 2021/07/02 17:46:16 - The following genes had no hits in datasets or are too short, values changed to 0, check names and output:centroid_1530 centroid_2453 centroid_311 centroid_4813 centroid_5366 centroid_5638 centroid_6109 centroid_6695
    LOG: 2021/07/02 17:46:16 - filtering duplicates
    Traceback (most recent call last):
      File "/home/biosoft/LS-BSR/ls_bsr.py", line 710, in <module>
        options.filter_scaffolds,options.prefix,options.intergenics,options.min_len,options.dup_toggle)
      File "/home/biosoft/LS-BSR/ls_bsr.py", line 580, in main
        num_filtered = filter_paralogs("%s/bsr_matrix_values.txt" % start_dir, "duplicate_ids.txt")
      File "/home/biosoft/LS-BSR/ls_bsr/util.py", line 504, in filter_paralogs
        with open(ids) as genomes_file:
    FileNotFoundError: [Errno 2] No such file or directory: 'duplicate_ids.txt'
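
For readers hitting the same traceback: the log reports "Duplicate searching turned off" shortly before "filtering duplicates", and the failing call is an unguarded open() on 'duplicate_ids.txt'. That pattern suggests the filter step may be reading a file that the skipped duplicate search never wrote, though this is an inference from the log and not confirmed LS-BSR behavior. The sketch below reproduces that failure mode in isolation and shows one possible guard. The helper read_duplicate_ids and the IDs gene_a and gene_b are hypothetical and are not part of LS-BSR.

```python
import os
import tempfile


def read_duplicate_ids(ids_path):
    """Hypothetical helper (not LS-BSR code): return the IDs listed in
    ids_path, or an empty list if the file was never written."""
    if not os.path.exists(ids_path):
        # Assumption: a missing file means the duplicate search was skipped,
        # so there is nothing to filter.
        return []
    with open(ids_path) as ids_file:
        return [line.strip() for line in ids_file if line.strip()]


with tempfile.TemporaryDirectory() as work_dir:
    ids_path = os.path.join(work_dir, "duplicate_ids.txt")

    # Unguarded open, as in the traceback: raises when the file is absent.
    try:
        with open(ids_path) as genomes_file:
            genomes_file.read()
        unguarded_result = "opened"
    except FileNotFoundError:
        unguarded_result = "FileNotFoundError"

    # Guarded read: a missing file is treated as "no duplicates".
    missing_ids = read_duplicate_ids(ids_path)

    # Once the file exists, its (placeholder) IDs are returned.
    with open(ids_path, "w") as ids_file:
        ids_file.write("gene_a\ngene_b\n")
    found_ids = read_duplicate_ids(ids_path)

print(unguarded_result, missing_ids, found_ids)
```

Whether treating a missing file as "no duplicates" is the right behavior depends on what the pipeline expects downstream; failing early with a clearer message would be an equally reasonable choice.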

jasonsahl commented 3 years ago

Is it possible for you to share your genome files? I can't replicate the error using my test data.

jasonsahl commented 3 years ago

I actually did reproduce the error. The workaround is to either update the LS-BSR repository and try again, or rerun with the "-z T" flag. Please let me know if this doesn't fix your error.
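
As a rough illustration of the second workaround, the sketch below rebuilds the command from the original report with "-z T" added. Only the "-z T" flag and the original arguments come from this thread; the helper with_flag is hypothetical, and the script only prints the command, it does not run LS-BSR.

```python
# Original invocation from the report, split into an argument list.
original_args = [
    "python", "~/biosoft/LS-BSR/ls_bsr.py",
    "-d", "../KP_Zeng", "-i", "0.8", "-f", "T", "-p", "40",
    "-c", "cd-hit", "-b", "blastp", "-t", "T", "-e", "T",
]


def with_flag(args, flag, value):
    """Hypothetical helper: return a copy of args with flag set to value,
    replacing an existing value or appending the pair if it is absent."""
    updated = list(args)
    if flag in updated:
        updated[updated.index(flag) + 1] = value
    else:
        updated += [flag, value]
    return updated


# Add the "-z T" flag suggested as a workaround in this thread.
rerun_args = with_flag(original_args, "-z", "T")

# Plain join is enough here because no argument contains spaces.
print(" ".join(rerun_args))
```

The printed line can be pasted into a shell as-is; in practice, simply appending "-z T" to the original command by hand achieves the same thing.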

yzhzu commented 3 years ago

It works with -z T. Thank you very much.