nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

FileNotFoundError: [Errno 2] No such file or directory: 'rna-seq.nanopore.fastq.gz' #978

Open arayyyy310 opened 7 months ago

arayyyy310 commented 7 months ago
#########################################################
Running `funannotate clean` unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
-----------------------------------------------
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
-----------------------------------------------
minimap2 version=2.26-r1175 path=/venv/bin/minimap2
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
-----------------------------------------------
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
#########################################################
SUCCESS: `funannotate clean` test complete.
#########################################################

#########################################################
Running `funannotate mask` unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
[Nov 07 08:37 AM]: OS: Debian GNU/Linux 10, 6 cores, ~ 32 GB RAM. Python: 3.8.12
[Nov 07 08:37 AM]: Running funanotate v1.8.16
[Nov 07 08:37 AM]: Soft-masking simple repeats with tantan
[Nov 07 08:37 AM]: Repeat soft-masking finished: 
Masked genome: /test-mask_848cc994-3146-4d37-aebe-20850604f61e/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
-------------------------------------------------------
-------------------------------------------------------
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 12
#########################################################
#########################################################
SUCCESS: `funannotate mask` test complete.
#########################################################

#########################################################
Running `funannotate predict` unit testing
-------------------------------------------------------
[Nov 07 08:38 AM]: OS: Debian GNU/Linux 10, 6 cores, ~ 32 GB RAM. Python: 3.8.12
[Nov 07 08:38 AM]: Running funannotate v1.8.16
[Nov 07 08:38 AM]: Skipping CodingQuarry as no --rna_bam passed
[Nov 07 08:38 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  genemark     selftraining   
  glimmerhmm   busco          
  snap         busco          
[Nov 07 08:38 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Nov 07 08:38 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
/venv/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-p2g.py:14: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import parse_version
[Nov 07 08:38 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Nov 07 08:38 AM]: Found 1,505 preliminary alignments with diamond in 0:00:04 --> generated FASTA files for exonerate in 0:00:00
[Nov 07 08:39 AM]: Exonerate finished in 0:00:36: found 1,270 alignments
     Progress: 1505 complete, 0 failed, 0 remaining          
[Nov 07 08:39 AM]: Running GeneMark-ES on assembly
[Nov 07 08:42 AM]: 1,554 predictions from GeneMark
[Nov 07 08:42 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Nov 07 08:54 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Nov 07 08:55 AM]: 370 BUSCO predictions validated
[Nov 07 08:55 AM]: Running Augustus gene prediction using saccharomyces parameters
[Nov 07 08:58 AM]: 1,485 predictions from Augustus       
     Progress: 11 complete, 0 failed, 0 remaining        
[Nov 07 08:58 AM]: Pulling out high quality Augustus predictions
[Nov 07 08:58 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Nov 07 08:58 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Nov 07 08:58 AM]: 1,511 predictions from SNAP
[Nov 07 08:58 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Nov 07 09:00 AM]: 1,766 predictions from GlimmerHMM
[Nov 07 09:00 AM]: Summary of gene models passed to EVM (weights):
[Nov 07 09:00 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Nov 07 09:07 AM]: Converting to GFF3 and collecting all EVM results
     Progress: 45 complete, 0 failed, 0 remaining        
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GeneMark       1        1554 
  GlimmerHMM     1        1766 
  snap           1        1511 
  Total          -        6528 
[Nov 07 09:07 AM]: 1,712 total gene models from EVM
[Nov 07 09:07 AM]: Generating protein fasta files from 1,712 EVM models
[Nov 07 09:07 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Nov 07 09:07 AM]: Found 112 gene models to remove: 0 too short; 0 span gaps; 112 transposable elements
[Nov 07 09:07 AM]: 1,600 gene models remaining
[Nov 07 09:07 AM]: Predicting tRNAs
[Nov 07 09:07 AM]: 112 tRNAscan models are valid (non-overlapping)
[Nov 07 09:07 AM]: Generating GenBank tbl annotation file
[Nov 07 09:07 AM]: Collecting final annotation files for 1,712 total gene models
[Nov 07 09:07 AM]: Converting to final Genbank format
[Nov 07 09:07 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Nov 07 09:07 AM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate -c 12

Run antiSMASH (optional): 
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------

[Nov 07 09:07 AM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Nov 07 09:07 AM]: Add species parameters to database:

  funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

-------------------------------------------------------
[Nov 07 09:07 AM]: OS: Debian GNU/Linux 10, 6 cores, ~ 32 GB RAM. Python: 3.8.12
[Nov 07 09:07 AM]: Running funannotate v1.8.16
[Nov 07 09:07 AM]: Skipping CodingQuarry as no --rna_bam passed
[Nov 07 09:07 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco          
  genemark     selftraining   
  glimmerhmm   busco          
  snap         busco          
[Nov 07 09:08 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Nov 07 09:08 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
/venv/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-p2g.py:14: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import parse_version
[Nov 07 09:08 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Nov 07 09:08 AM]: Found 1,505 preliminary alignments with diamond in 0:00:03 --> generated FASTA files for exonerate in 0:00:00
[Nov 07 09:08 AM]: Exonerate finished in 0:00:35: found 1,270 alignments
     Progress: 1505 complete, 0 failed, 0 remaining          
[Nov 07 09:08 AM]: Running GeneMark-ES on assembly
[Nov 07 09:11 AM]: 1,557 predictions from GeneMark
[Nov 07 09:11 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Nov 07 09:24 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Nov 07 09:24 AM]: 370 BUSCO predictions validated
[Nov 07 09:24 AM]: Training Augustus using BUSCO gene models
[Nov 07 09:25 AM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   99.5%         83.8%      
  exons         71.8%         59.7%      
  genes         86.5%         59.3%      
[Nov 07 09:25 AM]: Running Augustus gene prediction using awesome_busco parameters
[Nov 07 09:26 AM]: 1,301 predictions from Augustus
     Progress: 11 complete, 0 failed, 0 remaining        
[Nov 07 09:26 AM]: Pulling out high quality Augustus predictions
[Nov 07 09:26 AM]: Found 313 high quality predictions from Augustus (>90% exon evidence)
[Nov 07 09:26 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Nov 07 09:27 AM]: 1,490 predictions from SNAP
[Nov 07 09:27 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Nov 07 09:28 AM]: 1,771 predictions from GlimmerHMM
[Nov 07 09:28 AM]: Summary of gene models passed to EVM (weights):
[Nov 07 09:28 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Nov 07 09:38 AM]: Converting to GFF3 and collecting all EVM results
     Progress: 43 complete, 0 failed, 0 remaining        
  Source         Weight   Count
  Augustus       1        988  
  Augustus HiQ   2        313  
  GeneMark       1        1557 
  GlimmerHMM     1        1771 
  snap           1        1490 
  Total          -        6119 
[Nov 07 09:38 AM]: 1,696 total gene models from EVM
[Nov 07 09:38 AM]: Generating protein fasta files from 1,696 EVM models
[Nov 07 09:38 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Nov 07 09:38 AM]: Found 104 gene models to remove: 0 too short; 0 span gaps; 104 transposable elements
[Nov 07 09:38 AM]: 1,592 gene models remaining
[Nov 07 09:38 AM]: Predicting tRNAs
[Nov 07 09:39 AM]: 112 tRNAscan models are valid (non-overlapping)
[Nov 07 09:39 AM]: Generating GenBank tbl annotation file
[Nov 07 09:39 AM]: Collecting final annotation files for 1,704 total gene models
[Nov 07 09:39 AM]: Converting to final Genbank format
[Nov 07 09:39 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Nov 07 09:39 AM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate -c 12

Run antiSMASH (optional): 
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------

[Nov 07 09:39 AM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[Nov 07 09:39 AM]: Add species parameters to database:

  funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json

-------------------------------------------------------
[Nov 07 09:39 AM]: OS: Debian GNU/Linux 10, 6 cores, ~ 32 GB RAM. Python: 3.8.12
[Nov 07 09:39 AM]: Running funannotate v1.8.16
[Nov 07 09:39 AM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[Nov 07 09:39 AM]: Skipping CodingQuarry as no --rna_bam passed
[Nov 07 09:39 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  genemark     pretrained     
  glimmerhmm   pretrained     
  snap         pretrained     
[Nov 07 09:39 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Nov 07 09:39 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
/venv/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-p2g.py:14: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import parse_version
[Nov 07 09:39 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Nov 07 09:39 AM]: Found 1,505 preliminary alignments with diamond in 0:00:05 --> generated FASTA files for exonerate in 0:00:00
[Nov 07 09:40 AM]: Exonerate finished in 0:00:46: found 1,270 alignments
     Progress: 1505 complete, 0 failed, 0 remaining          
[Nov 07 09:40 AM]: Running GeneMark-ES on assembly
[Nov 07 09:44 AM]: 1,562 predictions from GeneMark
[Nov 07 09:44 AM]: Running Augustus gene prediction using awesome_busco parameters
[Nov 07 09:46 AM]: 1,301 predictions from Augustus
     Progress: 11 complete, 0 failed, 0 remaining        
[Nov 07 09:46 AM]: Pulling out high quality Augustus predictions
[Nov 07 09:46 AM]: Found 313 high quality predictions from Augustus (>90% exon evidence)
[Nov 07 09:46 AM]: Running SNAP gene prediction, using pre-trained HMM profile
[Nov 07 09:46 AM]: 1,490 predictions from SNAP
[Nov 07 09:46 AM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[Nov 07 09:47 AM]: 1,771 predictions from GlimmerHMM
[Nov 07 09:47 AM]: Summary of gene models passed to EVM (weights):
[Nov 07 09:47 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Nov 07 09:58 AM]: Converting to GFF3 and collecting all EVM results
     Progress: 43 complete, 0 failed, 0 remaining        
  Source         Weight   Count
  Augustus       1        988  
  Augustus HiQ   2        313  
  GeneMark       1        1562 
  GlimmerHMM     1        1771 
  snap           1        1490 
  Total          -        6124 
[Nov 07 09:58 AM]: 1,700 total gene models from EVM
[Nov 07 09:58 AM]: Generating protein fasta files from 1,700 EVM models
[Nov 07 09:58 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Nov 07 09:58 AM]: Found 105 gene models to remove: 0 too short; 0 span gaps; 105 transposable elements
[Nov 07 09:58 AM]: 1,595 gene models remaining
[Nov 07 09:58 AM]: Predicting tRNAs
[Nov 07 09:59 AM]: 112 tRNAscan models are valid (non-overlapping)
[Nov 07 09:59 AM]: Generating GenBank tbl annotation file
[Nov 07 09:59 AM]: Collecting final annotation files for 1,707 total gene models
[Nov 07 09:59 AM]: Converting to final Genbank format
[Nov 07 09:59 AM]: Funannotate predict is finished, output files are in the annotate2/predict_results folder
[Nov 07 09:59 AM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate2 -c 12

Run antiSMASH (optional): 
funannotate remote -i annotate2 -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate2 --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------

[Nov 07 09:59 AM]: Training parameters file saved: annotate2/predict_results/awesome_busco.parameters.json
[Nov 07 09:59 AM]: Add species parameters to database:

  funannotate species -s awesome_busco -a annotate2/predict_results/awesome_busco.parameters.json

CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 12 --species Awesome testicus
#########################################################
#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################

#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 12 --species Awesome busco
#########################################################
#########################################################
SUCCESS: `funannotate predict` BUSCO-mediated training test complete.
#########################################################
Now running predict using all pre-trained ab-initio predictors
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate2 --cpus 12 --species Awesome busco -p annotate/predict_results/awesome_busco.parameters.json
#########################################################
#########################################################
SUCCESS: `funannotate predict` using existing parameters test complete.
#########################################################
4276384  [0.79%]
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Traceback (most recent call last):
  File "/venv/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/venv/lib/python3.8/site-packages/funannotate/funannotate.py", line 717, in main
    mod.main(arguments)
  File "/venv/lib/python3.8/site-packages/funannotate/test.py", line 409, in main
    runRNAseqTest(args)
  File "/venv/lib/python3.8/site-packages/funannotate/test.py", line 332, in runRNAseqTest
    shutil.copyfile(f, os.path.join(tmpdir, f))
  File "/venv/lib/python3.8/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'rna-seq.nanopore.fastq.gz'

nextgenusfs commented 7 months ago

Looks like the download portion of the RNA-seq test failed. You can try to isolate the problem with `funannotate test -t rna-seq`.
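
The `gzip: stdin: unexpected end of file` and `tar: Unexpected EOF in archive` messages in the log suggest the archive was cut off partway through the download. If a copy of the archive is still on disk, a rough way to confirm that is to read it end to end. This is only a sketch: `archive_is_complete` is a made-up helper name, not part of funannotate, and the path you pass in is whatever your local file is called.

```python
import tarfile


def archive_is_complete(path):
    """Return True if every member of the .tar.gz at `path` can be read in full.

    Hypothetical helper for illustration; not funannotate code.
    """
    try:
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    fh = tar.extractfile(member)
                    # Read in chunks so large FASTQ files do not fill memory.
                    while fh.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError):
        # Truncated, corrupt, or missing archives end up here.
        return False
```

A `False` result would be consistent with a partial download, in which case fetching the file again is probably the next step.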

nextgenusfs commented 7 months ago

If the download fails, you can try to get it manually before running the script like this:

$ wget -O test-rna_seq.tar.gz "https://osf.io/t7j83/download?version=1"
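
After a manual download it may be worth checking that the archive actually contains the file named in the traceback before re-running the test. The sketch below assumes the archive was saved as `test-rna_seq.tar.gz` as in the command above. The internal layout of the archive is not shown in this thread, so it matches on base name only, and `members_named` is a hypothetical helper, not a funannotate function.

```python
import os
import tarfile


def members_named(archive_path, filename):
    """Return archive member paths whose base name equals `filename`.

    Raises if the archive cannot be opened or read.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers()
                if os.path.basename(m.name) == filename]


# Example call (assumes the archive is in the current directory):
# members_named("test-rna_seq.tar.gz", "rna-seq.nanopore.fastq.gz")
```

An empty list would mean the file is not in the archive under that name. An exception would point back to an incomplete or unreadable download.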