marbl / metAMOS

A metagenomic and isolate assembly and analysis pipeline built with AMOS
http://marbl.github.io/metAMOS

Strange behaviour of preprocess module -- single reads file cannot be linked to lib%d.seq #166

Closed TimSkvortsov closed 9 years ago

TimSkvortsov commented 9 years ago

Hello

When I try to use metAMOS to process a single .fastq file, the pipeline fails to link the single-reads file to lib%d.seq (EXAMPLE 1) and subsequently crashes with a warning that the library has no reads: "Warning: library 1 has no sequences".

The same .fastq file is processed without any issues when every occurrence of ln in preprocess.py is replaced with ln -f (EXAMPLE 2).

I am wondering whether this is a good way to solve the issue.

EXAMPLE 1

calculon@calculon-OptiPlex-9020:~$ initPipeline -q -1 /home/calculon/TEST/U.fastq -d /home/calculon/TEST/MET5 
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
Project dir /home/calculon/TEST/MET5 successfully created!
Use runPipeline.py to start Pipeline
calculon@calculon-OptiPlex-9020:~$ runPipeline -v -d /home/calculon/TEST/MET5 -a idba,metavelvet -o 50 -X lap,reapr,quast,ale,n50 -f Assemble,FindScaffoldORFS -n FindORFS,Abundance,Annotate,FunctionalAnnotation,MultiAlign,Propagate,Classify,FindRepeats 
[Steps to be skipped]:  set(['Abundance', 'MultiAlign', 'FindRepeats', 'FunctionalAnnotation', 'FindORFS', 'Annotate', 'Propagate', 'Classify'])
Starting Task = runpipeline.RUNPIPELINE
*** metAMOS running command: touch /home/calculon/TEST/MET5/Preprocess/out/lib1.seq

*** metAMOS running command: touch /home/calculon/TEST/MET5/Scaffold/out/proba.linearize.scaffolds.final

*** metAMOS running command: rm /home/calculon/TEST/MET5/Logs/assemble.ok

*** metAMOS running command: rm /home/calculon/TEST/MET5/Assemble/out/*.asm.contig

Starting metAMOS pipeline
Found pysam in /home/calculon/PROGRAMS/metAMOS_branch1.5rc3/Utilities/python/lib/python/pysam-0.6-py2.7-linux-x86_64.egg/pysam/__init__.pyc
Found psutil in /home/calculon/PROGRAMS/metAMOS_branch1.5rc3/Utilities/python/lib/python/psutil-0.6.1-py2.7-linux-x86_64.egg/psutil/__init__.pyc
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
[Available RAM: 29 GB]
[  There is *29 GB of RAM currently available on this machine, suggested minimum of 32 GB
[  *Enabling low MEM mode, might slow down some steps in pipeline
[  *If more RAM is available than what is listed above, please close down other programs and restart runPipeline
[Available CPUs: 8]
    *ok

________________________________________
Tasks which will be run:

Task = preprocess.Preprocess
Task = assemble.SplitAssemblers
Task = assemble.Assemble
Task = assemble.CheckAsmResults
Task = assemble.SplitMappers
Task = mapreads.MapReads
Task = mapreads.CheckMapResults
Task = mapreads.SplitForORFs
Task = findorfs.FindORFS
Task = validate.Validate
Task = findreps.FindRepeats
Task = annotate.Annotate
Task = fannotate.FunctionalAnnotation
Task = scaffold.Scaffold
Task = findscforfs.FindScaffoldORFS
Task = abundance.Abundance
Task = propagate.Propagate
Task = classify.Classify
Task = postprocess.Postprocess
________________________________________
Warning: Graphviz is not found, some functionality will not be available
metAMOS configuration summary:
metAMOS Version:    v1.5rc3 "Praline Brownie"  workflows: core,optional,imetamos
Time and Date:      2014-11-06
Working directory:  /home/calculon/TEST/MET5
Prefix:         proba
K-Mer:          31
Threads:        7
Taxonomic level:    class
Verbose:        True
Steps to skip:      Abundance, MultiAlign, FindRepeats, FunctionalAnnotation, FindORFS, Annotate, Propagate, Classify
Steps to force:     Assemble, FindScaffoldORFS

sh: 1: Syntax error: Bad fd number
Starting Task = preprocess.PREPROCESS
*** metAMOS running command: rm /home/calculon/TEST/MET5/Preprocess/out/all.seq.mates

*** metAMOS running command: ln /home/calculon/TEST/MET5/Preprocess/out/U.fastq /home/calculon/TEST/MET5/Preprocess/out/lib1.seq

*** metAMOS running command: touch /home/calculon/TEST/MET5/Preprocess/out/lib1.seq.mates

*** metAMOS running command: ln /home/calculon/TEST/MET5/Preprocess/out/lib1.seq /home/calculon/TEST/MET5/Preprocess/out/lib1.fastq

*** metAMOS running command: java -cp /home/calculon/PROGRAMS/metAMOS_branch1.5rc3/Utilities/java:. convertFastqToFasta /home/calculon/TEST/MET5/Preprocess/out/lib1.seq /home/calculon/TEST/MET5/Preprocess/out/lib1.fasta /home/calculon/TEST/MET5/Preprocess/out/lib1.fasta.qual

Warning: library 1 has no sequences

**ERROR**
All input sequences were empty

**ERROR**

Oops, MetAMOS finished with errors! see text in red above for details.
Traceback (most recent call last):
  File "/home/calculon/bin/./runPipeline", line 984, in <module>
    verbose = 1)
  File "/home/calculon/PROGRAMS/metAMOS_branch1.5rc3/Utilities/ruffus/task.py", line 2965, in pipeline_run
    raise job_errors
RethrownJobError: 

    Exception #1
      'ruffus.ruffus_exceptions.JobSignalledBreak(    

        )' raised in ...
       Task = def preprocess.Preprocess(...):
       Job  = [[U.fastq] -> preprocess.success]

    Traceback (most recent call last):
      File "/home/calculon/PROGRAMS/metAMOS_branch1.5rc3/Utilities/ruffus/task.py", line 625, in run_pooled_job_without_exceptions
        return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
      File "/home/calculon/PROGRAMS/metAMOS_branch1.5rc3/Utilities/ruffus/task.py", line 491, in job_wrapper_io_files
        ret_val = user_defined_work_func(*param)
      File "/home/calculon/PROGRAMS/metAMOS_branch1.5rc3/src/preprocess.py", line 903, in Preprocess
        raise (JobSignalledBreak)
    JobSignalledBreak:     

calculon@calculon-OptiPlex-9020:~$ 

The same single .fastq file is processed without any issues if every occurrence of 'ln' in preprocess.py is replaced with 'ln -f':

EXAMPLE 2

calculon@calculon-OptiPlex-9020:~$ initPipeline -q -1 /home/calculon/TEST/U.fastq -d /home/calculon/TEST/MET6
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
Project dir /home/calculon/TEST/MET6 successfully created!
Use runPipeline.py to start Pipeline
calculon@calculon-OptiPlex-9020:~$ runPipeline -v -d /home/calculon/TEST/MET6 -a idba,metavelvet -o 50 -X lap,reapr,quast,ale,n50 -f Assemble,FindScaffoldORFS -n FindORFS,Abundance,Annotate,FunctionalAnnotation,MultiAlign,Propagate,Classify,FindRepeats 

sh: 1: Syntax error: Bad fd number
Starting Task = preprocess.PREPROCESS
*** metAMOS running command: rm /home/calculon/TEST/MET6/Preprocess/out/all.seq.mates

*** metAMOS running command: ln -f /home/calculon/TEST/MET6/Preprocess/out/U.fastq /home/calculon/TEST/MET6/Preprocess/out/lib1.seq

*** metAMOS running command: touch /home/calculon/TEST/MET6/Preprocess/out/lib1.seq.mates

*** metAMOS running command: ln -f /home/calculon/TEST/MET6/Preprocess/out/lib1.seq /home/calculon/TEST/MET6/Preprocess/out/lib1.fastq

*** metAMOS running command: java -cp /home/calculon/PROGRAMS/metAMOS_branch1.5rc3/Utilities/java:. convertFastqToFasta /home/calculon/TEST/MET6/Preprocess/out/lib1.seq /home/calculon/TEST/MET6/Preprocess/out/lib1.fasta /home/calculon/TEST/MET6/Preprocess/out/lib1.fasta.qual

*** metAMOS running command: touch /home/calculon/TEST/MET6/Preprocess/out/preprocess.success

    Job  = [[U.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Starting Task = assemble.ASSEMBLE

skoren commented 9 years ago

I think there are two issues in your output. First, Ubuntu links /bin/sh to dash, which doesn't support some of the syntax metAMOS uses. If you change /bin/sh to point to bash, the "Bad fd number" error should go away. Alternatively, you can change this line in runPipeline:

os.system("%s/config/updateftpcounter.sh %s/%s.txt %s.txt >& ftp.out"%(utils.Settings.METAMOS_UTILS,utils.Settings.rundir,filestamp,filestamp))

to:

os.system("%s/config/updateftpcounter.sh %s/%s.txt %s.txt > ftp.out 2>&1"%(utils.Settings.METAMOS_UTILS,utils.Settings.rundir,filestamp,filestamp))

As for the empty files, can you post the full preprocess.log file from the metAMOS TEST/Logs directory? Also, what happens if you manually run the ln command without the -f? Is any error reported by the system?
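The redirect problem can also be avoided without changing /bin/sh by letting Python do the redirection itself. The following is a minimal sketch, not the metAMOS code: `run_logged` is a hypothetical helper, and it assumes Python 3 rather than the Python 2.7 that appears in the log paths above. It sends both stdout and stderr to one file, which is the effect of `> ftp.out 2>&1`, but no shell parses the command line, so it should not matter whether /bin/sh is dash or bash.

```python
import subprocess


def run_logged(argv, log_path):
    """Run argv without a shell, writing stdout and stderr to log_path.

    Hypothetical helper for illustration; similar in effect to the
    shell form `cmd > log_path 2>&1`. Returns the exit status.
    """
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT merges stderr into the same file handle.
        completed = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

A call would pass the script and its arguments as a list, for example `run_logged([script, in_txt, out_txt], "ftp.out")`, where the three names stand for the values that the os.system line builds with `%` formatting.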

TimSkvortsov commented 9 years ago

Thank you for your suggestions, Sergey; I will try both of them. I am also going to install the latest commit of metAMOS 1.5rc3 to have an unmodified installation.

The full preprocess.log is very short for the test run I posted above:

rm: cannot remove ‘/home/calculon/TEST/MET5/Preprocess/out/all.seq.mates’: No such file or directory
ln: failed to create hard link ‘/home/calculon/TEST/MET5/Preprocess/out/lib1.seq’: File exists

As for the ln command, I cannot run it manually without the -f flag because a zero-size lib1.seq file already exists.

ln /home/calculon/TEST/MET5/Preprocess/out/U.fastq /home/calculon/TEST/MET5/Preprocess/out/lib1.seq
ln: failed to create hard link ‘/home/calculon/TEST/MET5/Preprocess/out/lib1.seq’: File exists

When lib1.seq is renamed or removed, ln /home/calculon/TEST/MET5/Preprocess/out/U.fastq /home/calculon/TEST/MET5/Preprocess/out/lib1.seq runs without errors.
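The failure above is what a plain hard link does when the target name is already taken; `ln -f` differs in that it removes the existing target first. A minimal Python sketch of the two behaviours follows. `hard_link` is a hypothetical helper, not a function from preprocess.py, and Python 3 is assumed.

```python
import os


def hard_link(src, dst, force=False):
    """Create a hard link named dst that points to the same file as src.

    With force=False this mirrors plain `ln`: an existing dst raises
    FileExistsError. With force=True it roughly mirrors `ln -f`: an
    existing dst is removed before linking. Illustration only.
    """
    if force and os.path.lexists(dst):
        # Nothing to do if dst is already a link to src.
        if os.path.exists(dst) and os.path.samefile(src, dst):
            return
        os.remove(dst)
    os.link(src, dst)
```

With an empty lib1.seq already in place, `hard_link(src, dst)` raises FileExistsError, while `hard_link(src, dst, force=True)` replaces the placeholder. Forcing the link hides the symptom; it does not explain why the empty file was created in the first place.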

skoren commented 9 years ago

Ah, OK, I found the issue. Forcing the Assemble step is what causes the pipeline to create an empty file. I'll commit a fix for this.

salaheenz commented 8 years ago

Hi Sergey, although I installed the new commit, I am getting a similar error. The Preprocess log shows:

rm: cannot remove ‘/home/bhaley/metAMOS-1.5rc3/projectdirS2/Preprocess/out/all.seq.mates’: No such file or directory

bhaley@NextSeq-Server:~/metAMOS-1.5rc3$ sudo ./initPipeline -q -m /mnt/data/bhaley/Results/Serajus_interleaves/2.fastq.gz -d projectdirS2 -i 36:294
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
Warning: MPI is not available, some functionality may not be available
Project dir /home/bhaley/metAMOS-1.5rc3/projectdirS2 successfully created!
Use runPipeline.py to start Pipeline
bhaley@NextSeq-Server:~/metAMOS-1.5rc3$ sudo ./runPipeline -d projectdirS2 -f FunctionalAnnotation -n Scaffold

Starting metAMOS pipeline
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
Warning: MPI is not available, some functionality may not be available
[Available RAM: 126 GB]
ok
[Available CPUs: 32]
ok


Tasks which will be run:

Task = preprocess.Preprocess
Task = assemble.SplitAssemblers
Task = assemble.Assemble
Task = assemble.CheckAsmResults
Task = assemble.SplitMappers
Task = mapreads.MapReads
Task = mapreads.CheckMapResults
Task = mapreads.SplitForORFs
Task = findorfs.FindORFS
Task = validate.Validate
Task = findreps.FindRepeats
Task = annotate.Annotate
Task = fannotate.FunctionalAnnotation
Task = scaffold.Scaffold
Task = findscforfs.FindScaffoldORFS
Task = abundance.Abundance
Task = propagate.Propagate
Task = classify.Classify
Task = postprocess.Postprocess


Warning: Graphviz is not found, some functionality will not be available
metAMOS configuration summary:
metAMOS Version: v1.5rc3 "Praline Brownie" workflows: core,optional,imetamos
Time and Date: 2016-03-16
Working directory: /home/bhaley/metAMOS-1.5rc3/projectdirS2
Prefix: proba
K-Mer: 31
Threads: 31
Taxonomic level: class
Verbose: False
Steps to skip: MultiAlign, FindScaffoldORFS, Scaffold, Propagate, FindRepeats
Steps to force: FunctionalAnnotation

[citation] MetAMOS Treangen, TJ ⇔ Koren, S, Sommer, DD, Liu, B, Astrovskaya, I, Ondov, B, Darling AE, Phillippy AM, Pop, M. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome biology, 14(1), R2, 2013.

Step-specific configuration:

[abundance] MetaPhyler /home/bhaley/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics. 2011;12 Suppl 2:S4. Epub 2011 Jul 27.

[multialign] M-GCAT /home/bhaley/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Treangen TJ, Messeguer X. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics, 2006.

[fannotate] BLAST /home/bhaley/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.

[scaffold] Bambus 2 /home/bhaley/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin Koren S, Treangen TJ, Pop M. Bambus 2: scaffolding metagenomes. Bioinformatics 27(21): 2964-2971 2011.

[findorfs] FragGeneScan /home/bhaley/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Rho M, Tang H, Ye Y: FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Research 2010, 38:e191-e191.

[annotate] Kraken /home/bhaley/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/kraken/bin Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.

[assemble] SOAPdenovo /home/bhaley/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Li Y, Hu Y, Bolund L, Wang J: State of the art de novo assembly of human genomes from massively parallel sequencing data.Human genomics 2010, 4:271-277.

[mapreads] Bowtie /home/bhaley/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. Epub 2009 Mar 4.

[preprocess] metAMOS built-in filtering N/A

[validate] LAP /home/bhaley/metAMOS-1.5rc3/LAP Ghodsi M, Hill CM, Astrovskaya I, Lin H, Sommer DD, Koren S, Pop M. De novo likelihood-based measures for comparing genome assemblies. BMC research notes 6:334, 2013.

[other] Krona /home/bhaley/metAMOS-1.5rc3/KronaTools/bin Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385.

sh: 1: Syntax error: Bad fd number
Starting Task = preprocess.PREPROCESS
Warning: library 1 has no sequences

ERROR
All input sequences were empty

ERROR

Oops, MetAMOS finished with errors! see text in red above for details.

Jackie789 commented 7 years ago

I am also having this issue on an Ubuntu machine, as described above. Has anyone found a workaround for this yet?

skoren commented 7 years ago

Please don't hijack closed issues; file a new one if you encounter an error.

Have you tried changing the /bin/sh link to point to bash instead of dash, or updating the pipeline?
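To see which shell /bin/sh resolves to before changing anything, a short check like the following may help. It is a sketch under assumptions: `sh_target` and `is_dash` are hypothetical helpers, Python 3 is assumed, and the check only follows symlinks and compares the file name, so it cannot tell anything about a shell installed under a different name.

```python
import os


def sh_target(path="/bin/sh"):
    """Return the basename of the file that path ultimately resolves to."""
    return os.path.basename(os.path.realpath(path))


def is_dash(path="/bin/sh"):
    """Return True when path resolves to a file named dash."""
    return sh_target(path) == "dash"
```

Calling `is_dash()` with no argument inspects /bin/sh on the current machine; a True result suggests that the `>&` redirect in runPipeline is the likely source of the "Bad fd number" message.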