lehtiolab / proteogenomics-analysis-workflow

IPAW: a Nextflow workflow for proteogenomics

Error when loading mzML #2

Closed Weronika77 closed 6 years ago

Weronika77 commented 6 years ago

Hi,

So I followed the exact instructions in the README file to install it. However when I run this workflow, I get the following error:

./nextflow run ipaw.nf --tdb VarDB.fasta \
--mzmls *.mzML --gtf VarDB.gtf \
--knownproteins Homo_sapiens.GRCh38.pep.all.fa.gz \
--blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta \
--snpfa MSCanProVar_ensemblV79.filtered.fasta \
--genome hg38.chr1-22.X.Y.M.fa.masked \
--cosmic CosmicMutantExport.tsv \
--outdir results
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [kickass_solvay] - revision: 86a6af3688
WARN: Access to undefined parameter `dbsnp` -- Initialise it to a default value eg. `params.dbsnp = some_value`
WARN: Access to undefined parameter `ddb` -- Initialise it to a default value eg. `params.ddb = some_value`
[warm up] executor > local
ERROR ~ For input string: "170511HFc1_LZ-1444-TCAM2-1-20X_0062"

 -- Check script 'ipaw.nf' at line: 82 or see '.nextflow.log' file for more details

So what I did next was to comment out line 82 in ipaw.nf: .map { it -> [it.baseName.replaceFirst(/.*fr(\d\d).*/, "\$1").toInteger(), it.baseName.replaceFirst(/.*\/(\S+)\.mzML/, "\$1"), it] }

And it did bring me further, however I still get an error.

./nextflow run ipaw.nf --tdb VarDB.fasta \
--mzmls *.mzML --gtf VarDB.gtf \
--knownproteins Homo_sapiens.GRCh38.pep.all.fa.gz \
--blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta \
--snpfa MSCanProVar_ensemblV79.filtered.fasta \
--genome hg38.chr1-22.X.Y.M.fa.masked \
--cosmic CosmicMutantExport.tsv \
--outdir results
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [astonishing_panini] - revision: ed1e883ccc
WARN: Access to undefined parameter `dbsnp` -- Initialise it to a default value eg. `params.dbsnp = some_value`
WARN: Access to undefined parameter `ddb` -- Initialise it to a default value eg. `params.ddb = some_value`
[warm up] executor > local
WARN: Input tuple does not match input set cardinality declared by process `IsobaricQuant` -- offending value: /path/to/mzMLfile/nameoffile.mzML
ERROR ~ Error executing process > 'createSpectraLookup'

Caused by:
  Process `createSpectraLookup` input file name collision -- There are multiple input files for each of the following file names: proteogenomics-analysis-workflow

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
[54/387e5c] Submitted process > makeProtSeq
WARN: Killing pending tasks (1)

Can you please help me with this error?

yafeng commented 6 years ago

Are you running the latest ipaw? Can you try git pull first? Some updates were made. Not sure if it will fix your problem, but we can look into it if it still doesn't work.

BTW, you should provide the hg19 genome instead of hg38. The pipeline still works with hg38, but you will get two versions of coordinates because the other processes in the pipeline map the peptides to the hg19 genome.

glormph commented 6 years ago

Hi, the pipeline is currently a bit tailored towards our lab's sample fractionation system and therefore expects filenames to contain a fraction number like this: samplename_or_something_fr01.mzML. This piece of code extracts the fraction number from the filename: it.baseName.replaceFirst(/.*fr(\d\d).*/, "\$1").toInteger()
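To illustrate why the first run failed: Groovy's replaceFirst returns the string unchanged when the pattern does not match, so toInteger() received the whole base name and raised the "For input string" error. The following is a minimal Python sketch of the same parsing with an explicit fallback. The function name parse_fraction is hypothetical and this is not the code used in ipaw.nf.

```python
import re

# Same pattern as the Groovy snippet above; everything else is illustrative.
FRACTION_RE = re.compile(r".*fr(\d\d).*")


def parse_fraction(basename):
    """Return the two-digit fraction number after 'fr', or None if absent."""
    match = FRACTION_RE.fullmatch(basename)
    if match is None:
        # No 'frNN' in the name: report that instead of trying to
        # convert the whole file name to an integer.
        return None
    return int(match.group(1))


print(parse_fraction("samplename_or_something_fr01"))
print(parse_fraction("170511HFc1_LZ-1444-TCAM2-1-20X_0062"))
```

Returning None for unfractionated files lets the caller decide whether a fraction number is needed at all.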

I can have a look later to see if that part can be removed, since I don't think we actually use the fractionation in this pipe (it's just that it is so standard in the lab that it got built in).

EDIT: as it turns out, the fraction-parsing was removed in this commit, so @yafeng was right: do a git pull to get the latest version, and then at least you will not get that particular error.

Weronika77 commented 6 years ago

Great, the git pull worked! Thanks for the quick reply!

However, now I ran into a new error...

./nextflow run ipaw.nf --tdb VarDB.fasta   --mzmls *.mzML    --gtf VarDB.gtf   --knownproteins Homo_sapiens.GRCh38.pep.all.fa.gz   --blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta   --snpfa MSCanProVar_ensemblV79.filtered.fasta   --genome hg19.chr1-22.X.Y.M.fa.masked    --cosmic CosmicMutantExport.tsv   --outdir results
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [compassionate_fourier] - revision: 2f897ac094
WARN: Access to undefined parameter `dbsnp` -- Initialise it to a default value eg. `params.dbsnp = some_value`
WARN: Access to undefined parameter `mzmldef` -- Initialise it to a default value eg. `params.mzmldef = some_value`
Detected setnames: NA
[warm up] executor > local
[ac/658a74] Submitted process > concatFasta
[76/3ff5ac] Submitted process > makeTrypSeq
[3b/db0da8] Submitted process > makeProtSeq
[fc/abc54a] Submitted process > createSpectraLookup (1)
ERROR ~ Error executing process > 'concatFasta'

Caused by:
  Process `concatFasta` terminated with an error exit status (127)

Command executed:

  cat VarDB.fasta Homo_sapiens.GRCh38.pep.all.fa.gz > db.fa

Command exit status:
  127

Command output:
  (empty)

Command error:
  /bin/bash: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: Permission denied

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/ac/658a7450cc52d4d65e7282ef78c28e

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (3)

When I looked into .command.run, I noticed that if I commented out docker run -i -v /home/weronika/proteogenomics-analysis-workflow:/home/weronika/proteogenomics-analysis-workflow -v "$PWD":"$PWD" -w "$PWD" --entrypoint /bin/bash -u $(id -u):$(id -g)$ the error no longer appeared, so I think there might be something wrong with Docker?

Also I was wondering, since the pipeline maps to hg19, is there also a version coming soon where it will map to hg38?

glormph commented 6 years ago

Interesting, we haven't seen that error here. I found a similar error message here. Our underlying system for running Docker has been Ubuntu, but if yours is Red Hat based, the linked bug report may be correct that SELinux is preventing Docker's access to the library. I am not sure we can help there (and the linked bug report is filed under CANTFIX, which is also not good news); you may have to talk with your system administrator.

Weronika77 commented 6 years ago

cat /etc/issue
Ubuntu 16.04.4 LTS \n \l

As it shows, I have Ubuntu, so RH shouldn't be the problem. I already talked with my administrator, and he showed me that it might have something to do with Docker by commenting that line out, as mentioned above.

Anyway, thanks for your help.

glormph commented 6 years ago

I'm by no means a docker or nextflow expert, but it looks a bit like the cat command inside docker isn't allowed by your system. Indeed commenting out docker should not remove that error.

We run Ubuntu 16.04 as well, with docker version 17.12.1-ce, build 7390fc6, and the user who runs the nextflow command is a member of the docker group. If you run the pipeline with sudo (which shouldn't be necessary if the regular user can start docker containers), does the error disappear?
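One quick way to check the group membership mentioned above is sketched below in Python, assuming a Linux host. The helper name current_user_in_group is made up for illustration. It only inspects the groups of the running process, so a membership added after login will not show up until the user logs in again.

```python
import grp
import os


def current_user_in_group(group_name):
    """Return True if the running process belongs to `group_name`."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this machine.
        return False
    # Check both the effective group and the supplementary groups.
    return gid == os.getegid() or gid in os.getgroups()


print(current_user_in_group("docker"))
```

If this prints False for the user who launches nextflow, that user cannot start containers without sudo.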

Weronika77 commented 6 years ago

So I went into the work dir (/home/weronika/proteogenomics-analysis-workflow/work/ac/658a7450cc52d4d65e7282ef78c28e) from the previous error and put sudo in front of the docker command in the .command.run file. When I ran it again, it gave the same permission denied error, but now for this work dir: /home/weronika/proteogenomics-analysis-workflow/work/f4/c0a5eff5105afe28da036b9ea1fda0. I again put sudo in front of the docker command in that .command.run, ran it again, and got the same error for yet another folder. However, there are 139 folders in work, so I gave up after the third time, since I'm not going to manually change those 139 .command.run files. So then I tried putting sudo in front of ./nextflow and it gave me the following:

sudo ./nextflow run ipaw.nf --tdb VarDB.fasta   --mzmls *.mzML    --gtf VarDB.gtf   --knownproteins Homo_sapiens.GRCh38.pep.all.fa.gz   --blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta   --snpfa MSCanProVar_ensemblV79.filtered.fasta   --genome hg19.chr1-22.X.Y.M.fa.masked    --cosmic CosmicMutantExport.tsv   --outdir results
[sudo] password for weronika: 
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [dreamy_hilbert] - revision: 2f897ac094
WARN: Access to undefined parameter `dbsnp` -- Initialise it to a default value eg. `params.dbsnp = some_value`
WARN: Access to undefined parameter `mzmldef` -- Initialise it to a default value eg. `params.mzmldef = some_value`
Detected setnames: NA
[warm up] executor > local
[84/963217] Submitted process > makeProtSeq
[73/937bdd] Submitted process > concatFasta
[c4/f060b8] Submitted process > makeTrypSeq
[e5/fffd82] Submitted process > createSpectraLookup (1)
ERROR ~ Error executing process > 'makeProtSeq'

Caused by:
  Process `makeProtSeq` terminated with an error exit status (1)

Command executed:

  msslookup protspace -i Homo_sapiens.GRCh38.pep.all.fa.gz --minlen 8

Command exit status:
  1

Command output:
  (empty)

Command error:
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  Traceback (most recent call last):
    File "/usr/local/bin/msslookup", line 6, in <module>
      sys.exit(app.mslookup.main())
    File "/usr/local/lib/python3.6/site-packages/app/mslookup.py", line 21, in main
      startup.start_msstitch(drivers, sys.argv)
    File "/usr/local/lib/python3.6/site-packages/app/drivers/startup.py", line 53, in start_msstitch
      args.func(**vars(args))
    File "/usr/local/lib/python3.6/site-packages/app/drivers/base.py", line 74, in start
      self.run()
    File "/usr/local/lib/python3.6/site-packages/app/drivers/mslookup/base.py", line 35, in run
      self.create_lookup()
    File "/usr/local/lib/python3.6/site-packages/app/drivers/mslookup/seqspace.py", line 54, in create_lookup
      self.minlength)
    File "/usr/local/lib/python3.6/site-packages/app/actions/mslookup/searchspace.py", line 7, in create_searchspace_wholeproteins
      prots = {str(prot.seq).replace('L', 'I'): prot.id for prot in fasta}
    File "/usr/local/lib/python3.6/site-packages/app/actions/mslookup/searchspace.py", line 7, in <dictcomp>
      prots = {str(prot.seq).replace('L', 'I'): prot.id for prot in fasta}
    File "/usr/local/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 609, in parse
      for r in i:
    File "/usr/local/lib/python3.6/site-packages/Bio/SeqIO/FastaIO.py", line 122, in FastaIterator
      for title, sequence in SimpleFastaParser(handle):
    File "/usr/local/lib/python3.6/site-packages/Bio/SeqIO/FastaIO.py", line 43, in SimpleFastaParser
      line = handle.readline()
    File "/usr/local/lib/python3.6/encodings/ascii.py", line 26, in decode
      return codecs.ascii_decode(input, self.errors)[0]
  UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/84/963217e693626b08636f0d011f4de0

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (3)

So unfortunately sudo doesn't solve my problems.

glormph commented 6 years ago

Hi, I can't be 100% sure but from the filename it looks like the --knownproteins Homo_sapiens.GRCh38.pep.all.fa.gz file needs to be unzipped.
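The traceback supports this: byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which is what a text reader hits when it is handed a .gz file. Below is a minimal Python sketch that detects a gzipped FASTA and writes an uncompressed copy. The helper names and the toy file are assumptions for illustration and are not part of the pipeline.

```python
import gzip
import os
import shutil
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Check the magic number instead of trusting the file extension."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_plain_fasta(path):
    """Return `path` itself, or the path of a decompressed copy if gzipped."""
    if not is_gzipped(path):
        return path
    plain_path = path[:-3] if path.endswith(".gz") else path + ".plain"
    with gzip.open(path, "rb") as src, open(plain_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain_path


# Demonstration on a toy file (not real Ensembl data).
with tempfile.TemporaryDirectory() as tmpdir:
    gz_path = os.path.join(tmpdir, "proteins.fa.gz")
    with gzip.open(gz_path, "wt") as handle:
        handle.write(">prot1\nMKLV\n")
    was_gzipped = is_gzipped(gz_path)
    plain_path = ensure_plain_fasta(gz_path)
    with open(plain_path) as handle:
        first_line = handle.readline().strip()

print(was_gzipped, first_line)
```

On the command line, gunzip on the downloaded file achieves the same thing before passing it to --knownproteins.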

Also, sudo ./nextflow ... as you did works fine; you don't have to do it in each work dir. As a result, all the docker-launching commands are run with sudo. It may also work if you add sudo: True to the config file as we used to have (it was removed here). Running everything with sudo will also give your work directories root ownership, though, which can be problematic if you later run something without sudo that cannot access them, but I guess a working pipeline is more important right now :).

Weronika77 commented 6 years ago

I have now used the unzipped version for --knownproteins, and it did get me further! You should probably change that in your README file, though. Unfortunately, I ran into a new error.

sudo ./nextflow run ipaw.nf --tdb VarDB.fasta   --mzmls *.mzML    --gtf VarDB.gtf   --knownproteins Homo_sapiens.GRCh38.pep.all.fa   --blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta   --snpfa MSCanProVar_ensemblV79.filtered.fasta   --genome hg19.chr1-22.X.Y.M.fa.masked    --cosmic CosmicMutantExport.tsv   --outdir results
[sudo] password for weronika: 
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [sharp_goldstine] - revision: 2f897ac094
WARN: Access to undefined parameter `dbsnp` -- Initialise it to a default value eg. `params.dbsnp = some_value`
WARN: Access to undefined parameter `mzmldef` -- Initialise it to a default value eg. `params.mzmldef = some_value`
Detected setnames: NA
[warm up] executor > local
[64/dd0905] Submitted process > makeTrypSeq
[7b/206d15] Submitted process > makeProtSeq
[78/d5b6b2] Submitted process > concatFasta
[da/24b225] Submitted process > createSpectraLookup (1)
[bb/d53822] Submitted process > makeDecoyReverseDB
[af/af784a] Submitted process > msgfPlus (1)
ERROR ~ Error executing process > 'msgfPlus (1)'

Caused by:
  Missing output file(s) `TCAM2.mzid` expected by process `msgfPlus (1)`

Command executed:

  msgf_plus -Xmx16G -d concatdb.fasta -s TCAM2.mzML -o "TCAM2.mzid" -thread 12 -mod Mods.txt -tda 0 -t 10.0ppm -ti -1,2 -m 0 -inst 3 -e 1 -protocol null -ntt 2 -minLength 7 -maxLength 50 -minCharge 2 -maxCharge 6 -n 1 -addFeatures 1
  msgf_plus -Xmx3500M edu.ucsd.msjava.ui.MzIDToTsv -i "TCAM2.mzid" -o out.mzid.tsv

Command exit status:
  0

Command output:

  MS-GF+ Release (v2016.10.26) (26 Oct 2016)
  Usage: java -Xmx3500M -jar MSGFPlus.jar
    -s SpectrumFile (*.mzML, *.mzXML, *.mgf, *.ms2, *.pkl or *_dta.txt)
    -d DatabaseFile (*.fasta or *.fa)
    [-o OutputFile (*.mzid)] (Default: [SpectrumFileName].mzid)
    [-t PrecursorMassTolerance] (e.g. 2.5Da, 20ppm or 0.5Da,2.5Da, Default: 20ppm)
       Use comma to set asymmetric values. E.g. "-t 0.5Da,2.5Da" will set 0.5Da to the minus (expMass<theoMass) and 2.5Da to plus (expMass>theoMass)
    [-ti IsotopeErrorRange] (Range of allowed isotope peak errors, Default:0,1)
       Takes into account of the error introduced by chooosing a non-monoisotopic peak for fragmentation.
       The combination of -t and -ti determins the precursor mass tolerance.
       E.g. "-t 20ppm -ti -1,2" tests abs(exp-calc-n*1.00335Da)<20ppm for n=-1, 0, 1, 2.
    [-thread NumThreads] (Number of concurrent threads to be executed, Default: Number of available cores)
    [-tda 0/1] (0: don't search decoy database (Default), 1: search decoy database)
    [-m FragmentMethodID] (0: As written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD, 4: UVPD)
    [-inst MS2DetectorID] (0: Low-res LCQ/LTQ (Default), 1: Orbitrap/FTICR, 2: TOF, 3: Q-Exactive)
    [-e EnzymeID] (0: unspecific cleavage, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: glutamyl endopeptidase, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: no cleavage)
    [-protocol ProtocolID] (0: Automatic (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho, 4: TMT, 5: Standard)
    [-ntt 0/1/2] (Number of Tolerable Termini, Default: 2)
       E.g. For trypsin, 0: non-tryptic, 1: semi-tryptic, 2: fully-tryptic peptides only.
    [-mod ModificationFileName] (Modification file, Default: standard amino acids with fixed C+57)
    [-minLength MinPepLength] (Minimum peptide length to consider, Default: 6)
    [-maxLength MaxPepLength] (Maximum peptide length to consider, Default: 40)
    [-minCharge MinCharge] (Minimum precursor charge to consider if charges are not specified in the spectrum file, Default: 2)
    [-maxCharge MaxCharge] (Maximum precursor charge to consider if charges are not specified in the spectrum file, Default: 3)
    [-n NumMatchesPerSpec] (Number of matches per spectrum to be reported, Default: 1)
    [-addFeatures 0/1] (0: output basic scores only (Default), 1: output additional features)
    [-ccm ChargeCarrierMass] (Mass of charge carrier, Default: mass of proton (1.00727649))
  Example (high-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 20ppm -ti -1,2 -ntt 2 -tda 1 -o testMSGFPlus.mzid
  Example (low-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 0.5Da,2.5Da -ntt 2 -tda 1 -o testMSGFPlus.mzid

  MzIDToTsv v9108 (26 Oct 2016)
  Usage: java -Xmx3500M -cp MSGFPlus.jar edu.ucsd.msjava.ui.MzIDToTsv
    -i MzIDPath (MS-GF+ output file (*.mzid) or directory containing mzid files)
    [-o TSVFile] (TSV output file (*.tsv) (Default: MzIDFileName.tsv))
    [-showQValue 0/1] (0: do not show Q-values, 1: show Q-values (Default))
    [-showDecoy 0/1] (0: do not show decoy PSMs (Default), 1: show decoy PSMs)
    [-showFormula 0/1] (0: do not show molecular formula (Default), 1: show molecular formula of peptides)
    [-unroll 0/1] (0: merge shared peptides (Default), 1: unroll shared peptides)

Command error:
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  [Error] Invalid value for parameter -protocol: null (must be an integer)
  [Error] Invalid value for parameter -i: TCAM2.mzid (file does not exist)

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/af/af784ac36c35fa68e3d788dc8aeefc

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Also about the sudo:True should I add that in the config file under standard/docker:

standard {
    docker {
      enabled = true
      fixOwnership = true
      runOptions = "-u \$(id -u):\$(id -g)"
    }

or under slurm/docker:

  slurm {
    docker {
      enabled = true
      fixOwnership = true
      runOptions = "-u \$(id -u):\$(id -g)"
    }

or both?

glormph commented 6 years ago

Thank you, you found a bug! I am guessing you are using labelfree data, apparently if you do not specify --isobaric ... the pipeline does not pick a labelfree -protocol for MSGFPlus. I just pushed an update, try to git pull and rerun.
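The kind of mapping involved can be sketched as follows in Python, using the -protocol IDs listed in the MS-GF+ usage text quoted in the log above (0: Automatic, 2: iTRAQ, 4: TMT). The function name msgf_protocol is hypothetical and this is not the actual fix that was pushed to ipaw.nf.

```python
def msgf_protocol(isobaric=None):
    """Map an --isobaric value to an MS-GF+ -protocol ID (illustrative only)."""
    if not isobaric:
        # Label-free data: fall back to automatic instead of passing 'null'.
        return 0
    label = isobaric.lower()
    if label.startswith("tmt"):
        return 4
    if label.startswith("itraq"):
        return 2
    raise ValueError(f"unrecognised isobaric label: {isobaric!r}")


print(msgf_protocol())             # label-free
print(msgf_protocol("tmt10plex"))
print(msgf_protocol("itraq8plex"))
```

The point is that an unset parameter gets an explicit integer default, so the search engine never sees the literal string null.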

Two more things I see that you can add to the ./nextflow run command: --dbsnp /path/to/snp142CodingDbSnp.txt, without which the pipeline will error, and -resume, which is very handy: nextflow will skip the parts that have already run in a previous run.

The slurm and standard are config profiles that nextflow uses. You can add it to both if you like. If you don't specify which one to use with -profile, it will use standard. If you have to queue jobs to SLURM on a cluster, you can add -profile slurm to nextflow run ....

Weronika77 commented 6 years ago

Even though I updated the code and did everything you said, I unfortunately keep running into a new error:

sudo ./nextflow run ipaw.nf --resume   --tdb VarDB.fasta   --mzmls *.mzML    --gtf VarDB.gtf   --knownproteins Homo_sapiens.GRCh38.pep.all.fa   --blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta   --snpfa MSCanProVar_ensemblV79.filtered.fasta   --genome hg19.chr1-22.X.Y.M.fa.masked    --dbsnp snp142CodingDbSnp.txt   --cosmic CosmicMutantExport.tsv   --outdir results
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [intergalactic_kare] - revision: baa0e7b888
WARN: Access to undefined parameter `mzmldef` -- Initialise it to a default value eg. `params.mzmldef = some_value`
Detected setnames: NA
[warm up] executor > local
[ec/e71907] Submitted process > concatFasta
[c2/1dead3] Submitted process > makeProtSeq
[63/2d14e4] Submitted process > makeTrypSeq
[9b/388592] Submitted process > createSpectraLookup (1)
[b4/b0007d] Submitted process > makeDecoyReverseDB
[93/f9c06c] Submitted process > msgfPlus (1)
[b0/c9df8e] Submitted process > percolator (1)
[7c/ee33b3] Submitted process > filterPercolator (1)
[c2/38a4d8] Submitted process > svmToTSV (1)
[3c/3e26c9] Submitted process > svmToTSV (2)
ERROR ~ Error executing process > 'svmToTSV (1)'

Caused by:
  Process `svmToTSV (1)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env python
  from glob import glob
  mzidtsvfns = sorted(glob('mzidtsv*'))
  mzidfns = sorted(glob('mzident*'))
  from app.readers import pycolator, xml, tsv, mzidplus
  import os
  ns = xml.get_namespace_from_top('fp_th0.xml', None) 
  psms = {p.attrib['{%s}psm_id' % ns['xmlns']]: p for p in pycolator.generate_psms('fp_th0.xml', ns)}
  decoys = {True: 0, False: 0}
  for psm in sorted([(pid, float(p.find('{%s}svm_score' % ns['xmlns']).text), p) for pid, p in psms.items()], reverse=True, key=lambda x:x[1]):
      pdecoy = psm[2].attrib['{%s}decoy' % ns['xmlns']] == 'true'
      decoys[pdecoy] += 1
      psms[psm[0]] = {'decoy': pdecoy, 'svm': psm[1], 'qval': decoys[True]/decoys[False]}  # T-TDC
  decoys = {'true': 0, 'false': 0}
  for svm, pep in sorted([(float(x.find('{%s}svm_score' % ns['xmlns']).text), x) for x in pycolator.generate_peptides('fp_th0.xml', ns)], reverse=True, key=lambda x:x[0]):
      decoys[pep.attrib['{%s}decoy' % ns['xmlns']]] += 1
      [psms[pid.text].update({'pepqval': decoys['true']/decoys['false']}) for pid in pep.find('{%s}psm_ids' % ns['xmlns'])]
  oldheader = tsv.get_tsv_header(mzidtsvfns[0])
  header = oldheader + ['percolator svm-score', 'PSM q-value', 'peptide q-value']
  with open('mzidperco', 'w') as fp:
      fp.write('\t'.join(header))
      for fnix, mzidfn in enumerate(mzidfns):
          mzns = mzidplus.get_mzid_namespace(mzidfn)
          siis = (sii for sir in mzidplus.mzid_spec_result_generator(mzidfn, mzns) for sii in sir.findall('{%s}SpectrumIdentificationItem' % mzns['xmlns']))
          for specidi, psm in zip(siis, tsv.generate_tsv_psms(mzidtsvfns[fnix], oldheader)):
              # percolator psm ID is: samplename_SII_scannr_rank_scannr_charge_rank
              print(specidi)
              print(psm)
              scan, rank = specidi.attrib['id'].replace('SII_', '').split('_')
              outpsm = {k: v for k,v in psm.items()}
              spfile = os.path.splitext(psm['#SpecFile'])[0]
              try:
                  percopsm = psms['{fn}_SII_{sc}_{rk}_{sc}_{ch}_{rk}'.format(fn=spfile, sc=scan, rk=rank, ch=psm['Charge'])]
              except KeyError:
                  continue
              if percopsm['decoy']:
                  continue
              fp.write('\n')
              outpsm.update({'percolator svm-score': percopsm['svm'], 'PSM q-value': percopsm['qval'], 'peptide q-value': percopsm['pepqval']})
              fp.write('\t'.join([str(outpsm[k]) for k in header]))

Command exit status:
  1

Command output:
  (empty)

Command error:
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  Traceback (most recent call last):
    File ".command.sh", line 13, in <module>
      psms[psm[0]] = {'decoy': pdecoy, 'svm': psm[1], 'qval': decoys[True]/decoys[False]}  # T-TDC
  ZeroDivisionError: division by zero

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/c2/38a4d8b7dc4b2ded5920fef75fbfa8

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (1)

So I have updated the code with git pull and I have added --dbsnp. I also added sudo = true in the config file under both slurm and standard, although I still had to put sudo in front of the ./nextflow command, so I'm not sure if that had any effect?

And if I add --isobaric ... I get the following error:

sudo ./nextflow run ipaw.nf --resume   --tdb VarDB.fasta   --mzmls *.mzML    --gtf VarDB.gtf   --knownproteins Homo_sapiens.GRCh38.pep.all.fa   --blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta   --snpfa MSCanProVar_ensemblV79.filtered.fasta   --genome hg19.chr1-22.X.Y.M.fa.masked    --isobaric ... --dbsnp snp142CodingDbSnp.txt   --cosmic CosmicMutantExport.tsv   --outdir results
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [prickly_babbage] - revision: baa0e7b888
WARN: Access to undefined parameter `mzmldef` -- Initialise it to a default value eg. `params.mzmldef = some_value`
WARN: Access to undefined parameter `denoms` -- Initialise it to a default value eg. `params.denoms = some_value`
Detected setnames: NA
[warm up] executor > local
[b3/18ad0d] Submitted process > makeProtSeq
[c1/8c784b] Submitted process > concatFasta
[02/8feea9] Submitted process > makeTrypSeq
[14/5ac39b] Submitted process > IsobaricQuant (1)
[8d/6bb322] Submitted process > makeDecoyReverseDB
ERROR ~ Error executing process > 'IsobaricQuant (1)'

Caused by:
  Process `IsobaricQuant (1)` terminated with an error exit status (6)

Command executed:

  IsobaricAnalyzer  -type ... -in TCAM2.mzML -out "TCAM2.mzML.consensusXML" -extraction:select_activation "High-energy collision-induced dissociation" -extraction:reporter_mass_shift null -extraction:min_precursor_intensity 1.0 -extraction:keep_unannotated_precursor true -quantification:isotope_correction true

Command exit status:
  6

Command output:
  Invalid parameter values (ConversionError): Could not convert string 'null' to a double value. Aborting!

Command error:
  a7f760de4b27: Already exists
  d836c29a56fb: Already exists
  6c2ebb6634fc: Already exists
  00f810677cff: Already exists
  531ebc5af9ff: Already exists
  a3ed95caeb02: Already exists
  aef3b3b2fa0d: Already exists
  05c89845ef18: Pulling fs layer
  05c89845ef18: Verifying Checksum
  05c89845ef18: Download complete
  05c89845ef18: Pull complete
  Digest: sha256:2373b8c92a79f51a3833576b24629a3fadb0140b30fb70f9c4cfa18c6d7a3641
  Status: Downloaded newer image for quay.io/biocontainers/openms:2.2.0--py27_boost1.64_0
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  stty: standard input: Inappropriate ioctl for device

  IsobaricAnalyzer -- Calculates isobaric quantitative values for peptides
  Version: 2.2.0 Jul 10 2017, 11:42:37, Revision: HEAD-HASH-NOTFOUND

  Usage:
    IsobaricAnalyzer <options>

  This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option.

  Options (mandatory options marked with '*'):
    -type <mode>       Isobaric Quantitation method used in the experiment. (default: 'itraq4plex' valid: 'itraq4plex', 'itraq8plex', 'tmt10plex', 'tmt6plex')
    -in <file>*        Input raw/picked data file  (valid formats: 'mzML')
    -out <file>*       Output consensusXML file with quantitative information (valid formats: 'consensusXML')

  Common TOPP options:
    -ini <file>        Use the given TOPP INI file
    -threads <n>       Sets the number of threads allowed to be used by the TOPP tool (default: '1')
    -write_ini <file>  Writes the default configuration file
    -id_pool <file>    ID pool file to DocumentID's for all generated output files. Disabled by default. (Set to 'main' to use /usr/local/share/OpenMS/IDPool/IDPool.txt)
    --help             Shows options
    --helphelp         Shows all options (including advanced)

  The following configuration subsections are valid:
   - extraction       Parameters for the channel extraction.
   - itraq4plex       Algorithm parameters for iTRAQ 4-plex
   - itraq8plex       Algorithm parameters for iTRAQ 8-plex
   - quantification   Parameters for the peptide quantification.
   - tmt10plex        Algorithm parameters for TMT 10-plex
   - tmt6plex         Algorithm parameters for TMT 6-plex

  You can write an example INI file using the '-write_ini' option.
  Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
  Have a look at the OpenMS documentation for more information.

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/14/5ac39b0f717e9d15dc9e746ebbd049

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (3)

glormph commented 6 years ago

Sorry, I meant if you have isobaric data, use --isobaric tmt10plex for tmt10plex data, --isobaric itraq8plex for itraq data, etc. If you have labelfree data, the git pull should now get you a proper labelfree protocol for MSGF+. Is this labelfree data?

But you got the first error as well. What I forgot to mention (and it is not in the README) is that if you have labelfree data, you will also need to specify another modification file with --mod your_modfile.txt. Basically, you can use Mods.txt and remove the lines with tmt6plex in them. This will give better PSMs and MAYBE solve the first problem.
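For context on that first error: in the svmToTSV script shown in the earlier log, PSMs are sorted by descending SVM score and the running ratio decoys[True]/decoys[False] is computed after each one, so the division fails whenever a decoy is ranked before any target. The Python sketch below shows a guarded version of that running ratio. The function name and the choice to report 1.0 when no target has been seen yet are assumptions for illustration, not the pipeline's behaviour.

```python
def running_decoy_ratio(scored_psms):
    """Compute decoys/targets after each PSM, ranked by descending score.

    scored_psms: iterable of (psm_id, svm_score, is_decoy) tuples.
    """
    counts = {True: 0, False: 0}
    ratios = {}
    ranked = sorted(scored_psms, key=lambda psm: psm[1], reverse=True)
    for psm_id, _score, is_decoy in ranked:
        counts[is_decoy] += 1
        targets = counts[False]
        # Guard: before the first target the ratio is undefined, so report
        # the most conservative value (an assumption of this sketch).
        ratios[psm_id] = counts[True] / targets if targets else 1.0
    return ratios


# Toy input where the best-scoring PSM is a decoy.
toy_psms = [("a", 2.5, True), ("b", 1.0, False), ("c", 0.5, False)]
print(running_decoy_ratio(toy_psms))
```

A guard like this only avoids the crash. If decoys outscore every target, the search settings (such as the modification file) are still worth checking.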

Weronika77 commented 6 years ago

Yes, the data is labelfree. I made a new modfile and removed the lines with tmt6plex. Now I got this error:

sudo ./nextflow run ipaw.nf -resume \

--tdb VarDB.fasta \ --mzmls TCAM2.mzML \ --gtf VarDB.gtf \ --knownproteins Homo_sapiens.GRCh38.pep.all.fa \ --blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta \ --snpfa MSCanProVar_ensemblV79.filtered.fasta \ --genome hg19.chr1-22.X.Y.M.fa.masked \ --dbsnp snp142CodingDbSnp.txt \ --cosmic CosmicMutantExport.tsv \ --mod new_mods.txt \ --outdir results [sudo] password for weronika: N E X T F L O W ~ version 0.28.0 Launching ipaw.nf [mad_yonath] - revision: baa0e7b888 WARN: Access to undefined parameter mzmldef -- Initialise it to a default value eg. params.mzmldef = some_value Detected setnames: NA [warm up] executor > local [59/a2e899] Cached process > makeProtSeq [93/28b657] Cached process > makeTrypSeq [89/aaa546] Cached process > concatFasta [33/807ea5] Cached process > createSpectraLookup (1) [66/305130] Cached process > makeDecoyReverseDB [d0/19d122] Submitted process > msgfPlus (1) ERROR ~ Error executing process > 'msgfPlus (1)'

Caused by: Missing output file(s) TCAM2.mzid expected by process msgfPlus (1)

Command executed:

msgf_plus -Xmx16G -d concatdb.fasta -s TCAM2.mzML -o "TCAM2.mzid" -thread 12 -mod Mods.txt -tda 0 -t 10.0ppm -ti -1,2 -m 0 -inst 3 -e 1 -protocol 0 -ntt 2 -minLength 7 -maxLength 50 -minCharge 2 -maxCharge 6 -n 1 -addFeatures 1 msgf_plus -Xmx3500M edu.ucsd.msjava.ui.MzIDToTsv -i "TCAM2.mzid" -o out.mzid.tsv

Command exit status: 0

Command output:

MS-GF+ Release (v2016.10.26) (26 Oct 2016) Usage: java -Xmx3500M -jar MSGFPlus.jar -s SpectrumFile (.mzML, .mzXML, .mgf, .ms2, .pkl or _dta.txt) -d DatabaseFile (.fasta or .fa) [-o OutputFile (.mzid)] (Default: [SpectrumFileName].mzid) [-t PrecursorMassTolerance] (e.g. 2.5Da, 20ppm or 0.5Da,2.5Da, Default: 20ppm) Use comma to set asymmetric values. E.g. "-t 0.5Da,2.5Da" will set 0.5Da to the minus (expMass<theoMass) and 2.5Da to plus (expMass>theoMass) [-ti IsotopeErrorRange] (Range of allowed isotope peak errors, Default:0,1) Takes into account of the error introduced by chooosing a non-monoisotopic peak for fragmentation. The combination of -t and -ti determins the precursor mass tolerance. E.g. "-t 20ppm -ti -1,2" tests abs(exp-calc-n1.00335Da)<20ppm for n=-1, 0, 1, 2. [-thread NumThreads] (Number of concurrent threads to be executed, Default: Number of available cores) [-tda 0/1] (0: don't search decoy database (Default), 1: search decoy database) [-m FragmentMethodID] (0: As written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD, 4: UVPD) [-inst MS2DetectorID] (0: Low-res LCQ/LTQ (Default), 1: Orbitrap/FTICR, 2: TOF, 3: Q-Exactive) [-e EnzymeID] (0: unspecific cleavage, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: glutamyl endopeptidase, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: no cleavage) [-protocol ProtocolID] (0: Automatic (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho, 4: TMT, 5: Standard) [-ntt 0/1/2] (Number of Tolerable Termini, Default: 2) E.g. For trypsin, 0: non-tryptic, 1: semi-tryptic, 2: fully-tryptic peptides only. 
[-mod ModificationFileName] (Modification file, Default: standard amino acids with fixed C+57) [-minLength MinPepLength] (Minimum peptide length to consider, Default: 6) [-maxLength MaxPepLength] (Maximum peptide length to consider, Default: 40) [-minCharge MinCharge] (Minimum precursor charge to consider if charges are not specified in the spectrum file, Default: 2) [-maxCharge MaxCharge] (Maximum precursor charge to consider if charges are not specified in the spectrum file, Default: 3) [-n NumMatchesPerSpec] (Number of matches per spectrum to be reported, Default: 1) [-addFeatures 0/1] (0: output basic scores only (Default), 1: output additional features) [-ccm ChargeCarrierMass] (Mass of charge carrier, Default: mass of proton (1.00727649)) Example (high-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 20ppm -ti -1,2 -ntt 2 -tda 1 -o testMSGFPlus.mzid Example (low-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 0.5Da,2.5Da -ntt 2 -tda 1 -o testMSGFPlus.mzid

MzIDToTsv v9108 (26 Oct 2016) Usage: java -Xmx3500M -cp MSGFPlus.jar edu.ucsd.msjava.ui.MzIDToTsv -i MzIDPath (MS-GF+ output file (.mzid) or directory containing mzid files) [-o TSVFile] (TSV output file (.tsv) (Default: MzIDFileName.tsv)) [-showQValue 0/1] (0: do not show Q-values, 1: show Q-values (Default)) [-showDecoy 0/1] (0: do not show decoy PSMs (Default), 1: show decoy PSMs) [-showFormula 0/1] (0: do not show molecular formula (Default), 1: show molecular formula of peptides) [-unroll 0/1] (0: merge shared peptides (Default), 1: unroll shared peptides)

Command error:
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  [Error] Invalid value for parameter -mod: Mods.txt (file does not exist)
  [Error] Invalid value for parameter -i: TCAM2.mzid (file does not exist)

Work dir: /home/weronika/proteogenomics-analysis-workflow/work/d0/19d12211d3e9d04a68785c13796d96

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

glormph commented 6 years ago

My bad :( I typed too fast in the previous comment on this issue, sorry. Write --mods instead of --mod

Weronika77 commented 6 years ago

Still an error.. :( but I got further!

sudo ./nextflow run ipaw.nf -resume \
>   --tdb VarDB.fasta \
>   --mzmls TCAM2.mzML  \
>   --gtf VarDB.gtf \
>   --knownproteins Homo_sapiens.GRCh38.pep.all.fa \
>   --blastdb UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta \
>   --snpfa MSCanProVar_ensemblV79.filtered.fasta \
>   --genome hg19.chr1-22.X.Y.M.fa.masked  \
>   --dbsnp snp142CodingDbSnp.txt \
>   --cosmic CosmicMutantExport.tsv \
>   --mods new_mods.txt \
>   --outdir results
[sudo] password for weronika: 
N E X T F L O W  ~  version 0.28.0
Launching `ipaw.nf` [focused_perlman] - revision: baa0e7b888
WARN: Access to undefined parameter `mzmldef` -- Initialise it to a default value eg. `params.mzmldef = some_value`
Detected setnames: NA
[warm up] executor > local
[d7/571a70] Cached process > makeTrypSeq
[02/2dcc11] Cached process > concatFasta
[9f/f2edd7] Cached process > createSpectraLookup (1)
[9e/cd97b8] Cached process > makeProtSeq
[f3/fc335a] Cached process > makeDecoyReverseDB
[23/d67cec] Submitted process > msgfPlus (1)
[1f/e6ae15] Submitted process > percolator (1)
[62/9639d0] Submitted process > filterPercolator (1)
[e6/7dda40] Submitted process > svmToTSV (1)
[69/bdc780] Submitted process > svmToTSV (2)
[75/12f030] Submitted process > createPSMPeptideTable (2)
[be/cb5990] Submitted process > createPSMPeptideTable (1)
[12/f59e4a] Submitted process > createFastaBedGFF (1)
[1e/11e55d] Submitted process > prepSpectrumAI (1)
[3a/7e0764] Submitted process > mergeSetPSMtable (1)
[d3/bf04f6] Submitted process > mergeSetPSMtable (2)
[cf/238b47] Submitted process > prePeptideTable (1)
Pipeline output ready: /home/weronika/proteogenomics-analysis-workflow/work/3a/7e0764481db9563fa878c996fb8a79/variant_psmtable.txt
Pipeline output ready: /home/weronika/proteogenomics-analysis-workflow/work/d3/bf04f6a0698b0b1cb28818c93c2292/novel_psmtable.txt
[36/1feb40] Submitted process > SpectrumAI (1)
ERROR ~ Error executing process > 'prePeptideTable (1)'

Caused by:
  Process `prePeptideTable (1)` terminated with an error exit status (127)

Command executed:

  null

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: null: command not found
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/cf/238b479b7ac48b71f6dc5f10d6f5dd

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (2)

glormph commented 6 years ago

Another bug! I pushed a fix, git pull and try again.

By the way, thank you very much for your patience. Our pipeline is getting much better by letting other labs test it with their own data!

Weronika77 commented 6 years ago

No problem at all. Glad I can help, and it's great that you are quick with fixes :) We're not there yet though. Everything is the same as above, except for the error:

ERROR ~ Error executing process > 'prePeptideTable (1)'

Caused by:
  Process `prePeptideTable (1)` terminated with an error exit status (1)

Command executed:

  msspsmtable merge -o psms.txt -i psms* 
  msspeptable psm2pep -i psms.txt -o preisoquant --scorecolpattern svm --spectracol 1 --isobquantcolpattern plex
  awk -F '\t' 'BEGIN {OFS = FS} {print $12,$13,$3,$7,$8,$9,$11,$14,$15,$16,$17,$18,$19,$20,$21,$22}' preisoquant > preordered
  mv preordered peptidetable.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  Traceback (most recent call last):
    File "/usr/local/bin/msspeptable", line 6, in <module>
      sys.exit(app.peptable.main())
    File "/usr/local/lib/python3.6/site-packages/app/peptable.py", line 14, in main
      startup.start_msstitch(drivers, sys.argv)
    File "/usr/local/lib/python3.6/site-packages/app/drivers/startup.py", line 53, in start_msstitch
      args.func(**vars(args))
    File "/usr/local/lib/python3.6/site-packages/app/drivers/base.py", line 74, in start
      self.run()
    File "/usr/local/lib/python3.6/site-packages/app/drivers/pepprottable.py", line 15, in run
      self.create_header()
    File "/usr/local/lib/python3.6/site-packages/app/drivers/peptable/psmtopeptable.py", line 47, in create_header
      self.precurquantcol)
    File "/usr/local/lib/python3.6/site-packages/app/actions/headers/peptable.py", line 26, in get_psm2pep_header
      isocols = tsv.get_columns_by_pattern(header, isobq_pattern)
    File "/usr/local/lib/python3.6/site-packages/app/readers/tsv.py", line 148, in get_columns_by_pattern
      'pattern: {}'.format(pattern))
  RuntimeError: Could not find fieldname in header with pattern: plex

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/8b/6e6f872e55bd39bac8d07a5638ead2

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (2)

glormph commented 6 years ago

I applied another fix, pull and retry. I wonder how many times we have left.

Weronika77 commented 6 years ago

Haha, me too. Another one:

[58/082c70] Submitted process > prePeptideTable (1)
[2e/8e7964] Submitted process > SpectrumAI (1)
[7b/e101af] Submitted process > mapVariantPeptidesToGenome (1)
[52/9acbeb] Submitted process > annovar (1)
[1d/175104] Submitted process > BlastPNovel (1)
[2c/3901db] Submitted process > phyloCSF (1)
[40/b711e8] Submitted process > BLATNovel (1)
[64/e764af] Submitted process > labelnsSNP (1)
[bf/678ee0] Submitted process > phastcons (1)
ERROR ~ Error executing process > 'BlastPNovel (1)'

Caused by:
  Process `BlastPNovel (1)` terminated with an error exit status (1)

Command executed:

  makeblastdb -in UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta -dbtype prot
  blastp -db UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta -query novel_peptides.fa -outfmt '6 qseqid sseqid pident qlen slen qstart qend sstart send mismatch positive gapopen gaps qseq sseq evalue bitscore' -num_threads 8 -max_target_seqs 1 -evalue 1000 -out blastp_out.txt

Command exit status:
  1

Command output:

  Building a new DB, current time: 04/11/2018 12:57:30
  New DB name:   UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta
  New DB title:  UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta
  Sequence type: Protein
  Keep MBits: T
  Maximum file size: 1000000000B
  Adding sequences from FASTA; added 99970 sequences in 14.2546 seconds.

Command error:
  Unable to find image 'quay.io/biocontainers/blast:2.7.1--boost1.64_1' locally
  2.7.1--boost1.64_1: Pulling from biocontainers/blast
  a3ed95caeb02: Already exists
  4c1fa756c345: Already exists
  a7f760de4b27: Already exists
  d836c29a56fb: Already exists
  6c2ebb6634fc: Already exists
  00f810677cff: Already exists
  531ebc5af9ff: Already exists
  a3ed95caeb02: Already exists
  aef3b3b2fa0d: Already exists
  4cde73d2600f: Pulling fs layer
  4cde73d2600f: Verifying Checksum
  4cde73d2600f: Download complete
  4cde73d2600f: Pull complete
  Digest: sha256:2d118be6f6da0232af8420b05019c0d24b0c996c576d8a0c4b1700de1ff61b22
  Status: Downloaded newer image for quay.io/biocontainers/blast:2.7.1--boost1.64_1
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
  USAGE
    blastp [-h] [-help] [-import_search_strategy filename]
      [-export_search_strategy filename] [-task task_name] [-db database_name]
      [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
      [-negative_gilist filename] [-negative_seqidlist filename]
      [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
      [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
      [-subject_loc range] [-query input_file] [-out output_file]
      [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
      [-gapextend extend_penalty] [-qcov_hsp_perc float_value]
      [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
      [-xdrop_gap_final float_value] [-searchsp int_value]
      [-sum_stats bool_value] [-seg SEG_options] [-soft_masking soft_masking]
      [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value]
      [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
      [-window_size int_value] [-lcase_masking] [-query_loc range]
      [-parse_deflines] [-outfmt format] [-show_gis]
      [-num_descriptions int_value] [-num_alignments int_value]
      [-line_length line_length] [-html] [-max_target_seqs num_sequences]
      [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]
      [-use_sw_tback] [-version]

  DESCRIPTION
     Protein-Protein BLAST 2.7.1+

  Use '-help' to print detailed descriptions of command line arguments
  ========================================================================

  Error: Argument "num_threads". Illegal value, expected (>=1 and =<4):  `8'
  Error:  (CArgException::eConstraint) Argument "num_threads". Illegal value, expected (>=1 and =<4):  `8'

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/1d/1751043c909078d5232b3fb70422de

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (4)

yafeng commented 6 years ago

OK, the error occurs because the requested -num_threads value of 8 exceeds the number of available cores (blastp accepts >=1 and =<4 here). As a quick fix, change -num_threads 8 to -num_threads 4 in the ipaw.nf script.

We will try to add a command-line option for this later.
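Until such an option exists, the idea behind the quick fix can be sketched in Python: cap the requested thread count at the number of cores the machine reports. This is only an illustration; the function name `capped_threads` is hypothetical and not part of ipaw.nf, and the core count seen inside a container may differ from the host's.

```python
import os


def capped_threads(requested, available=None):
    """Return a thread count that never exceeds the available cores.

    `available` defaults to os.cpu_count(); it falls back to 1 when the
    core count cannot be determined.
    """
    if available is None:
        available = os.cpu_count() or 1
    # Keep the result within 1..available, the range blastp reported.
    return max(1, min(requested, available))


if __name__ == "__main__":
    # On the 4-core machine from the error above this would print 4.
    print(capped_threads(8))
```

The returned value could then be substituted for the hard-coded 8 when building the blastp command line.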

Weronika77 commented 6 years ago

Unfortunately, I got another one:

[e4/95376d] Submitted process > mapVariantPeptidesToGenome (1)
[4f/503bbd] Submitted process > annovar (1)
[58/6597bd] Submitted process > BlastPNovel (1)
[9a/fb3687] Submitted process > phyloCSF (1)
[18/589540] Submitted process > phastcons (1)
[c7/184f34] Submitted process > labelnsSNP (1)
[e7/1bdf4d] Submitted process > BLATNovel (1)
[6d/abddd9] Submitted process > parseAnnovarOut (1)
[0b/16a2aa] Submitted process > ParseBlastpOut (1)
[37/70ae10] Submitted process > ValidateSingleMismatchNovpeps (1)
[af/3b6a89] Submitted process > novpepSpecAIOutParse (1)
ERROR ~ Error executing process > 'mapVariantPeptidesToGenome (1)'

Caused by:
  Process `mapVariantPeptidesToGenome (1)` terminated with an error exit status (1)

Command executed:

  python3 /pgpython/parse_spectrumAI_out.py --spectrumAI_out NA_variant_specairesult.txt --input peptidetable.txt --output NA_variant_peptides.txt
  python3 /pgpython/map_cosmic_snp_tohg19.py --input NA_variant_peptides.txt --output NA_variant_peptides.saav.pep.hg19cor.vcf --cosmic_input CosmicMutantExport.tsv --dbsnp_input snp142CodingDbSnp.txt

Command exit status:
  1

Command output:
  protein id PGOHUM00000239752_ORF3(pre=K,post=N);PGOHUM_ENST00000547505.2_EEF1A1P17_ORF3(pre=K,post=N) can't be mapped
  protein id PGOHUM00000237869_ORF1(pre=K,post=C);PGOHUM00000238279_ORF2(pre=K,post=C);PGOHUM_ENST00000437653.1_RP11-809F4.2_ORF1(pre=K,post=C);PGOHUM_ENST00000420887.2_RP11-259P15.3_ORF3(pre=K,post=C);lnc-TBL1XR1-16:1_ORF1(pre=K,post=C);lnc-FXR1-2:1_ORF3(pre=K,post=C) can't be mapped
  protein id PGOHUM00000241423_ORF1(pre=K,post=G);PGOHUM_ENST00000394563.4_RP6-145B8.3_ORF3(pre=K,post=G) can't be mapped
  protein id PGOHUM00000247287_ORF3(pre=R,post=I);PGOHUM_ENST00000559111.1_HSP90B2P_ORF2(pre=R,post=I) can't be mapped
  protein id lnc-NKX2-4-5:2_ORF2(pre=-,post=R) can't be mapped
  protein id PGOHUM00000250761_ORF3(pre=K,post=Q);PGOHUM_ENST00000507090.2_HSP90AB2P_ORF1(pre=K,post=Q);lnc-CPEB2-12:1_ORF3(pre=K,post=Q) can't be mapped
  protein id PGOHUM00000242888_ORF1(pre=K,post=H);PGOHUM_ENST00000526512.1_RP11-382M14.1_ORF2(pre=K,post=H) can't be mapped
  protein id PGOHUM00000240639_ORF3(pre=R,post=S);PGOHUM_ENST00000415103.1_AC078994.2_ORF1(pre=R,post=S) can't be mapped
  protein id lnc-SLITRK1-4:1_ORF2(pre=R,post=L) can't be mapped
  protein id PGOHUM00000241261_ORF1(pre=K,post=S);PGOHUM_ENST00000417985.1_ACTBP1_ORF1(pre=K,post=S) can't be mapped
  protein id PGOHUM00000247755_ORF2(pre=Kped
  protein id PGOHUM00000238102_ORF3(pre=R,post=H);PGOHUM_ENST00000471403.1_CDV3P1_ORF3(pre=R,post=H) can't be mapped
  protein id PGOHUM00000236730_ORF3(pre=R,post=A);PGOHUM_ENST00000400056.3_KRT18P13_ORF3(pre=R,post=A) can't be mapped
  protein id PGOHUM00000244156_ORF2(pre=K,post=L);PGOHUM_ENST00000419696.1_GAPDHP64_ORF2(pre=K,post=L) can't be mapped
  protein id PGOHUM00000245611_ORF3(pre=K,post=K);PGOHUM_ENST00000512920.2_NPM1P41_ORF1(pre=K,post=K);lnc-HNRNPD-4:1_ORF1(pre=K,post=K) can't be mapped
  protein id PGOHUM_ENST00000412323.1_ATP5A1P2_ORF1(pre=R,post=A) can't be mapped
  protein id lnc-AKAP14-1:2_ORF2(pre=R,post=Q);lnc-AKAP14-1:1_ORF2(pre=R,post=Q);lnc-AKAP14-1:3_ORF1(pre=R,post=Q) can't be mapped
  protein id lnc-AKAP14-1:1_ORF2(pre=R,post=G);lnc-AKAP14-1:3_ORF1(pre=R,post=G) can't be mapped
  protein id lncRNA_ENST00000597346.1_ORF1(pre=K,post=Q);lncRNA_ENST00000561320.1_ORF3(pre=K,post=Q);lncRNA_ENST00000585816.1_ORF1(pre=K,post=Q);lnc-CDKN1C-3:4_ORF2(pre=K,post=Q);lnc-ABHD12-4:2_ORF3(pre=K,post=Q);lnc-CCZ1B-6:2_ORF1(pre=K,post=Q);lnc-ABHD12-4:1_ORF2(pre=K,post=Q);lnc-RGR-2:1_ORF1(pre=K,post=Q);lnc-CDKN1C-3:5_ORF1(pre=K,post=Q);lnc-ZNF682-3:4_ORF1(pre=K,post=Q);lnc-CDKN1C-3:8_ORF3(pre=K,post=Q);lnc-AADAT-9:1_ORF3(pre=K,post=Q);lnc-RASGRP1-3:2_ORF3(pre=K,post=Q);lnc-C3orf79-9:1_ORF2(pre=K,post=Q) can't be mapped
  protein id PGOHUM_ENST00000378770.1_HSP90AA4P_ORF2(pre=K,post=I) can't be mapped
  protein id PGOHUM00000239752_ORF3(pre=K,post=N);PGOHUM_ENST00000547505.2_EEF1A1P17_ORF3(pre=K,post=N) can't be mapped
  protein id PGOHUM00000237869_ORF1(pre=K,post=C);PGOHUM00000238279_ORF2(pre=K,post=C);PGOHUM_ENST00000437653.1_RP11-809F4.2_ORF1(pre=K,post=C);PGOHUM_ENST00000420887.2_RP11-259P15.3_ORF3(pre=K,post=C);lnc-TBL1XR1-16:1_ORF1(pre=K,post=C);lnc-FXR1-2:1_ORF3(pre=K,post=C) can't be mapped
  protein id PGOHUM00000241423_ORF1(pre=K,post=G);PGOHUM_ENST00000394563.4_RP6-145B8.3_ORF3(pre=K,post=G) can't be mapped
  protein id PGOHUM00000247287_ORF3(pre=R,post=I);PGOHUM_ENST00000559111.1_HSP90B2P_ORF2(pre=R,post=I) can't be mapped
  protein id lnc-NKX2-4-5:2_ORF2(pre=-,post=R) can't be mapped
  protein id PGOHUM00000250761_ORF3(pre=K,post=Q);PGOHUM_ENST00000507090.2_HSP90AB2P_ORF1(pre=K,post=Q);lnc-CPEB2-12:1_ORF3(pre=K,post=Q) can't be mapped
  protein id PGOHUM00000242888_ORF1(pre=K,post=H);PGOHUM_ENST00000526512.1_RP11-382M14.1_ORF2(pre=K,post=H) can't be mapped
  protein id PGOHUM00000240639_ORF3(pre=R,post=S);PGOHUM_ENST00000415103.1_AC078994.2_ORF1(pre=R,post=S) can't be mapped
  protein id lnc-SLITRK1-4:1_ORF2(pre=R,post=L) can't be mapped
  protein id PGOHUM00000241261_ORF1(pre=K,post=S);PGOHUM_ENST00000417985.1_ACTBP1_ORF1(pre=K,post=S) can't be mapped
  protein id PGOHUM00000247755_ORF2(pre=K,post=D);PGOHUM_ENST00000557130.1_UBE2CP1_ORF2(pre=K,post=D);lnc-STRN3-12:1_ORF2(pre=K,post=D) can't be mapped
  protein id PGOHUM00000257036_ORF1(pre=K,post=N);PGOHUM_ENST00000440317.1_YWHAZP2_ORF1(pre=K,post=N) can't be mapped
  protein id PGOHUM00000247098_ORF3(pre=R,post=N);PGOHUM_ENST00000569826.2_RP11-265N6.3_ORF1(pre=R,post=N) can't be mapped
  protein id PGOHUM00000246771_ORF3(pre=R,post=C);PGOHUM_ENST00000418351.1_ACTBP7_ORF3(pre=R,post=C) can't be mapped
  protein id PGOHUM00000259879_ORF1(pre=R,post=G);PGOHUM_ENST00000438353.1_HSP90AA5P_ORF1(pre=R,post=G) can't be mapped
  protein id PGOHUM00000235293_ORF1(pre=K,post=K);PGOHUM_ENST00000515379.1_HNRNPA1P12_ORF1(pre=K,post=K) can't be mapped
  protein id PGOHUM_ENST00000425843.1_HSPA8P1_ORF1(pre=R,post=S) can't be mapped
  protein id PGOHUM00000233123_ORF2(pre=R,post=L);PGOHUM_ENST00000434621.2_GAPDHP68_ORF1(pre=R,post=L) can't be mapped
  protein id PGOHUM00000243082_ORF1(pre=K,post=D);PGOHUM_ENST00000362070.3_HIST1H2APS4_ORF1(pre=K,post=D) can't be mapped
  protein id PGOHUM00000241051_ORF1(pre=R,post=G);PGOHUM_ENST00000453073.1_HNRNPA1P47_ORF2(pre=R,post=G) can't be mapped
  protein id PGOHUM00000237244_ORF1(pre=K,post=C);PGOHUM00000235394_ORF2(pre=K,post=C);PGOHUM00000241322_ORF2(pre=K,post=C);PGOHUM_ENST00000423783.1_AC055811.5_ORF3(pre=K,post=C);PGOHUM_ENST00000428275.1_ACTG1P10_ORF2(pre=K,post=C);lnc-KDM5C-3:1_ORF2(pre=K,post=C) can't be mapped
  protein id PGOHUM00000238365_ORF1(pre=K,post=H);PGOHUM_ENST00000415473.1_PPIAP30_ORF1(pre=K,post=H) can't be mapped
  protein id PGOHUM00000241240_ORF1(pre=K,post=T);PGOHUM_ENST00000452570.1_GAPDHP1_ORF1(pre=K,post=T) can't be mapped
  protein id PGOHUM00000248221_ORF3(pre=R,post=T);PGOHUM_ENST00000557241.1_KRT18P7_ORF1(pre=R,post=T);lnc-TTC9-3:1_ORF1(pre=R,post=T) can't be mapped
  protein id PGOHUM00000235394_ORF2(pre=K,post=S) can't be mapped
  protein id PGOHUM00000246938_ORF1(pre=K,post=T);PGOHUM_ENST00000471472.2_RPL7P5_ORF1(pre=K,post=T) can't be mapped
  protein id lnc-SENP6-10:1_ORF1(pre=R,post=G) can't be mapped
  protein id PGOHUM00000248919_ORF3(pre=R,post=F);PGOHUM_ENST00000566277.2_CTD-2033A16.2_ORF3(pre=R,post=F) can't be mapped
  protein id PGOHUM00000243288_ORF2(pre=R,post=C);PGOHUM_ENST00000403258.1_ACTBP8_ORF2(pre=R,post=C) can't be mapped
  protein id PGOHUM00000248919_ORF3(pre=R,post=F);PGOHUM_ENST00000566277.2_CTD-2033A16.2_ORF3(pre=R,post=F) can't be mapped

Command error:
  Traceback (most recent call last):
    File "/pgpython/map_cosmic_snp_tohg19.py", line 105, in <module>
      chr_position=cosmic_dic[cosmic_id][0]
  KeyError: 'COSMIC:HSP90AA1:ENST00000334701:c.2494G>T:p.A832S'
  .command.stub: line 99:    12 Terminated              nxf_trace "$pid" .command.trace

Work dir:
  /home/weronika/proteogenomics-analysis-workflow/work/e4/95376d80c1505ace2b027bab4168d3

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (1)

yafeng commented 6 years ago

Can you do one check for me? I just want to make sure you have the correct version of the COSMIC file downloaded. Try the following command and paste the results here:

grep ENST00000334701 CosmicMutantExport.tsv | grep 'c.2494G>T'
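The same check can also be sketched in Python, which avoids shell quoting issues (an unquoted `>` in a shell command is treated as output redirection). The helper names `find_rows` and `check_cosmic_file` are hypothetical. Unlike grep, this sketch matches whole tab-separated fields rather than substrings.

```python
import csv


def find_rows(lines, *needles):
    """Return the tab-separated rows that contain every needle as an exact field."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if all(n in row for n in needles)]


def check_cosmic_file(path, *needles):
    """Apply find_rows to a file on disk, tolerating odd bytes in the export."""
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        return find_rows(handle, *needles)


if __name__ == "__main__":
    # File name as used in this thread; adjust the path to your download.
    for row in check_cosmic_file("CosmicMutantExport.tsv",
                                 "ENST00000334701", "c.2494G>T"):
        print("\t".join(row))
```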

Weronika77 commented 6 years ago

This is in the file: HSP90AA1 ENST00000334701 2565 5253 TCGA-CM-6171-01 1651233 1566020 large_intestine colon ascending NS carcinoma adenocarcinoma NS NS y COSM1368331 c.2494G>T p.A832S Substitution - Missense u 37 14:102548120-102548120 - n NEUTRAL .36228 Confirmed somatic variant 376 NS NS 77

yafeng commented 6 years ago

Here is the content of the file CosmicMutantExport.tsv stored on our server:

HSP90AA1 ENST00000334701 2565 5253 TCGA-CM-6171-01 1651233 1566020 large_intestine colon carcinoma adenocarcinoma y 1368331 c.2494G>T p.A832S Substitution - Missense het 14:102548120-102548120 - n PASSENGER/OTHER Variant of unknown origin 376 NS NS 77 Stage:I

I think you have a different version of this file. Please try this step again to make sure you have COSMIC v71 downloaded.

Get the COSMIC database
sftp 'your_email_address@example.com'@sftp-cancer.sanger.ac.uk
Download the data (NB version 71 currently works with the mapping script)
sftp> get cosmic/grch37/cosmic/v71/CosmicMutantExport.tsv.gz
sftp> exit
Extract COSMIC data
tar xvfz CosmicMutantExport.tsv.gz

Weronika77 commented 6 years ago

Connected to sftp-cancer.sanger.ac.uk.
sftp> get cosmic/grch37/cosmic/v71/CosmicMutantExport.tsv.gz
File "/cosmic/grch37/cosmic/v71/CosmicMutantExport.tsv.gz" not found.

v71 doesn't exist anymore:

sftp> cd cosmic/grch37/cosmic
sftp> ls
v72  v73  v74  v75  v76  v77  v78  v79  v80  v81  v82  v83  v84

yafeng commented 6 years ago

OK, it seems they have taken down the old version. We will update our pipeline to fit the newer database format then. It should be quick to do, I will let you know when it is done.
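The row pasted from the newer file shows extra fields compared with the v71 row (for example `ascending NS` after `colon`), so any parser that reads columns by position would pick up the wrong values. One way to be less sensitive to such changes is to read columns by header name. The sketch below is not the actual map_cosmic_snp_tohg19.py code: the function name and the header names `gene`, `transcript`, `cds` and `aa` are hypothetical, and only the key layout is taken from the KeyError message above.

```python
import csv
import io


def build_cosmic_lookup(tsv_text, key_columns):
    """Index rows of a headered TSV by a 'COSMIC:...' key built from named columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    missing = [c for c in key_columns if c not in (reader.fieldnames or [])]
    if missing:
        # Fail early with a clear message instead of a later KeyError.
        raise ValueError(f"columns not found in header: {missing}")
    lookup = {}
    for row in reader:
        key = "COSMIC:" + ":".join(row[c] for c in key_columns)
        lookup[key] = row
    return lookup


if __name__ == "__main__":
    # Toy two-line table with hypothetical header names.
    text = ("gene\ttranscript\tsite\tcds\taa\n"
            "HSP90AA1\tENST00000334701\tcolon\tc.2494G>T\tp.A832S\n")
    table = build_cosmic_lookup(text, ["gene", "transcript", "cds", "aa"])
    print(sorted(table))
```

With name-based access, an added or reordered column leaves the key unchanged, and a renamed column produces an explicit error.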

Weronika77 commented 6 years ago

Thanks, I will wait patiently ;)

yafeng commented 6 years ago

Hi, @Weronika77
I have updated the script map_cosmic_snp_tohg19.py that caused the error. It now fits the formatting of COSMIC v84, so try downloading the COSMIC file from v84 and rerun the pipeline.

@glormph the ipaw.nf script should be able to call the latest map_cosmic_snp_tohg19.py from GitHub once a new commit is pushed, right?

Weronika77 commented 6 years ago

I'm not sure if it should work already, but with the new git pull, I still get the same error with cosmic v84.

glormph commented 6 years ago

The pgpython container needs to be updated when the scripts get updated. I have updated the Docker container files so you don't have to re-download the bigwig files. Proceed as follows:

cd dockerfiles
docker tag pgpython pgpython_bigwigs  # to adhere to the new way to create containers
docker build -f pgpython_Dockerfile -t pgpython . 
cd ..

Hope I haven't forgotten anything.

glormph commented 6 years ago

I have just tested that script and discovered that the update will not work. We probably need to hand out the COSMIC v71 data (which needs to match the VarDB search database). Continue tomorrow!

glormph commented 6 years ago

We've updated the docker container for the COSMIC peptide mapping. The following should fix that problem:

cd dockerfiles
docker build -f pgpython_Dockerfile -t pgpython . 
cd ..

Weronika77 commented 6 years ago

It finally works! I ran it with no errors. Thank you very much for all your help and patience.

glormph commented 6 years ago

Thank you very much yourself for making the pipeline better!

Jokendo-collab commented 5 years ago

I have label-free data which I am trying to analyse using the IPAW pipeline, but I am getting an error that I need help with. See the error message below:

N E X T F L O W ~ version 18.10.1
Launching ipaw.nf [gloomy_rubens] - revision: c0cfffc9a5
WARN: Access to undefined parameter pisepdb -- Initialise it to a default value eg. params.pisepdb = some_value
WARN: Access to undefined parameter mzmldef -- Initialise it to a default value eg. params.mzmldef = some_value
Detected setnames: NA
[warm up] executor > local
WARN: Input tuple does not match input set cardinality declared by process splitSetNormalSearchPsms -- offending value: NA
[9e/016ff1] Submitted process > concatFasta (1)
[17/a04b64] Submitted process > makeProtSeq
[dc/a05a49] Submitted process > makeTrypSeq
[29/0a072c] Submitted process > createSpectraLookup (1)
ERROR ~ Error executing process > 'makeTrypSeq'

Caused by: Process makeTrypSeq terminated with an error exit status (127)

Command executed:

msslookup seqspace -i Homo_sapiens.GRCh38.pep.all.fa --insourcefrag

Command exit status: 127

Command output: (empty)

Command error: .command.sh: line 2: msslookup: command not found

Work dir: /home/javan/Desktop/proteogenomics/work/dc/a05a49b6c6f3202cd7ba13cd34943c

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (3)

yafeng commented 5 years ago

@javanOkendo Command error: .command.sh: line 2: msslookup: command not found

It seems Docker hasn't successfully found the container 'quay.io/biocontainers/msstitch:2.5--py36_0', which is where the msslookup command lives. Was there any error when you installed Docker? Can you run the command docker container ls -a to show all the containers created?

Jokendo-collab commented 5 years ago

Hi @yafeng, these are the containers I created:

docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b7b2678f8467 hello-world "/hello" 3 days ago Exited (0) 3 days ago jovial_hypatia

yafeng commented 5 years ago

@javanOkendo Docker has not built any of the containers that are required to run this workflow. Did you follow our manual in the README under "Prepare once"?

Jokendo-collab commented 5 years ago

@yafeng I did follow all the instructions. See my container builds below:

Sending build context to Docker daemon 479.7MB
Step 1/6 : FROM cpanse/protviz ---> 17fb90fc3008
Step 2/6 : COPY spectrumAI_R_requirements.R /tmp/requirements.R ---> Using cache ---> f2993b19c7d9
Step 3/6 : RUN apt-get update && apt-get install -y libnetcdf-dev ---> Using cache ---> cd661c50fe62
Step 4/6 : RUN Rscript /tmp/requirements.R ---> Using cache ---> 843609fb645c
Step 5/6 : RUN git clone https://github.com/yafeng/SpectrumAI /SpectrumAI ---> Using cache ---> 9857e7bd22eb
Step 6/6 : RUN cd /SpectrumAI && git pull && git reset --hard d9fc290cd76a5ec09aa17c03a380ad09cbce2387 ---> Using cache ---> e9a4b758c967
Successfully built e9a4b758c967
Successfully tagged spectrumai:latest

Sending build context to Docker daemon 479.7MB
Step 1/2 : FROM perl ---> 3e590895f3b8
Step 2/2 : COPY annovar /annovar ---> Using cache ---> eae1d6322280
Successfully built eae1d6322280
Successfully tagged annovar:latest

Sending build context to Docker daemon 479.7MB
Step 1/6 : FROM pgpython_bigwigs ---> d9cba8baf5d1
Step 2/6 : RUN apt-get update ---> Using cache ---> a7be4baaa108
Step 3/6 : RUN apt-get install -y python3-pip python3-dev libcurl3-dev ---> Using cache ---> cd328fe890b2
Step 4/6 : RUN pip3 install pyBigWig pysam ---> Using cache ---> c0bdcba73ca8
Step 5/6 : RUN rm -r /pgpython; git clone https://github.com/yafeng/proteogenomics_python /pgpython ---> Using cache ---> 8006cc7c8cb3
Step 6/6 : RUN cd /pgpython && git pull && git reset --hard 7c2cf3ac5d6a1f7f15dd9019438a3a4332d30c26 ---> Using cache ---> f88e3b440757
Successfully built f88e3b440757
Successfully tagged pgpython:latest

yafeng commented 5 years ago

@javanOkendo try the docker images command to see if the built images are there, and also check the disk space where the Docker images were built. Please also tell us your Docker version.
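A rough way to automate the image check is sketched below in Python. The list of required images is an assumption assembled only from names that appear in this thread (the msstitch biocontainer and the three locally built images), so it may be incomplete, and the function names are hypothetical. Images hosted on quay.io may also be pulled on first use, so a missing entry there is not necessarily a problem.

```python
import subprocess

# Assumed list, taken from image names mentioned in this thread.
REQUIRED_IMAGES = [
    "quay.io/biocontainers/msstitch:2.5--py36_0",
    "pgpython:latest",
    "spectrumai:latest",
    "annovar:latest",
]


def missing_images(required, listing):
    """Return the required images absent from a 'repository:tag' listing."""
    present = {line.strip() for line in listing.splitlines() if line.strip()}
    return [name for name in required if name not in present]


def list_local_images():
    """Ask the local Docker daemon for its images, one 'repository:tag' per line."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for name in missing_images(REQUIRED_IMAGES, list_local_images()):
        print("missing:", name)
```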

Jokendo-collab commented 5 years ago

@yafeng the images are okay. It ran for a few minutes and then I got an error again:

N E X T F L O W ~ version 18.10.1
Launching ipaw.nf [thirsty_mcclintock] - revision: b59e6478ab
WARN: Access to undefined parameter pisepdb -- Initialise it to a default value eg. params.pisepdb = some_value
WARN: Access to undefined parameter mzmldef -- Initialise it to a default value eg. params.mzmldef = some_value
Detected setnames: NA
[warm up] executor > local
WARN: Input tuple does not match input set cardinality declared by process splitSetNormalSearchPsms -- offending value: NA
[22/acbbba] Submitted process > concatFasta (1)
[7e/b3fba1] Submitted process > makeProtSeq
[6b/0a709b] Submitted process > makeTrypSeq
[42/583eff] Submitted process > createSpectraLookup (1)
[65/ff33e2] Submitted process > makeDecoyReverseDB (1)
[a8/669766] Submitted process > msgfPlus (1)
[aa/b059b9] Submitted process > msgfPlus (3)
[9f/3217d7] Submitted process > msgfPlus (2)
[b4/b4fbc2] Submitted process > msgfPlus (4)
[4f/90d851] Submitted process > msgfPlus (5)
ERROR ~ Error executing process > 'createSpectraLookup (1)'

Caused by: Process createSpectraLookup (1) terminated with an error exit status (1)

Command executed:

msslookup spectra -i 170909_CH_C1_T006_FT.mzML 170909_CH_C1_T007_FT_R2.mzML 170909_CH_C1_T009_FT.mzML 170909_CH_C1_T010_FT_R2.mzML 170909_CH_C1_T011_FT_R2.mzML 170909_CH_C1_T013_FT.mzML 170909_CH_C1_T014_FT_R2.mzML 170909_CH_C1_T015_FT.mzML 170909_CH_C1_T025_FT.mzML 170909_CH_C1_T026_FT.mzML 170909_CH_C1_T027_FT_R2.mzML 170909_CH_C1_T037_FT.mzML 170909_CH_C1_T042_FT.mzML 170909_CH_C1_T049_FT.mzML 170909_CH_C1_T051_FT_R2.mzML 170909_CH_C1_T052_FT.mzML 170909_CH_C1_T053_FT.mzML 170909_CH_C1_T054_FT.mzML 170909_CH_C1_T061_FT_R2.mzML 170909_CH_C1_T062_FT_R2.mzML 170909_CH_C1_T064_FT_R2.mzML 170909_CH_C1_T069_FT_R2.mzML 170909_CH_C1_T073_FT_R2.mzML 170909_CH_C1_T075_FT.mzML 170909_CH_C1_T077_FT.mzML 170909_CH_C1_T080_FT.mzML 170909_CH_C1_T084_FT_R2.mzML --setnames NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Command exit status: 1

Command output: (empty)

Command error:
Traceback (most recent call last):
  File "/usr/local/bin/msslookup", line 6, in <module>
    sys.exit(app.mslookup.main())
  File "/usr/local/lib/python3.6/site-packages/app/mslookup.py", line 21, in main
    startup.start_msstitch(drivers, sys.argv)
  File "/usr/local/lib/python3.6/site-packages/app/drivers/startup.py", line 53, in start_msstitch
    args.func(**vars(args))
  File "/usr/local/lib/python3.6/site-packages/app/drivers/base.py", line 74, in start
    self.run()
  File "/usr/local/lib/python3.6/site-packages/app/drivers/mslookup/base.py", line 35, in run
    self.create_lookup()
  File "/usr/local/lib/python3.6/site-packages/app/drivers/mslookup/spectra.py", line 41, in create_lookup
    spectralookup.create_spectra_lookup(self.lookup, fn_spectra)
  File "/usr/local/lib/python3.6/site-packages/app/actions/mslookup/spectra.py", line 8, in create_spectra_lookup
    for fn, spectrum in fn_spectra:
  File "/usr/local/lib/python3.6/site-packages/app/readers/spectra.py", line 8, in mzmlfn_ms2_spectra_generator
    for fn, spec, ns in mzmlfn_spectra_generator(mzmlfiles):
  File "/usr/local/lib/python3.6/site-packages/app/readers/spectra.py", line 32, in mzmlfn_spectra_generator
    for spectrum in spectra:
  File "/usr/local/lib/python3.6/site-packages/app/readers/xml.py", line 60, in generate_xmltags
    for ac, el in etree.iterparse(fn):
  File "src/lxml/iterparse.pxi", line 208, in lxml.etree.iterparse.next
  File "src/lxml/iterparse.pxi", line 193, in lxml.etree.iterparse.next
  File "src/lxml/iterparse.pxi", line 224, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1397, in lxml.etree._FeedParser.close
  File "src/lxml/parser.pxi", line 589, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 598, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 709, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 638, in lxml.etree._raiseParseError
  File "170909_CH_C1_T053_FT.mzML", line 9017
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 9017, column 33946
chown: unrecognized option '--from'
BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.

Usage: chown [-RhLHPcvf]... OWNER[<.|:>[GROUP]] FILE...

Change the owner and/or group of each FILE to OWNER and/or GROUP

-R  Recurse
-h  Affect symlinks instead of symlink targets
-L  Traverse all symlinks to directories
-H  Traverse symlinks on command line only
-P  Don't traverse symlinks (default)
-c  List changed files
-v  List all files
-f  Hide errors

Work dir: /home/javan/Desktop/proteogenomics/proteogenomics-analysis-workflow/work/42/583eff44b7692b18631df0f6435763

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

yafeng commented 5 years ago

@javanOkendo can you paste your nextflow command here? I suspect the nextflow command input may be incorrect

Jokendo-collab commented 5 years ago

@yafeng see my nextflow command: ./nextflow run ipaw.nf --tdb /home/javan/Desktop/proteogenomics/VarDB.fasta --mzmls /home/javan/Desktop/project_data/*.mzML --gtf /home/javan/Desktop/proteogenomics/VarDB.gtf --mods /home/javan/Desktop/proteogenomics/Mods.txt --knownproteins /home/javan/Desktop/proteogenomics/Homo_sapiens.GRCh38.pep.all.fa --blastdb /home/javan/Desktop/proteogenomics/UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta --cosmic /home/javan/Desktop/proteogenomics/CosmicMutantExport.tsv --snpfa /home/javan/Desktop/proteogenomics/MSCanProVar_ensemblV79.filtered.fasta --genome /home/javan/Desktop/proteogenomics/hg19.chr1-22.X.Y.M.fa.masked --dbsnp /home/javan/Desktop/proteogenomics/snp142CodingDbSnp.txt --outdir tmp/ -profile testing

yafeng commented 5 years ago

@javanOkendo add a backslash (\) before *.mzML and try again: --mzmls /home/javan/Desktop/project_data/\*.mzML
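
For context on why the backslash matters: an unquoted *.mzML is expanded by the shell before Nextflow starts, so the program receives a list of file names instead of the pattern. The Python sketch below only emulates that behaviour with the standard glob module; it is not what the shell or Nextflow actually run, and emulate_shell_word and build_argv are hypothetical helper names chosen for illustration.

```python
import glob


def emulate_shell_word(word):
    """Roughly emulate how a POSIX shell hands one unquoted word to a program.

    Assumption for this sketch: only '*' is handled. A backslash-escaped '*'
    is passed through literally (with the backslash removed); an unescaped
    pattern is replaced by the sorted matching paths, or left unchanged when
    nothing matches.
    """
    if "\\*" in word:
        return [word.replace("\\*", "*")]
    matches = sorted(glob.glob(word))
    return matches if matches else [word]


def build_argv(words):
    """Return the argument list a program would receive for the given words."""
    argv = []
    for word in words:
        argv.extend(emulate_shell_word(word))
    return argv
```

With two files a.mzML and b.mzML in a directory, build_argv(["--mzmls", "<dir>/*.mzML"]) gives the option followed by both file paths, whereas build_argv(["--mzmls", "<dir>/\\*.mzML"]) gives the option followed by the single literal pattern <dir>/*.mzML, which leaves the pattern for Nextflow to resolve itself.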

Jokendo-collab commented 5 years ago

@yafeng Thanks for your timely assistance. I changed that and I still get an error. See below.

./nextflow run ipaw.nf --tdb /home/javan/Desktop/proteogenomics/VarDB.fasta --mzmls /home/javan/Desktop/project_data/*.mzML --gtf /home/javan/Desktop/proteogenomics/VarDB.gtf --mods /home/javan/Desktop/proteogenomics/Mods.txt --knownproteins /home/javan/Desktop/proteogenomics/Homo_sapiens.GRCh38.pep.all.fa --blastdb /home/javan/Desktop/proteogenomics/UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta --cosmic /home/javan/Desktop/proteogenomics/CosmicMutantExport.tsv --snpfa /home/javan/Desktop/proteogenomics/MSCanProVar_ensemblV79.filtered.fasta --genome /home/javan/Desktop/proteogenomics/hg19.chr1-22.X.Y.M.fa.masked --dbsnp /home/javan/Desktop/proteogenomics/snp142CodingDbSnp.txt --outdir tmp/ -profile testing

N E X T F L O W ~ version 18.10.1
Launching ipaw.nf [gloomy_lorenz] - revision: b59e6478ab
WARN: Access to undefined parameter pisepdb -- Initialise it to a default value eg. params.pisepdb = some_value
WARN: Access to undefined parameter mzmldef -- Initialise it to a default value eg. params.mzmldef = some_value
Detected setnames: NA
[warm up] executor > local
WARN: Input tuple does not match input set cardinality declared by process splitSetNormalSearchPsms -- offending value: NA
[14/97cabf] Submitted process > concatFasta (1)
[fa/6ef8e6] Submitted process > makeTrypSeq
[d0/748882] Submitted process > makeProtSeq
[ef/41aaee] Submitted process > createSpectraLookup (1)
[c3/af29cf] Submitted process > makeDecoyReverseDB (1)
[2f/b4e21e] Submitted process > msgfPlus (1)
[58/115fb8] Submitted process > msgfPlus (8)
[d4/ba1a03] Submitted process > msgfPlus (3)
[c5/d80beb] Submitted process > msgfPlus (7)
[3b/fea674] Submitted process > msgfPlus (2)
ERROR ~ Error executing process > 'msgfPlus (7)'

Caused by: Process msgfPlus (7) terminated with an error exit status (247)

Command executed:

fs=`du -Lk concatdb.fasta|cut -f1`
msgf_plus -Xmx$(($fs*8/1024))M -d concatdb.fasta -s 170909_CH_C1_T025_FT.mzML -o "170909_CH_C1_T025_FT.mzid" -thread 12 -mod Mods.txt -tda 0 -t 10.0ppm -ti -1,2 -m 0 -inst 3 -e 1 -protocol 0 -ntt 2 -minLength 7 -maxLength 50 -minCharge 2 -maxCharge 6 -n 1 -addFeatures 1
msgf_plus -Xmx3500M edu.ucsd.msjava.ui.MzIDToTsv -i "170909_CH_C1_T025_FT.mzid" -o out.mzid.tsv
rm concatdb.c*

Command exit status: 247

Command output:
MS-GF+ Release (v2016.10.26) (26 Oct 2016)
Loading database files...
Warning: Sequence database contains 334 counts of letter 'U', which does not correspond to an amino acid.
Warning: Sequence database contains 306208 counts of letter 'X', which does not correspond to an amino acid.
Warning: Sequence database contains 4 counts of letter 'Z', which does not correspond to an amino acid.
Warning: Sequence database contains 2 counts of letter 'u', which does not correspond to an amino acid.
Creating the suffix array indexed file... Size: 523264305
AlphabetSize: 28
Suffix creation: 0.00% complete.
Suffix creation: 1.91% complete.
Suffix creation: 3.82% complete.
Suffix creation: 5.73% complete.
Suffix creation: 7.64% complete.
Suffix creation: 9.56% complete.
Suffix creation: 11.47% complete.
Suffix creation: 13.38% complete.

Command error:
chown: unrecognized option '--from'
BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.

Usage: chown [-RhLHPcvf]... OWNER[<.|:>[GROUP]] FILE...

Change the owner and/or group of each FILE to OWNER and/or GROUP

-R  Recurse
-h  Affect symlinks instead of symlink targets
-L  Traverse all symlinks to directories
-H  Traverse symlinks on command line only
-P  Don't traverse symlinks (default)
-c  List changed files
-v  List all files
-f  Hide errors

Work dir: /home/javan/Desktop/proteogenomics/proteogenomics-analysis-workflow/work/c5/d80beb606564679559113c5469f4ec

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

yafeng commented 5 years ago

@javanOkendo I have never seen this error before. It may be related to your spectra file input: the file is either corrupted or not correctly formatted. I can't think of anything else right now.

File "170909_CH_C1_T053_FT.mzML", line 9017
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 9017, column 33946
chown: unrecognized option '--from'
BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.
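
One way to test the suspicion that a spectra file is corrupted is to parse each mzML file on its own and see which ones fail. The following is a minimal sketch, assuming Python 3 is available. It uses the standard library's xml.etree.ElementTree rather than lxml (the parser that appears in the traceback), so the error wording will differ, and it only checks that the XML is well-formed, not that it is valid mzML. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(path):
    """Return None if the file parses as well-formed XML, else the parser's message."""
    try:
        # Stream through the file so large mzML files are not held in memory.
        for _event, elem in ET.iterparse(path, events=("end",)):
            elem.clear()  # drop each element once it has been fully parsed
    except ET.ParseError as err:
        return str(err)
    return None


def find_malformed(paths):
    """Map each file that fails to parse to the reported parse error."""
    problems = {}
    for path in paths:
        message = check_well_formed(path)
        if message is not None:
            problems[path] = message
    return problems
```

For example, find_malformed(glob.glob("/home/javan/Desktop/project_data/*.mzML")) (after import glob) returns an empty dict when every file parses. Any file listed in the result, such as 170909_CH_C1_T053_FT.mzML from the traceback, would be a candidate for re-conversion from the raw data.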

Jokendo-collab commented 5 years ago

@yafeng Thanks for your patience in responding to my questions. Could you show me how to handle this message? "Change the owner and/or group of each FILE to OWNER and/or GROUP"