nf-core / proteomicslfq

Proteomics label-free quantification (LFQ) analysis pipeline
https://nf-co.re/proteomicslfq
MIT License

Unable to read sdrf #111

Closed: ncarrut closed this 4 years ago

ncarrut commented 4 years ago

Hi, I've run the pipeline successfully with the test profile and with the PXD001819 dataset, but I can't get it to run with my own data. The data are supplied locally, and I'm not sure I've specified the parameters correctly for that. My command is:

../nextflow/nextflow run nf-core/proteomicslfq -r dev -profile singularity --input /wsu/home/aj/aj76/aj7682/2047/raw/*.raw --expdesign /wsu/home/aj/aj76/aj7682/2047/sdrf.tsv --database /wsu/home/aj/aj76/aj7682/2047/UP000005640.fasta --add_decoys --precursor_mass_tolerance 20 --fragment_mass_tolerance 0.5

My SDRF file is here: sdrf.txt. It's valid according to the sdrf_parser. The pipeline launches successfully and creates the decoy database. However, the spectrum processing steps report 1 of 1 completed, whereas there are 34 raw files, so it should read 1 of 34, etc. The error message is below.

When I look at proteomicslfq.log, I see:
Error: Unable to read file (Error: Missing column header: Fraction in: sdrf.tsv)

Can anyone help me troubleshoot this, please? Thanks, Nick
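One likely reason for the "1 of 1" counts: the glob in --input is unquoted, so the shell (not Nextflow) expands it, and only the first matching file ends up as the value of --input. nf-core pipelines expect glob patterns in paths to be quoted so that Nextflow expands them itself. The bash sketch that follows uses hypothetical file names in a scratch directory and simply counts how many arguments a command receives in each case:

```shell
#!/usr/bin/env bash
# Prints how many arguments the function was called with.
count_args() { echo "$#"; }

# Three placeholder .raw files in a scratch directory (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/run_1.raw" "$demo_dir/run_2.raw" "$demo_dir/run_3.raw"

# Unquoted: the shell expands the glob into three separate paths before the call.
unquoted=$(count_args --input "$demo_dir"/*.raw)
# Quoted: the pattern reaches the callee as one literal string.
quoted=$(count_args --input "$demo_dir/*.raw")
echo "unquoted: $unquoted arguments, quoted: $quoted arguments"

rm -rf "$demo_dir"
```

This prints "unquoted: 4 arguments, quoted: 2 arguments". The quoted form of the path above would be --input '/wsu/home/aj/aj76/aj7682/2047/raw/*.raw'. Quoting alone does not fix the Fraction error, which comes from the design file.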

----------------------------------------------------
Pipeline Release  : dev
Run Name          : insane_rubens
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : singularity - nfcore/proteomicslfq:1.0.0
Output dir        : ./results
Launch dir        : /wsu/home/aj/aj76/aj7682/2047
Working dir       : /wsu/home/aj/aj76/aj7682/2047/work
Script dir        : /wsu/home/aj/aj76/aj7682/.nextflow/assets/nf-core/proteomicslfq
User              : aj7682
Config Profile    : singularity
Config Files      : /wsu/home/aj/aj76/aj7682/.nextflow/assets/nf-core/proteomicslfq/nextflow.config, /wsu/home/aj/aj76/aj7682/2047/nextflow.config
----------------------------------------------------

 ...

[a1/e8a6c6] process > raw_file_conversion (1)        [100%] 1 of 1 ✔
[-        ] process > mzml_indexing                  -
[c5/bf7df4] process > generate_decoy_database (1)    [100%] 1 of 1 ✔
[-        ] process > openms_peakpicker              -
[-        ] process > search_engine_msgf             -
[fa/f1dedf] process > search_engine_comet (1)        [100%] 1 of 1 ✔
[7d/4a8955] process > index_peptides (1)             [100%] 1 of 1 ✔
[e2/09cd3f] process > extract_percolator_features... [100%] 1 of 1 ✔
[34/cb39aa] process > percolator (1)                 [100%] 1 of 1 ✔
[-        ] process > fdr_idpep                      -
[-        ] process > idpep                          -
[a2/e78c2b] process > idscoreswitcher_to_qval (1)    [100%] 1 of 1 ✔
[-        ] process > consensusid                    -
[-        ] process > fdr_consensusid                -
[1f/81f7b9] process > idfilter (1)                   [100%] 1 of 1 ✔
[-        ] process > idscoreswitcher_for_luciphor   -
[-        ] process > luciphor                       -
[89/06b05e] process > proteomicslfq (1)              [100%] 1 of 1, failed: 1 ✘
[-        ] process > msstats                        -
[-        ] process > ptxqc                          -
[4e/cc86b2] process > get_software_versions          [100%] 1 of 1 ✔
[4a/9aa08f] process > output_documentation           [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/proteomicslfq] Pipeline completed with errors-
Error executing process > 'proteomicslfq (1)'

Caused by:
  Process `proteomicslfq (1)` terminated with an error exit status (3)

Command executed:

  ProteomicsLFQ -in 2047_fin_01.mzML \
                -ids 2047_fin_01_comet_idx_feat_perc_switched_filter.idXML \
                -design sdrf.tsv \
                -fasta UP000005640_decoy.fasta \
                -protein_inference aggregation \
                -quantification_method feature_intensity \
                -targeted_only true \
                -mass_recalibration false \
                -transfer_ids false \
                -protein_quantification unique_peptides \
                -out out.mzTab \
                -threads 12 \
                -out_msstats out.csv \
                -out_cxml out.consensusXML \
                -proteinFDR 0.05 \
                -debug 0 \
                > proteomicslfq.log

Command exit status:
  3

Command output:
  (empty)

Work dir:
  /wsu/home/aj/aj76/aj7682/2047/work/89/06b05e2355440facd9ded1d69e7776

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
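A small sketch for following that tip, assuming a bash shell. The hypothetical show_task function prints the main files Nextflow leaves in a task's work directory: .command.sh (the script that ran), .exitcode (its exit status), .command.err (stderr) and, for this step, proteomicslfq.log. Because the ProteomicsLFQ command redirects its output to proteomicslfq.log, "Command output" above is empty and the actual error only appears in that log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump the interesting files of one Nextflow task work dir.
show_task() {
  local dir=$1 f
  # .command.sh, .exitcode and .command.err are written by Nextflow itself;
  # proteomicslfq.log exists only because this step redirects stdout there.
  for f in .command.sh .exitcode .command.err proteomicslfq.log; do
    if [ -f "$dir/$f" ]; then
      printf '==> %s <==\n' "$f"
      cat "$dir/$f"
      printf '\n'
    fi
  done
}
```

Usage: show_task /wsu/home/aj/aj76/aj7682/2047/work/89/06b05e2355440facd9ded1d69e7776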

jpfeuffer commented 4 years ago

Hey Nick,

Ah yes, this can be slightly confusing: the two ways of specifying an experimental design are mutually exclusive.

Either: --input your_sdrf.tsv

plus (1), optionally, --root_folder, in case the spectrum files are located somewhere other than where the SDRF says, and (2), optionally, --local_input_type, in case you have already converted the files and they are mzML rather than the raw files listed in the SDRF. You should not specify an additional experimental design here, since both the file names and the design are already included in the SDRF. (I will open an issue so that, in the future, the pipeline warns and fails right at the start.) Instead, the "main" --input parameter needs to be the SDRF.

Or: --input run*_foo*.mzML/raw plus a (then mandatory) experimental design file (--expdesign) in OpenMS format.
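To make the rule concrete, here is a hypothetical pre-flight check in bash. It is not part of nf-core/proteomicslfq, only a sketch of the behaviour described above. It treats an --input ending in .raw or .mzML as spectrum files, which then require an --expdesign whose header has a Fraction column (the column ProteomicsLFQ reported missing), and treats anything else as an SDRF, which must not be combined with --expdesign.

```shell
#!/usr/bin/env bash
# Hypothetical check: returns 0 if the --input/--expdesign combination is valid.
# Usage: check_design_args INPUT [EXPDESIGN]
check_design_args() {
  local input=$1 expdesign=${2:-}
  case "$input" in
    *.raw|*.RAW|*.mzML|*.mzml)
      # Spectrum files (or a quoted glob of them): an OpenMS design is mandatory.
      if [ -z "$expdesign" ]; then
        echo "error: spectrum-file --input needs an OpenMS-format --expdesign" >&2
        return 1
      fi
      # Reject design files whose header line lacks an exact "Fraction" column.
      if ! head -n 1 "$expdesign" | tr -d '\r' | tr '\t' '\n' | grep -qxF Fraction; then
        echo "error: $expdesign has no Fraction column; is it an SDRF?" >&2
        return 1
      fi
      ;;
    *)
      # Anything else is assumed to be an SDRF that already holds the design.
      if [ -n "$expdesign" ]; then
        echo "error: an SDRF --input already contains the design; drop --expdesign" >&2
        return 1
      fi
      ;;
  esac
}
```

With the arguments from the original command, check_design_args '/wsu/home/aj/aj76/aj7682/2047/raw/*.raw' /wsu/home/aj/aj76/aj7682/2047/sdrf.tsv would fail at the Fraction check, matching the error ProteomicsLFQ reported.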

I think the parameter documentation is structured more clearly on the website than on the command line.

TL;DR: Use --input /wsu/home/aj/aj76/aj7682/2047/sdrf.tsv --root_folder /wsu/home/aj/aj76/aj7682/2047/raw/ instead. You might not even need --root_folder if the file names in the SDRF are local, absolute paths. It was introduced for SDRFs that contain, e.g., FTP links.

ncarrut commented 4 years ago

Thanks that worked.