nf-core / airrflow

B-cell and T-cell Adaptive Immune Receptor Repertoire (AIRR) sequencing analysis pipeline using the Immcantation framework
https://nf-co.re/airrflow
MIT License

Missing output file(s) `*_R1_primers-pass.fastq` #285

Closed LiuH2020 closed 1 year ago

LiuH2020 commented 1 year ago

Description of the bug

I downloaded the test_full samplesheet (metadata_pcr_umi_airr_300.tsv) and the dataset's FASTQ files to a local directory. Running the pipeline with the docker profile fails with the following error:

executor >  local (44)
[5c/19ea6d] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:FASTQ_INPUT_CHECK:SAMPLE... [100%] 1 of 1 ✔
[39/58566f] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:FASTP (SRR138... [100%] 10 of 10 ✔
[f2/a5317f] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:RENAME_FASTQ_... [100%] 10 of 10 ✔
[79/0a6827] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:GUNZIP_UMI (S... [100%] 10 of 10 ✔
[de/8b23b1] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_FILTER... [100%] 9 of 9
[68/4abb90] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_MASKPR... [ 11%] 1 of 9, failed: 1
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_PAIRSE... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_CLUSTE... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_PARSE_... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_BUILDC... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_POSTCO... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_ASSEMB... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:FASTQC_POSTAS... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_PARSEH... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_PARSEH... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_PARSEH... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_COLLAP... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_SPLITS... -
[47/0b61c1] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:UNZIP_IGBLAST (unzip_db)       [100%] 1 of 1 ✔
[ec/d25114] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:UNZIP_IMGT (unzip_db)          [100%] 1 of 1 ✔
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:CHANGEO_ASSIGNGENES            -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:CHANGEO_MAKEDB                 -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:FILTER_QUALITY                 -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:CHANGEO_PARSEDB_SPLIT          -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:FILTER_JUNCTION_MOD3           -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:VDJ_ANNOTATION:ADD_META_TO_TAB                -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:BULK_QC_AND_FILTER:COLLAPSE_DUPLICATES        -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:SINGLE_CELL_QC_AND_FILTERING:SINGLE_CELL_QC   -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:CLONAL_ANALYSIS:FIND_CLONAL_THRESHOLD         -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:CLONAL_ANALYSIS:DEFINE_CLONES_COMPUTE         -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:CLONAL_ANALYSIS:DEFINE_CLONES_REPORT          -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:CLONAL_ANALYSIS:DOWSER_LINEAGES               -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:REPERTOIRE_ANALYSIS_REPORTING:PARSE_LOGS      -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:REPERTOIRE_ANALYSIS_REPORTING:REPORT_FILE_... -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:REPERTOIRE_ANALYSIS_REPORTING:AIRRFLOW_REPORT -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:CUSTOM_DUMPSOFTWAREVERSIONS                   -
[-        ] process > NFCORE_AIRRFLOW:AIRRFLOW:MULTIQC                                       -
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_MASKPRIMERS_UMI (SRR1383466)'

Caused by:
  Missing output file(s) `*_R1_primers-pass.fastq` expected by process `NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_MASKPRIMERS_UMI (SRR1383466)`

Command executed:

  MaskPrimers.py score --nproc 4 -s SRR1383466_R1_quality-pass.fastq -p C_primers.fasta --start 15 --barcode  --maxerror 0.2 --mode cut --outname SRR1383466_R1 --log SRR1383466_R1.log > SRR1383466_command_log.txt
  MaskPrimers.py score --nproc 4 -s SRR1383466_R2_quality-pass.fastq -p V_primers.fasta --start 0  --maxerror 0.2 --mode cut --outname SRR1383466_R2 --log SRR1383466_R2.log >> SRR1383466_command_log.txt
  ParseLog.py -l SRR1383466_R1.log SRR1383466_R2.log -f ID PRIMER ERROR

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_AIRRFLOW:AIRRFLOW:SEQUENCE_ASSEMBLY:PRESTO_UMI:PRESTO_MASKPRIMERS_UMI":
      presto: $( MaskPrimers.py --version | awk -F' '  '{print $2}' )
  END_VERSIONS

Command exit status:
  0

Command output:
  START> ParseLog
   FILE> SRR1383466_R1.log

  PROGRESS> 08:30:50 (0) 0.0 min
  PROGRESS> 08:30:51 (100000) 0.0 min
  PROGRESS> 08:30:52 (185179) 0.0 min

   OUTPUT> SRR1383466_R1_table.tab
  RECORDS> 185179
     PASS> 185179
     FAIL> 0
      END> ParseLog

  START> ParseLog
   FILE> SRR1383466_R2.log

  PROGRESS> 08:30:52 (0) 0.0 min
  PROGRESS> 08:30:52 (7264) 0.0 min

   OUTPUT> SRR1383466_R2_table.tab
  RECORDS> 7264
     PASS> 7264
     FAIL> 0
      END> ParseLog

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

Work dir:
  /home/liuhui/nf-core/airrflow-test/work/68/4abb90a9624979c9e1c4dc06628bb5

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
Automatic clone_threshold is 'NA'. Consider setting params.threshold manually.
ERROR ~ Not a valid file collector argument [java.util.ArrayList]: []

 -- Check '.nextflow.log' file for details

The failure seems to be caused by the PRESTO_MASKPRIMERS_UMI module not producing the 'primers-pass.fastq' output. Could you give me some advice?
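One way to narrow this down is to look inside the work dir printed in the error and see which expected outputs are absent. The following is only a minimal sketch: `missing_outputs` is a hypothetical helper name, not part of airrflow or pRESTO, and it only checks that at least one file matches each glob pattern.

```shell
# Hypothetical helper (not part of airrflow or pRESTO):
# print every glob pattern that has no matching file in the given directory.
missing_outputs() {
    dir="$1"
    shift
    for pattern in "$@"; do
        found=0
        # The unquoted pattern is expanded by the shell; if nothing matches,
        # it stays a literal string and the -e test below fails.
        for f in "$dir"/$pattern; do
            [ -e "$f" ] && found=1 && break
        done
        [ "$found" -eq 1 ] || echo "$pattern"
    done
}
```

For the run above, this could be called as `missing_outputs /home/liuhui/nf-core/airrflow-test/work/68/4abb90a9624979c9e1c4dc06628bb5 '*_R1_primers-pass.fastq'`. Note that the reported exit status of 0 does not contradict the missing file, because the task's exit status comes from the last command in the script rather than from the first MaskPrimers.py call.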

Command used and terminal output

nextflow run ../airrflow -profile docker --outdir result --input input/metadata_pcr_umi_airr_300.tsv --cprimers input/C_primers.fasta --vprimers input/V_primers.fasta --imgtdb_base input/imgtdb_base.zip --igblast_base input/igblast_base.zip --library_generation_method specific_pcr_umi --cprimer_position R1 --umi_length 15 --umi_start 0 --umi_position R1 --max_memory 56.GB --max_cpus 4

Relevant files

No response

System information

nextflow -version (23.10.0) Executor (local) Container engine: (Docker) OS (Ubuntu 20.04.6) Version of nf-core/airrflow (v3.1.0)

ggabernet commented 1 year ago

Hi @LiuH2020, thanks for trying out the pipeline. If you used the full pipeline test data, it should work, so I think this may be a transient error. You also provided the Vprimers and Cprimers that are specified in test_full.config, correct? Have you tried resuming the run by running the same command again in the same directory, with the extra option -resume?

Otherwise, I would also suggest running the pipeline directly from GitHub, specifying the version number with the -r option. As release 3.2.0 is coming really soon, I would also suggest trying the dev version of the pipeline:

nextflow run nf-core/airrflow -r dev -profile docker,test_full --outdir result --max_memory 56.GB --max_cpus 4

Hope this works for you. Let me know if you have any remaining errors that we can fix before the next release.

LiuH2020 commented 1 year ago

Thanks for your reply. Following your suggestions, I checked the Vprimers and Cprimers files and found that the ones I had used were wrong (I had downloaded them from the airrflow branch of nf-core/test-datasets). The pipeline runs fine after replacing them with the files downloaded from s3://ngi-igenomes/test-data/airrflow/pcr_umi/vprimers.fasta and s3://ngi-igenomes/test-data/airrflow/pcr_umi/cprimers.fasta.
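A quick sanity check before running the pipeline is to count how many reads contain any of the primers at all. The sketch below rests on several assumptions: `primer_hits` is a hypothetical helper name, each primer sits on a single line of a non-empty FASTA file, and only exact matches anywhere in the read are counted. MaskPrimers.py itself tolerates mismatches and searches from a fixed start position, so this is only a rough indicator of whether a primer file belongs to the dataset.

```shell
# Hypothetical helper (not part of airrflow or pRESTO):
# print "<reads with an exact primer match> <total reads>".
# Usage: primer_hits reads.fastq primers.fasta
primer_hits() {
    awk '
        # First file (primers FASTA): store every non-header, non-empty line.
        NR == FNR {
            if ($0 !~ /^>/ && length($0) > 0) primers[toupper($0)] = 1
            next
        }
        # Second file (FASTQ): the sequence is line 2 of each 4-line record.
        FNR % 4 == 2 {
            total++
            seq = toupper($0)
            for (p in primers) {
                if (index(seq, p) > 0) { hits++; break }
            }
        }
        END { printf "%d %d\n", hits, total }
    ' "$2" "$1"
}
```

For example, `primer_hits SRR1383466_R1_quality-pass.fastq C_primers.fasta` prints the two counts for the R1 reads from the failing task. A hit count near zero suggests the primer file does not match the sequencing protocol.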

I have some questions about the Vprimers and Cprimers files: (1) Are the Vprimers and Cprimers files in the airrflow branch of nf-core/test-datasets fake? Should they not be used? (2) Are the Vprimers and Cprimers files on AWS real primers, constructed from actual sequences? (3) For public datasets whose primers are not accessible, can the pipeline's Vprimers and Cprimers be used instead? Or is it unnecessary to provide primer files at all?

ggabernet commented 1 year ago

Great that you could solve the issue by using the right primers! The primers in the test-datasets repository are for the small test data, which uses a different dataset. The primers really depend on the protocol that was used for sequencing, e.g. whether the BCR / TCR amplification was done with multiplexed PCR or 5' RACE. Maybe the authors describe whether they used a commercial kit for amplification?

I will close this issue for now, as the problem is solved, but feel free to keep asking questions about the primers. We also have a channel on the nf-core Slack for these kinds of discussions: https://nf-co.re/join