nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Too few fragments assigned to transcripts in the index #1111

Closed: colin893 closed this issue 9 months ago

colin893 commented 10 months ago

Description of the bug

I am using the rnaseq pipeline to analyze single-cell samples. I successfully completed the analysis for two other experiments, but one particular cell throws an error related to the number of fragments: [warning] salmon was only able to assign 3 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 10. This could be indicative of a mismatch between the reference and sample, or a very bad sample. You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1).

Of course, I tried passing the parameter on the command line: --extra_salmon_quant_args "--minAssignedFrags 1"

However, I still get the error and am not sure how to handle it.

Command used and terminal output

Command:

./nextflow run nf-core/rnaseq --input Samples/Exp1/sampleSheet.csv --outdir ../ProcessedData/Exp1/ --fasta ../Ref/genomer103pEXT002.fa --gtf ../Ref/genesr103pEXT002.gtf -profile docker --max_memory '60.GB' --star_index /media/zddm2021/T7/FlashSeq/genome/index/star/ --trimmer trimgalore --rsem_index /media/zddm2021/T7/FlashSeq/genome/rsem/ --salmon_index /media/zddm2021/T7/FlashSeq/genome/index/salmon/ --extra_salmon_quant_args "--minAssignedFrags 1"

Logs:

Nov-11 15:31:54.865 [Task submitter] INFO  nextflow.Session - [3a/48af72] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (22_11_15_GFP-3_G12)
Nov-11 15:31:54.866 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (22_11_15_GFP-3_H1); work-dir=/mnt/d1d54bcf-dec1-4d25-ae36-3647835a7fd4/FlashSeq/Scripts/work/94/7f8cea7b810e9a2a7633e2a22059c2
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (22_11_15_GFP-3_H1)` terminated with an error exit status (1)
Nov-11 15:31:54.886 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (22_11_15_GFP-3_H1)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (22_11_15_GFP-3_H1)` terminated with an error exit status (1)

Command executed:

  salmon quant \
      --geneMap genesr103pEXT002.gtf \
      --threads 6 \
      --libType=A \
      --index salmon \
      -1 22_11_15_GFP-3_H1.subsampled_R1.fastq.gz -2 22_11_15_GFP-3_H1.subsampled_R2.fastq.gz \
      --skipQuant \
      -o 22_11_15_GFP-3_H1

  if [ -f 22_11_15_GFP-3_H1/aux_info/meta_info.json ]; then
      cp 22_11_15_GFP-3_H1/aux_info/meta_info.json "22_11_15_GFP-3_H1_meta_info.json"
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT":
      salmon: $(echo $(salmon --version) | sed -e "s/salmon //g")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  [2023-11-11 14:31:15.016] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
  [2023-11-11 14:31:15.016] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.
  [2023-11-11 14:31:15.016] [jointLog] [info] parsing read library format
  [2023-11-11 14:31:15.016] [jointLog] [info] There is 1 library.
  [2023-11-11 14:31:15.017] [jointLog] [info] Loading pufferfish index
  [2023-11-11 14:31:15.017] [jointLog] [info] Loading dense pufferfish index.
  -----------------------------------------
  | Loading contig table | Time = 25.809 s
  -----------------------------------------
  size = 24956290
  -----------------------------------------
  | Loading contig offsets | Time = 75.55 ms
  -----------------------------------------
  -----------------------------------------
  | Loading reference lengths | Time = 220.7 us
  -----------------------------------------
  -----------------------------------------
  | Loading mphf table | Time = 439.05 ms
  -----------------------------------------
  size = 1831963234
  Number of ones: 24956289
  Number of ones per inventory item: 512
  Inventory entries filled: 48743
  -----------------------------------------
  | Loading contig boundaries | Time = 6.7553 s
  -----------------------------------------
  size = 1831963234
  -----------------------------------------
  | Loading sequence | Time = 396.97 ms
  -----------------------------------------
  size = 1083274564
  -----------------------------------------
  | Loading positions | Time = 3.5343 s
  -----------------------------------------
  size = 1488550916
  -----------------------------------------
  | Loading reference sequence | Time = 316.42 ms
  -----------------------------------------
  -----------------------------------------
  | Loading reference accumulative lengths | Time = 389.27 us
  -----------------------------------------

  [2023-11-11 14:31:52.345] [jointLog] [info] done
  [2023-11-11 14:31:52.426] [jointLog] [info] Index contained 55267 targets
  [2023-11-11 14:31:52.447] [jointLog] [info] Number of decoys : 994
  [2023-11-11 14:31:52.447] [jointLog] [info] First decoy index : 54264 
  [2023-11-11 14:31:52.739] [jointLog] [warning] salmon was only able to assign 3 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 10. This could be indicative of a mismatch between the reference and sample, or a very bad sample.  You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1).

Work dir:
  /mnt/d1d54bcf-dec1-4d25-ae36-3647835a7fd4/FlashSeq/Scripts/work/94/7f8cea7b810e9a2a7633e2a22059c2

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Nov-11 15:31:54.894 [Task monitor] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit
Nov-11 15:31:54.897 [main] DEBUG nextflow.Session - Session await > all processes finished
Nov-11 15:31:54.908 [Actor Thread 19] DEBUG nextflow.file.SortFileCollector - FileCollector temp dir not removed: null
Nov-11 15:31:54.909 [Actor Thread 8] DEBUG nextflow.file.SortFileCollector - FileCollector temp dir not removed: null
Nov-11 15:31:54.908 [Actor Thread 12] DEBUG nextflow.file.SortFileCollector - FileCollector temp dir not removed: null
Nov-11 15:31:54.918 [Actor Thread 25] DEBUG nextflow.sort.BigSort - Sort completed -- entries: 7; slices: 1; internal sort time: 0.008 s; external sort time: 0.002 s; total time: 0.01 s
Nov-11 15:31:54.978 [Actor Thread 25] DEBUG nextflow.file.FileCollector - Saved collect-files list to: /mnt/d1d54bcf-dec1-4d25-ae36-3647835a7fd4/FlashSeq/Scripts/work/collect-file/ab0427e85927608f98b84fd6188e9c70
Nov-11 15:31:54.981 [Actor Thread 25] DEBUG nextflow.file.FileCollector - Deleting file collector temp dir: /tmp/nxf-6395931895396976487
Nov-11 15:32:39.914 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 176; name: NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (22_11_15_GFP-3_G12); status: COMPLETED; exit: 0; error: -; workDir: /mnt/d1d54bcf-dec1-4d25-ae36-3647835a7fd4/FlashSeq/Scripts/work/3a/48af7284c6bbba72f3ce1b37a8b1d8]
Nov-11 15:32:39.921 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
Nov-11 15:32:39.921 [main] DEBUG nextflow.Session - Session await > all barriers passed
Nov-11 15:32:39.928 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'PublishDir' shutdown completed (hard=false)
Nov-11 15:32:39.932 [main] INFO  nextflow.Nextflow - -[nf-core/rnaseq] Pipeline completed with errors-
Nov-11 15:32:39.938 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=175; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=167; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=6h 6m 12s; failedDuration=4m 5s; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=10; peakCpus=12; peakMemory=60 GB; ]
Nov-11 15:32:39.938 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
Nov-11 15:32:39.941 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Nov-11 15:32:41.077 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Nov-11 15:32:41.339 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Nov-11 15:32:41.381 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
Nov-11 15:32:41.383 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
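Salmon's warning names two likely causes: a reference/sample mismatch or a very poor sample. Before lowering --minAssignedFrags, it may be worth checking how many reads the failing sample actually contains. The following is a minimal sketch; count_fastq_reads is a hypothetical helper (not part of nf-core/rnaseq) and assumes standard four-line FASTQ records with no wrapped sequence lines.

```shell
# count_fastq_reads: hypothetical helper, not part of nf-core/rnaseq.
# Prints the number of reads in a FASTQ file, plain or gzip-compressed.
# Assumes every record spans exactly 4 lines (header, sequence, '+', quality).
count_fastq_reads() {
    local fq="$1"
    local lines
    case "$fq" in
        *.gz) lines=$(gzip -dc -- "$fq" | wc -l) ;;  # decompress to stdout
        *)    lines=$(wc -l < "$fq") ;;
    esac
    echo $(( lines / 4 ))
}
```

For example, from inside the failing task's work dir (Nextflow stages the inputs there as symlinks), `count_fastq_reads 22_11_15_GFP-3_H1.subsampled_R1.fastq.gz` reports how many reads salmon had to work with. Since these are subsampled reads, running it on the original FASTQ from the samplesheet as well shows whether the sample itself is nearly empty.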

Relevant files

No response

System information

Nextflow version 23.10.0; Docker; executed on a local PC running Linux (Ubuntu); nf-core/rnaseq v3.12.0-g3bec233

mahesh-panchal commented 10 months ago

Someone I know just encountered this issue too. Digging around the code, it seems that the FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT process doesn't use params.extra_salmon_quant_args; its arguments are hard-coded to --skipQuant in conf/modules.config. As a workaround, you can pass a custom config to nextflow run with the -c option, with the following contents:

salmon_quant.config:

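process {
    // Applies only to the SALMON_QUANT call inside FASTQ_SUBSAMPLE_FQ_SALMON.
    // Keep --skipQuant from the default config and add the lower threshold.
    withName: '.*:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT' {
        ext.args   = '--skipQuant --minAssignedFrags 1'
    }
}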

and then run:

nextflow run nf-core/rnaseq -c salmon_quant.config ...
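Since Nextflow writes the generated task script to .command.sh in each task work dir, one way to confirm that the override actually reached the subsampled SALMON_QUANT task is to inspect that file after rerunning. A minimal sketch; check_salmon_args is a hypothetical helper, and the work dir path is whatever the rerun's log reports for that task.

```shell
# check_salmon_args: hypothetical helper, not part of nf-core/rnaseq.
# Given a Nextflow task work dir, reports whether the generated task
# script (.command.sh) passes --minAssignedFrags to salmon.
check_salmon_args() {
    local workdir="$1"
    if [ ! -f "$workdir/.command.sh" ]; then
        echo "no .command.sh found"
    elif grep -q -- '--minAssignedFrags' "$workdir/.command.sh"; then
        echo "override applied"
    else
        echo "override missing"
    fi
}
```

If it prints "override missing", the withName selector most likely did not match the process name shown in the log.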
pinin4fjords commented 9 months ago

I think we may also need to make some changes to help with this issue, since I came across the same thing during development of the riboseq workflow. Specifically:

pinin4fjords commented 9 months ago

Fixed (I believe) in https://github.com/nf-core/rnaseq/pull/1144 and https://github.com/nf-core/rnaseq/pull/1154