nf-core / raredisease

Call and score variants from WGS/WES of rare disease patients.
https://nf-co.re/raredisease
MIT License

manta path error #480

Open egenomics opened 8 months ago

egenomics commented 8 months ago

Description of the bug

Manta fails with a path error:

  error [java.lang.InterruptedException]: java.lang.InterruptedException
Jan-16 03:00:07.184 [Actor Thread 16] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:CALL_STRUCTURAL_VARIANTS:CALL_SV_MANTA:MANTA (1)'

Caused by:
  Path value cannot be null

Command used and terminal output

nextflow run /playground/nf-core-raredisease_1.1.1/1_1_1/ \
    -profile singularity \
    --input /playground/dataset_cnvs/samplesheet.csv \
    --skip_snv_annotation \
    --skip_sv_annotation \
    --skip_cnv_calling \
    --outdir /playground/dataset_cnvs/results \
    --intervals_wgs /playground/dataset_cnvs/Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.interval_list \
    --intervals_y /playground/dataset_cnvs/Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.interval_list \
    --skip_mt_analysis \
    --skip_vep_filter \
    --target_bed /playground/dataset_cnvs/Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.bed \
    --variant_catalog /playground/dataset_cnvs/variant_catalog.json

Relevant files

nextflow.log

System information

nextflow version 23.10.0.5889
Workstation
local (singularity)
Ubuntu 22.04.03 LTS
raredisease 1.1.1

egenomics commented 8 months ago

-resume [name_run] also doesn't seem to work: it does not recover the bwa-mem2 or fastqc processes that had already finished.

jemten commented 8 months ago

Hi @egenomics, I've also noticed that problem with -resume and I'm currently looking into it. Are these exome samples?

egenomics commented 8 months ago

Yes, they are exomes (I set analysis_type = 'wes' in nextflow.config). I managed to get -resume working, but I still get the same error in Manta: "Path value cannot be null" :(

egenomics commented 8 months ago

Any idea where the error might be?

jemten commented 7 months ago

Hmm, something seems to be off when the channels are combined to go from sample level to case level. From the logs, it looks like the samples aren't related in any way. Is that correct?

egenomics commented 7 months ago

Yes, in theory they are not related. Should I process every "case" separately, each in a different nextflow run with its own samplesheet?

jemten commented 7 months ago

I think it's worth trying to do that while we are trying to nail down why the first setup doesn't work. Also, if possible, could you share your samplesheet?

egenomics commented 7 months ago

I will try to run them separately, but I need to process ~120 exomes with some urgency.

This is one of the samplesheets:

cat samplesheet_R1080.csv 
sample,lane,fastq_1,fastq_2,sex,phenotype,paternal_id,maternal_id,case_id
200851803,1,/playground/fastq/R1080/200851803_S1_R1_001.fastq.gz,/playground/fastq/R1080/200851803_S1_R2_001.fastq.gz,other,2,0,0,200851803
192631319,1,/playground/fastq/R1080/192631319_S2_R1_001.fastq.gz,/playground/fastq/R1080/192631319_S2_R2_001.fastq.gz,2,2,0,0,192631319
192629071,1,/playground/fastq/R1080/192629071_S3_R1_001.fastq.gz,/playground/fastq/R1080/192629071_S3_R2_001.fastq.gz,1,2,0,0,192629071
192629059,1,/playground/fastq/R1080/192629059_S4_R1_001.fastq.gz,/playground/fastq/R1080/192629059_S4_R2_001.fastq.gz,2,2,0,0,192629059
192637663,1,/playground/fastq/R1080/192637663_S5_R1_001.fastq.gz,/playground/fastq/R1080/192637663_S5_R2_001.fastq.gz,1,2,0,0,192637663
192606791,1,/playground/fastq/R1080/192606791_S6_R1_001.fastq.gz,/playground/fastq/R1080/192606791_S6_R2_001.fastq.gz,2,2,0,0,192606791
192641178,1,/playground/fastq/R1080/192641178_S7_R1_001.fastq.gz,/playground/fastq/R1080/192641178_S7_R2_001.fastq.gz,2,2,0,0,192641178
192610990,1,/playground/fastq/R1080/192610990_S8_R1_001.fastq.gz,/playground/fastq/R1080/192610990_S8_R2_001.fastq.gz,1,2,0,0,192610990
192643630,1,/playground/fastq/R1080/192643630_S9_R1_001.fastq.gz,/playground/fastq/R1080/192643630_S9_R2_001.fastq.gz,1,2,0,0,192643630
190846474,1,/playground/fastq/R1080/190846474_S10_R1_001.fastq.gz,/playground/fastq/R1080/190846474_S10_R2_001.fastq.gz,1,2,0,0,190846474
190846477,1,/playground/fastq/R1080/190846477_S11_R1_001.fastq.gz,/playground/fastq/R1080/190846477_S11_R2_001.fastq.gz,1,2,0,0,190846477

jemten commented 7 months ago

Try to start one and see if the error persists. The pipeline should be compatible with your setup; if it isn't, we'll try to fix it. We are working to get a new release out in a couple of weeks, so it would be good to get a grip on the issue.
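
For anyone who needs to script the one-run-per-case approach, here is a minimal Python sketch that splits a samplesheet into one file per case_id. It assumes the column layout shown in the samplesheet above and reuses the samplesheet_<id>.csv naming; the function names split_by_case and write_case_sheets are made up for illustration and are not part of the pipeline.

```python
import csv
import io
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_case(handle, key="case_id"):
    """Group samplesheet rows by case; return (header, {case_id: [rows]})."""
    reader = csv.DictReader(handle)
    groups = defaultdict(list)
    for row in reader:
        groups[row[key]].append(row)
    return reader.fieldnames, dict(groups)


def write_case_sheets(header, groups, out_dir):
    """Write one samplesheet_<case_id>.csv per case and return the paths."""
    paths = []
    for case_id, rows in groups.items():
        path = Path(out_dir) / f"samplesheet_{case_id}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=header)
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


# Demo with two rows copied from the samplesheet in this thread.
SHEET = """sample,lane,fastq_1,fastq_2,sex,phenotype,paternal_id,maternal_id,case_id
192631319,1,/playground/fastq/R1080/192631319_S2_R1_001.fastq.gz,/playground/fastq/R1080/192631319_S2_R2_001.fastq.gz,2,2,0,0,192631319
192629071,1,/playground/fastq/R1080/192629071_S3_R1_001.fastq.gz,/playground/fastq/R1080/192629071_S3_R2_001.fastq.gz,1,2,0,0,192629071
"""

with tempfile.TemporaryDirectory() as out_dir:
    header, groups = split_by_case(io.StringIO(SHEET))
    for path in write_case_sheets(header, groups, out_dir):
        print(path.name)
```

With a real samplesheet you would pass an open file instead of the io.StringIO demo and point out_dir at a persistent directory. Grouping by case_id rather than by sample keeps related samples together if a case ever has more than one.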

egenomics commented 7 months ago

Hi, I get a different error when processing samples one by one (error at the end).

The command is this one

nextflow run /playground/nf-core-raredisease_1.1.1/1_1_1/ -profile singularity \
--input /playground/dataset_cnvs/samplesheets_raredisease/samplesheet_$sample.csv --skip_snv_annotation --skip_sv_annotation --skip_cnv_calling \
--outdir /playground/dataset_cnvs/results/$sample --intervals_wgs /playground/dataset_cnvs/Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.interval_list \
 --intervals_y /playground/dataset_cnvs/Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.interval_list --skip_mt_analysis --skip_vep_filter \
 --target_bed /playground/dataset_cnvs/Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.bed \
 --variant_catalog /playground/dataset_cnvs/variant_catalog.json \
 --analysis_type wes

ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:PREPARE_REFERENCES:GATK_BILT (playground)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:PREPARE_REFERENCES:GATK_BILT (playground)` terminated with an error exit status (3)

Command executed:

  gatk --java-options "-Xmx24576M" BedToIntervalList \
      --INPUT Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.bed \
      --OUTPUT playground_target.interval_list \
      --SEQUENCE_DICTIONARY genome.dict \
      --TMP_DIR . \

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RAREDISEASE:RAREDISEASE:PREPARE_REFERENCES:GATK_BILT":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
  END_VERSIONS

Command exit status:
  3

Command output:
  (empty)

Command error:
  Using GATK jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24576M -jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar BedToIntervalList --INPUT Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.bed --OUTPUT playground_target.interval_list --SEQUENCE_DICTIONARY genome.dict --TMP_DIR .
  08:58:38.845 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
  [Fri Jan 26 08:58:38 GMT 2024] BedToIntervalList --INPUT Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.bed --SEQUENCE_DICTIONARY genome.dict --OUTPUT playground_target.interval_list --TMP_DIR . --SORT true --UNIQUE false --DROP_MISSING_CONTIGS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
  [Fri Jan 26 08:58:38 GMT 2024] Executing as jlvillanueva@HCP14035 on Linux 6.5.0-14-generic amd64; OpenJDK 64-Bit Server VM 17.0.3-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.4.0.0
  [Fri Jan 26 08:58:39 GMT 2024] picard.util.BedToIntervalList done. Elapsed time: 0.00 minutes.
  Runtime.totalMemory()=285212672
  To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
  picard.PicardException: Start on sequence 'chr4' was past the end: 112788531 < 112904435
    at picard.util.BedToIntervalList.doWork(BedToIntervalList.java:160)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:289)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

Work dir:
  /playground/work/66/42850f2837916a14b7ce930937ae25

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

ramprasadn commented 7 months ago

This looks like an issue with your bed file, where the start location is greater than the end location for chr4. Could you check if that is the case?
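
One way to check this is to compare every BED record against the contig lengths in the sequence dictionary. The Python sketch below does that, and it also flags records that run past the end of a contig. In the demo, the value LN:112788531 is an assumption: it takes the first number in the Picard message to be the chr4 length recorded in genome.dict, which has not been confirmed in this thread. The function names are hypothetical.

```python
import io


def read_dict_lengths(handle):
    """Return {contig: length} parsed from the @SQ lines of a sequence dictionary."""
    lengths = {}
    for line in handle:
        if not line.startswith("@SQ"):
            continue
        # Each @SQ field looks like TAG:value, e.g. SN:chr4 or LN:112788531.
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def check_bed(handle, lengths):
    """Yield (line_number, message) for BED records that do not fit the dictionary."""
    for number, line in enumerate(handle, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)  # BED: 0-based start, exclusive end
        if chrom not in lengths:
            yield number, f"{chrom} is not in the dictionary"
        elif start >= end:
            yield number, f"start {start} is not before end {end}"
        elif end > lengths[chrom]:
            yield number, f"{chrom}:{start}-{end} extends past contig length {lengths[chrom]}"


# Demo. The chr4 length is ASSUMED from the error message, not read from genome.dict.
DICT_TEXT = "@HD\tVN:1.6\n@SQ\tSN:chr4\tLN:112788531\n"
BED_TEXT = "chr4\t112657238\t112657337\nchr4\t112904434\t112904592\n"

lengths = read_dict_lengths(io.StringIO(DICT_TEXT))
for number, message in check_bed(io.StringIO(BED_TEXT), lengths):
    print(f"BED line {number}: {message}")
```

For the real files, replace the io.StringIO demo inputs with open("genome.dict") and open(...) on the target BED. If records are reported as extending past the contig, the BED file and the dictionary probably come from different reference builds.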

egenomics commented 6 months ago

Hi, the coordinates the pipeline complains about are right in the middle of chromosome 4.

This is the bed file:

[...]
chr4 112654064 112654163
chr4 112657238 112657337
chr4 112904434 112904592 <<---------------
chr4 113049678 113049863
chr4 113106860 113106959
[...]

ramprasadn commented 6 months ago

Hmm, would it be possible for you to share your bed file and your dictionary file?

egenomics commented 6 months ago

Here it is: Illumina-truseq-rapid-exome_v1.2_hg38_target_orig.zip