lmtani / wf-human-mito

Workflow for human mitochondria analysis
https://lmtani.github.io/wf-human-mito/
Mozilla Public License 2.0
9 stars 0 forks source link

Keep getting the following error #10

Closed Pharmacogenetecist closed 2 years ago

Pharmacogenetecist commented 2 years ago

Execution cancelled -- Finishing pending tasks before exit WARN: Input tuple does not match input set cardinality declared by process separate_mitochondrion:PRINT_READS -- offending value: [file, /path/file.bam] Error executing process > 'separate_mitochondrion:PRINT_READS (1)'

Caused by: Not a valid path value type: org.codehaus.groovy.runtime.NullObject (null)

lmtani commented 2 years ago

Could you check if you're correctly calling the workflow?

It should be:

# Check input data. Could be multiple samples - one per file
❯ ls test-data/
file.bai  file.bam

# Check if you have the _Homo sapiens_ genome, with the required auxiliary files (index).
❯ ls ~/refs/Homo_sapiens_assembly38.*
/home/taniguti/refs/Homo_sapiens_assembly38.dict          /home/taniguti/refs/Homo_sapiens_assembly38.fasta.64.amb  /home/taniguti/refs/Homo_sapiens_assembly38.fasta.64.pac
/home/taniguti/refs/Homo_sapiens_assembly38.fasta         /home/taniguti/refs/Homo_sapiens_assembly38.fasta.64.ann  /home/taniguti/refs/Homo_sapiens_assembly38.fasta.64.sa
/home/taniguti/refs/Homo_sapiens_assembly38.fasta.64.alt  /home/taniguti/refs/Homo_sapiens_assembly38.fasta.64.bwt  /home/taniguti/refs/Homo_sapiens_assembly38.fasta.fai

# Call this workflow
❯ nextflow run lmtani/wf-human-mito -r main \
        --alignments "test-data/*.ba{i,m}" \
        --outdir my-out-dir \
        --reference ~/refs/Homo_sapiens_assembly38.fasta \
        -profile docker

N E X T F L O W  ~  version 22.04.3
Launching `https://github.com/lmtani/wf-human-mito` [jolly_agnesi] DSL2 - revision: 4c37618927 [main]
executor >  local (23)
[-        ] process > separate_mitochondrion:GATK4_FASTQTOSAM                 -
[-        ] process > separate_mitochondrion:BWA_ALIGN_FROM_UBAM              -
[-        ] process > separate_mitochondrion:SORT_SAM                         -
[c5/968557] process > separate_mitochondrion:PRINT_READS (1)                  [100%] 1 of 1 ✔
[82/a634b3] process > separate_mitochondrion:SELECT_MITO_READS (1)            [100%] 1 of 1 ✔
[39/8cccac] process > variant_call:CALL_DEFAULT:BWA_ALIGN_FROM_UBAM (1)       [100%] 1 of 1 ✔
[58/09cc5d] process > variant_call:CALL_DEFAULT:MARK_DUPLICATES (1)           [100%] 1 of 1 ✔
[63/1d6960] process > variant_call:CALL_DEFAULT:SORT_SAM (1)                  [100%] 1 of 1 ✔
[1d/735c61] process > variant_call:CALL_DEFAULT:COLLECT_ALIGNMENT_METRICS (1) [100%] 1 of 1 ✔
[91/36b8f6] process > variant_call:CALL_DEFAULT:COLLECT_WGS_METRICS (1)       [100%] 1 of 1 ✔
[ec/64abf0] process > variant_call:CALL_DEFAULT:CALL_MUTECT (1)               [100%] 1 of 1 ✔
[70/dda42d] process > variant_call:CALL_SHIFTED:BWA_ALIGN_FROM_UBAM (1)       [100%] 1 of 1 ✔
[77/d664b9] process > variant_call:CALL_SHIFTED:MARK_DUPLICATES (1)           [100%] 1 of 1 ✔
[49/a60223] process > variant_call:CALL_SHIFTED:SORT_SAM (1)                  [100%] 1 of 1 ✔
[cd/2317e1] process > variant_call:CALL_SHIFTED:COLLECT_ALIGNMENT_METRICS (1) [100%] 1 of 1 ✔
[40/0e0f1a] process > variant_call:CALL_SHIFTED:COLLECT_WGS_METRICS (1)       [100%] 1 of 1 ✔
[54/7e9cf0] process > variant_call:CALL_SHIFTED:CALL_MUTECT (1)               [100%] 1 of 1 ✔
[3b/9c6698] process > variant_call:LIFTOVER_VCF (1)                           [100%] 1 of 1 ✔
[3d/a11f43] process > variant_call:MERGE_VCFS (1)                             [100%] 1 of 1 ✔
[44/158ef3] process > variant_call:MERGE_STATS (1)                            [100%] 1 of 1 ✔
[31/6560fa] process > variant_call:FILTER_MUTECT_CALLS (1)                    [100%] 1 of 1 ✔
[1d/51b861] process > variant_call:LEFT_ALIGN_AND_TRIM_VARIANTS (1)           [100%] 1 of 1 ✔
[15/90218f] process > variant_call:SELECT_VARIANTS (1)                        [100%] 1 of 1 ✔
[07/a2f626] process > variant_call:HAPLOCHECK (file)                          [100%] 1 of 1 ✔
[3f/dcb9d9] process > make_report:CREATE_JSON (1)                             [100%] 1 of 1 ✔
[8e/7e8aa5] process > make_report:AGGREGATE_SAMPLES_IN_CSV                    [100%] 1 of 1 ✔
Completed at: 15-Jul-2022 06:34:39
Duration    : 1m 46s
CPU hours   : 0.2
Succeeded   : 23

Let me know if you could solve the problem.

lmtani commented 2 years ago

Maybe you're calling the workflow with --alignments "test-data/file.bam". This could cause the described error.

Pharmacogenetecist commented 2 years ago

Hello, apologies for the late response. I was getting my nextflow and docker setup correctly. When I run a single BAM file - I do not get the suggested outputs only a pipeline.svg, report.html, timeline.html and trace.txt. I do not get any VCF files.

nextflow run lmtani/wf-human-mito -r main --alignments "/DATA/WGS30x/*.ba{m,i}" --reference "/DATA/Homo_sapiens_assembly38.fasta" --outdir "/DATA/mitoflowrest/" -profile docker N E X T F L O W ~ version 22.04.5 Launching https://github.com/lmtani/wf-human-mito [happy_bose] DSL2 - revision: 4c37618927 [main] [- ] process > separate_mitochondrion:GATK4_FASTQTOSAM - [- ] process > separate_mitochondrion:BWA_ALIGN_FROM_UBAM - [- ] process > separate_mitochondrion:SORT_SAM - [- ] process > separate_mitochondrion:PRINT_READS - [- ] process > separate_mitochondrion:SELECT_MITO_READS - [- ] process > variant_call:CALL_DEFAULT:BWA_ALIGN_FROM_UBAM - [- ] process > variant_call:CALL_DEFAULT:MARK_DUPLICATES - [- ] process > variant_call:CALL_DEFAULT:SORT_SAM - [- ] process > variant_call:CALL_DEFAULT:COLLECT_ALIGNMENT_METRICS - [- ] process > variant_call:CALL_DEFAULT:COLLECT_WGS_METRICS - [- ] process > variant_call:CALL_DEFAULT:CALL_MUTECT - [- ] process > variant_call:CALL_SHIFTED:BWA_ALIGN_FROM_UBAM - [- ] process > variant_call:CALL_SHIFTED:MARK_DUPLICATES - [- ] process > variant_call:CALL_SHIFTED:SORT_SAM - [- ] process > variant_call:CALL_SHIFTED:COLLECT_ALIGNMENT_METRICS - [- ] process > variant_call:CALL_SHIFTED:COLLECT_WGS_METRICS - [- ] process > variant_call:CALL_SHIFTED:CALL_MUTECT - [- ] process > variant_call:LIFTOVER_VCF - [- ] process > variant_call:MERGE_VCFS - [- ] process > variant_call:MERGE_STATS - [- ] process > variant_call:FILTER_MUTECT_CALLS - [- ] process > variant_call:LEFT_ALIGN_AND_TRIM_VARIANTS - [- ] process > variant_call:SELECT_VARIANTS - [- ] process > variant_call:HAPLOCHECK - [- ] process > make_report:CREATE_JSON - [- ] process > make_report:AGGREGATE_SAMPLES_IN_CSV -

lmtani commented 2 years ago

Could you show me the output of the following command?

ls -lh /DATA/WGS30x/

Pharmacogenetecist commented 2 years ago

ls -lh /DATA/WGS30x/

-rw-r--r-- 1 ec2-user ec2-user 54G Jul 25 19:44 CPM00005536.bam -rw-r--r-- 1 ec2-user ec2-user 9.1M Jul 25 19:44 CPM00005536.bam.bai

lmtani commented 2 years ago

Please rename your alignment index (CPM00005536.bam.bai) to CPM00005536.bai, or simply create a symbolic link pointing to it.

# Rename
mv /DATA/WGS30x/CPM00005536.bam.bai /DATA/WGS30x/CPM00005536.bai 

## OR create a symbolic link
# ln -s /DATA/WGS30x/CPM00005536.bam.bai /DATA/WGS30x/CPM00005536.bai 

This is still a limitation. Let me know if it solves the problem.

Pharmacogenetecist commented 2 years ago

Please rename your alignment index (CPM00005536.bam.bai) to CPM00005536.bai, or simply create a symbolic link pointing to it.

# Rename
mv /DATA/WGS30x/CPM00005536.bam.bai /DATA/WGS30x/CPM00005536.bai 

## OR create a symbolic link
# ln -s /DATA/WGS30x/CPM00005536.bam.bai /DATA/WGS30x/CPM00005536.bai 

This is still a limitation. Let me know if it solves the problem.

Error executing process > 'separate_mitochondrion:SELECT_MITO_READS (1)'

Caused by: Process separate_mitochondrion:SELECT_MITO_READS (1) terminated with an error exit status (1)

Command executed:

picard RevertSam INPUT=mito.bam OUTPUT_BY_READGROUP=false OUTPUT=CPM00005536.mito.unaligned.bam VALIDATION_STRINGENCY=LENIENT ATTRIBUTE_TO_CLEAR=FT ATTRIBUTE_TO_CLEAR=CO SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false

Command exit status: 1

Command output: (empty)

Command error: Unable to find image 'quay.io/biocontainers/picard:2.26.10--hdfd78af_0' locally 2.26.10--hdfd78af_0: Pulling from biocontainers/picard c1a16a04cedd: Already exists 4ca545ee6d5d: Already exists da8c0a36b1c3: Pulling fs layer da8c0a36b1c3: Download complete da8c0a36b1c3: Pull complete Digest: sha256:c57ac1a4ecce0ea1c56f73067401362695b3ced57e419d7caa0432186091705b Status: Downloaded newer image for quay.io/biocontainers/picard:2.26.10--hdfd78af_0 /usr/local/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory INFO J2022-07-26 16:48:422022]RevertSamg as ec

** NOTE: Picard's command line syntax is changing.


** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


** The command line looks like this in the new syntax:


** RevertSam -INPUT mito.bam -OUTPUT_BY_READGROUP false -OUTPUT CPM00005536.mito.unaligned.bam -VALIDATION_STRINGENCY LENIENT -ATTRIBUTE_TO_CLEAR FT -ATTRIBUTE_TO_CLEAR CO -SORT_ORDER queryname -RESTORE_ORIGINAL_QUALITIES false


16:48:42.792 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/picard-2.26.10-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so [Tue Jul 26 16:48:42 GMT 2022] RevertSam INPUT=mito.bam OUTPUT=CPM00005536.mito.unaligned.bam OUTPUT_BY_READGROUP=false SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS, FT, CO] VALIDATION_STRINGENCY=LENIENT RESTORE_HARDCLIPS=true OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true SANITIZE=false MAX_DISCARD_FRACTION=0.01 KEEP_FIRST_DUPLICATE=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false [Tue Jul 26 16:48:42 GMT 2022] Executing as ec2-user@16e162074239 on Linux 5.10.126-117.518.amzn2.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.1+13-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10 WARNING 2022-07-26 16:48:42 SamFiles The index file /home/ec2-user/MITO_NF/work/67/7ded33e27b9f7ed5aaa8b73c9da214/mito.bai was found by resolving the canonical path of a symlink: mito.bam -> /home/ec2-user/MITO_NF/work/67/7ded33e27b9f7ed5aaa8b73c9da214/mito.bam [Tue Jul 26 16:48:42 GMT 2022] picard.sam.RevertSam done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=536870912 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.SAMException: Value for tag XQ is not a String: class java.lang.Integer at htsjdk.samtools.SAMRecord.getStringAttribute(SAMRecord.java:1342) at picard.sam.RevertSam.revertSamRecord(RevertSam.java:409) at picard.sam.RevertSam.doWork(RevertSam.java:315) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Work dir: /home/ec2-user/MITO_NF/work/11/2ebdf8b638b9e539fc9b5d79b127f8

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

lmtani commented 2 years ago

This one is strange. Could you send me a very small subset of your alignment so I can verify what is going wrong?

I'm assuming you're using GRCh38. To subset your alignment:

samtools view -O BAM -o test-subset.bam -s 0.01 /DATA/WGS30x/CPM00005536.bam chr20 chrM

This command will sub-sample (~1%) you alignment on chr20 and mitochondrial chromosomes. The output should be less than 10mb.

Then, send me the test-subset.bam file. In order to attach it here in this issue you'll need put it into a .zip file. You can also send me the BAM file through this email, if you prefer: bgyhbw7jh@mozmail.com

Pharmacogenetecist commented 2 years ago

test-subset.txt I Uploaded it here, just change the .txt to .bam.

Thanks for your help.

lmtani commented 2 years ago

@Pharmacogenetecist : I included a new parameter, the --restore_hardclips. Could you check if it solves the reported issue?

# Update the local copy of the workflow
nextflow pull lmtani/wf-human-mito

# Run it again with the new parameter
nextflow run lmtani/wf-human-mito -r main \
        --alignments "test-data/*.ba{i,m}" \
        --outdir my-out-dir \
        --restore_hardclips false \
        --reference ~/refs/Homo_sapiens_assembly38.fasta \
        -profile docker

For more details of the issue please see #11

Pharmacogenetecist commented 2 years ago
nextflow pull lmtani/wf-human-mito

Thanks for your response - I ran your suggested code above on a new BAM file (HG38) and it worked. I got the suggested outputs. Thank you, appreciate your help.

Question- will this workflow work on low pass WGS ? I am talking like 1 - 0.5X.

Thanks.

lmtani commented 2 years ago

It depends on how many reads your experiment has aligned in the mitochondrial genome. You could run one of these low pass WGS experiment and then inspect the "all_samples.csv" file. The columns _PCT1X, ... _PCT100X informs:

The fraction of bases that attained at least 1X (5X, 10X...) sequence coverage in post-filtering bases.

Depending on your objectives, I think that if the mitochondria have 30x of coverage, it should be ok. If you intend to detect heteroplasmy, probably it's better to have more coverage. I think they usually use more than 90% of the mitochondrial sequence, with more than 70 reads aligning. i.e. _PCT70X > 0.9.

Pharmacogenetecist commented 2 years ago

It depends on how many reads your experiment has aligned in the mitochondrial genome. You could run one of these low pass WGS experiment and then inspect the "all_samples.csv" file. The columns _PCT1X, ... _PCT100X informs:

The fraction of bases that attained at least 1X (5X, 10X...) sequence coverage in post-filtering bases.

Depending on your objectives, I think that if the mitochondria have 30x of coverage, it should be ok. If you intend to detect heteroplasmy, probably it's better to have more coverage. I think they usually use more than 90% of the mitochondrial sequence, with more than 70 reads aligning. i.e. _PCT70X > 0.9.

Thank you, the MT RD varies from 40-70X, unfortunately all our LPWGS data is currently aligned against HG19. I will test this when we next do some alignment with HG19 or will your tool work with HG19 given the correct reference files? (I think I read it only works with HG38).

lmtani commented 2 years ago

With some more changes, this workflow should also work with HG19. I'll check it and let you know.

Pharmacogenetecist commented 2 years ago

With some more changes, this workflow should also work with HG19. I'll check it and let you know.

Awesome! Thank you.

lmtani commented 2 years ago

Hello again @Pharmacogenetecist : now I think you can use your input alignments in HG19. Could you try it? You just need to provide the reference you used in your alignment through --reference parameter.

I'll close this issue. Please open another one if you have more problems.