nf-core / hlatyping

Precision HLA typing from next-generation sequencing data
https://nf-co.re/hlatyping
MIT License

Cannot run HLATyping with user-specific BAM file #74

Closed NTNguyen13 closed 4 years ago

NTNguyen13 commented 4 years ago

Hi, I have installed hlatyping with Nextflow, and I can run the test profile:

nextflow run nf-core/hlatyping -profile docker,test --outdir $PWD/results

However, I want to use my own BAM file as input, so I used this command (adapted from https://github.com/nf-core/hlatyping/issues/70):

./nextflow run nf-core/hlatyping -profile docker --bam '/home/thanh/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam' --genome GRCh38 -c igenomes.config --outdir /home/thanh/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/

However I got this error:

N E X T F L O W  ~  version 20.01.0
Launching `nf-core/hlatyping` [tender_tuckerman] - revision: bf5d0c2d46 [master]
Unable to parse config file: '/home/thanh/igenomes.config'

  Compile failed for sources FixedSetSources[name='/groovy/script/Script4943CF089126BD872646A55E3C8F3819/_nf_config_c9b0ddec']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
  /groovy/script/Script4943CF089126BD872646A55E3C8F3819/_nf_config_c9b0ddec: 7: unexpected token: < @ line 7, column 1.
     <!DOCTYPE html>
     ^

  1 error

Any help is appreciated
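
The unexpected token in the error above is <!DOCTYPE html>, which suggests that igenomes.config contains an HTML page rather than Nextflow configuration. One possible cause (a guess, not confirmed in this thread) is saving the file from a web view instead of downloading the raw file. Below is a minimal Python sketch for checking a config file before passing it to -c. The helper name looks_like_html and the demo file contents are hypothetical.

```python
import os
import tempfile


def looks_like_html(path, max_lines=20):
    """Return True if any of the first max_lines lines starts like an HTML page."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for _, line in zip(range(max_lines), handle):
            start = line.strip().lower()
            if start.startswith("<!doctype html") or start.startswith("<html"):
                return True
    return False


# Demo with two throwaway files; the contents are illustrative only and are
# not the real igenomes.config.
with tempfile.TemporaryDirectory() as tmp:
    html_path = os.path.join(tmp, "downloaded.config")
    conf_path = os.path.join(tmp, "real.config")
    with open(html_path, "w", encoding="utf-8") as out:
        # Six blank lines first, so the doctype lands on line 7 as in the error.
        out.write("\n" * 6 + "<!DOCTYPE html>\n<html lang=\"en\">\n")
    with open(conf_path, "w", encoding="utf-8") as out:
        out.write("params {\n  genome = 'GRCh38'\n}\n")
    html_result = looks_like_html(html_path)
    conf_result = looks_like_html(conf_path)

print(html_result, conf_result)
```

If the check returns True for your config, downloading the file again in raw form is probably the first thing to try.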

christopher-mohr commented 4 years ago

Hi @NTNguyen13,

did you try running it without --genome GRCh38 and without -c igenomes.config?

NTNguyen13 commented 4 years ago

Yes, when I run it without those two parameters, I get this output:


N E X T F L O W  ~  version 20.01.0
Launching `nf-core/hlatyping` [spontaneous_stone] - revision: bf5d0c2d46 [master]
WARN: The access of `config` object is deprecated
WARN: Access to undefined parameter `genome` -- Initialise it to a default value eg. `params.genome = some_value`
WARN: Access to undefined parameter `fasta` -- Initialise it to a default value eg. `params.fasta = some_value`
BAM file format detected. Initiate remapping to HLA alleles with yara mapper.
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/hlatyping v1.1.5
----------------------------------------------------
Cannot find any bam file matching: data/test*{1,2}.fq.gz
NB: Path needsto be enclosed in quotes!
Pipeline Release  : master
Run Name          : spontaneous_stone
File Type         : BAM
Seq Type          : dna
Index Location    : /home/thanh/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna
IP solver         : glpk
Enumerations      : 1
Beta              : 0.009
Prefix            : hla_run
Max Memory        : 128 GB
Max CPUs          : 16
Max Time          : 10d
Output dir        : /home/thanh/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/
Working dir       : /home/thanh/work
Reads             : data/test*{1,2}.fq.gz
Fasta Ref         : null
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - nfcore/hlatyping:1.1.5
Launch dir        : /home/thanh
Script dir        : /home/thanh/.nextflow/assets/nf-core/hlatyping
User              : thanh
Config Profile    : docker
----------------------------------------------------

christopher-mohr commented 4 years ago

Could you post the command you used this time? The lines at the beginning are just warnings because of the undefined parameters.

NTNguyen13 commented 4 years ago

./nextflow run nf-core/hlatyping -profile docker --bam '/home/thanh/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam' --outdir /home/thanh/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/

This is the command that I used; I only removed those two parameters.

ggabernet commented 4 years ago

Hi @NTNguyen13 , thanks for opening the issue!

Could you post the full error from the second time you ran it? The output does not seem to be complete: there is no error here, just warnings.

NTNguyen13 commented 4 years ago

Hi @ggabernet, here it is:

Command: ./nextflow run nf-core/hlatyping -profile docker --bam "/home/thanh/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam" --outdir /home/thanh/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/

Output:

N E X T F L O W  ~  version 20.01.0
Launching `nf-core/hlatyping` [nostalgic_bardeen] - revision: bf5d0c2d46 [master]
WARN: The access of `config` object is deprecated
WARN: Access to undefined parameter `genome` -- Initialise it to a default value eg. `params.genome = some_value`
WARN: Access to undefined parameter `fasta` -- Initialise it to a default value eg. `params.fasta = some_value`
BAM file format detected. Initiate remapping to HLA alleles with yara mapper.
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/hlatyping v1.1.5
----------------------------------------------------
Cannot find any bam file matching: data/test*{1,2}.fq.gz
NB: Path needsto be enclosed in quotes!
Pipeline Release  : master
Run Name          : nostalgic_bardeen
File Type         : BAM
Seq Type          : dna
Index Location    : /home/thanh/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna
IP solver         : glpk
Enumerations      : 1
Beta              : 0.009
Prefix            : hla_run
Max Memory        : 128 GB
Max CPUs          : 16
Max Time          : 10d
Output dir        : /home/thanh/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/
Working dir       : /home/thanh/work
Reads             : data/test*{1,2}.fq.gz
Fasta Ref         : null
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - nfcore/hlatyping:1.1.5
Launch dir        : /home/thanh
Script dir        : /home/thanh/.nextflow/assets/nf-core/hlatyping
User              : thanh
Config Profile    : docker
----------------------------------------------------

I have checked the output folder. It contains only a folder named pipeline_info with one file, execution_trace.txt, which holds just the header row:

task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar

christopher-mohr commented 4 years ago

Sorry @NTNguyen13 I just realised that the parameters are not used correctly.

The --bam parameter is a boolean flag. Please specify the BAM file using the reads parameter (--readPaths) and add --bam as well. If you have single-end data, you will also need --singleEnd.

NTNguyen13 commented 4 years ago

Hi, I have changed my command to:

./nextflow run nf-core/hlatyping -profile docker --readPaths "/home/thanh/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam" --bam --outdir /home/thanh/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/

This time I got the output:

BAM file format detected. Initiate remapping to HLA alleles with yara mapper.
[f3/a2888f] process > remap_to_hla          [100%] 1 of 1, failed: 1
[23/ade82d] process > make_ot_config        [100%] 1 of 1, failed: 1
[-        ] process > run_optitype          -
[6a/22117f] process > output_documentation  [100%] 1 of 1, failed: 1
[ed/ede44f] process > get_software_versions [100%] 1 of 1, failed: 1
[-        ] process > multiqc               -
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'output_documentation (1)'

Caused by:
  Process requirement exceed available memory -- req: 8 GB; avail: 7.6 GB

Command executed:

  markdown_to_html.r output.md results_description.html

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /home/thanh/work/6a/22117f7452ad97fbd0784b24a3d962

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Content of execution_trace.txt:

task_id hash    native_id   name    status  exit    submit  duration    realtime    %cpu    peak_rss    peak_vmem   rchar   wchar
3   6a/22117f   -   output_documentation (1)    FAILED  -   -   -   -   -   -   -   -   -
2   23/ade82d   -   make_ot_config  FAILED  -   -   -   -   -   -   -   -   -
1   f3/a2888f   -   remap_to_hla (1)    FAILED  -   -   -   -   -   -   -   -   -
4   ed/ede44f   -   get_software_versions   FAILED  -   -   -   -   -   -   -   -   -

christopher-mohr commented 4 years ago

Caused by: Process requirement exceed available memory -- req: 8 GB; avail: 7.6 GB

As the error message states, there is unfortunately not enough memory available: the process requests 8 GB, but only 7.6 GB is available.

NTNguyen13 commented 4 years ago

Hi, I switched to another computer with sufficient memory and ran the same command. This time it gives the following error:


[1e/66f512] process > remap_to_hla          [100%] 1 of 1, failed: 1 ✘
[6a/7e87e3] process > make_ot_config        [100%] 1 of 1 ✔
[-        ] process > run_optitype          -
[4f/c6d245] process > output_documentation  [100%] 1 of 1 ✔
[df/4fccff] process > get_software_versions [100%] 1 of 1 ✔
[35/fc5a21] process > multiqc               [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
[nf-core/hlatyping] Pipeline completed with errors
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'remap_to_hla (1)'

Caused by:
  Process `remap_to_hla (1)` terminated with an error exit status (1)

Command executed:

  samtools view -@ 1 -h -f 0x40 h > output_1.bam
  samtools view -@ 1 -h -f 0x80 h > output_2.bam
  samtools bam2fq output_1.bam > output_1.fastq
  samtools bam2fq output_2.bam > output_2.fastq
  yara_mapper -e 3 -t 1 -f bam /home/lucis/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna output_1.fastq output_2.fastq > output.bam
  samtools view -@ 1 -h -F 4 -f 0x40 -b1 output.bam > mapped_1.bam
  samtools view -@ 1 -h -F 4 -f 0x80 -b1 output.bam > mapped_2.bam

Command exit status:
  1

Command output:
  (empty)

Command error:
  [E::hts_open_format] Failed to open file h
  samtools view: failed to open "h" for reading: No such file or directory

Work dir:
  /home/lucis/work/1e/66f512fe1ea492f6c081bf0b54b289

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I have checked the path using:

file=/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam

/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam: gzip compressed data, extra field

What has gone wrong this time?

christopher-mohr commented 4 years ago

Just to be sure: does your BAM file originate from paired-end data?

NTNguyen13 commented 4 years ago

Yes, the BAM file is paired-end, preprocessed by marking duplicates and sorting.

christopher-mohr commented 4 years ago

I'm currently looking into the last error. Somehow the bam file name ($bams) here

samtools view -@ ${task.cpus} -h -f 0x40 $bams > output_1.bam
samtools view -@ ${task.cpus} -h -f 0x80 $bams > output_2.bam

gets substituted by the letter "h" in your case:

samtools view -@ 1 -h -f 0x40 h > output_1.bam
samtools view -@ 1 -h -f 0x80 h > output_2.bam

Are there maybe some odd characters in the command you used?
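
One possible explanation for the stray "h", offered as a hypothesis and not confirmed in this thread: if the pipeline expects each --readPaths entry to be a row shaped like [sample_name, [file paths]] and takes the file from row[1][0], then a plain path string given on the command line is indexed character by character. The Python sketch below only mimics that indexing. The row shape and the helper name first_file_of_row are assumptions, and the sample name is made up.

```python
def first_file_of_row(row):
    """Mimic reading the first file of a row as row[1][0] (assumed indexing)."""
    return row[1][0]


# Assumed intended shape of one row: [sample_name, [file paths]].
proper_row = [
    "VN_01_00_0089_01_01",
    ["/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam"],
]

# What arrives when a bare path string is passed instead of a row.
string_row = "/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam"

from_proper = first_file_of_row(proper_row)  # the full BAM path
from_string = first_file_of_row(string_row)  # second character of "/home/..."

print(from_proper)
print(from_string)
```

Under that assumption, the second character of "/home/..." is "h", which would match the failing samtools command.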

NTNguyen13 commented 4 years ago

I didn't set up Docker on the new computer, so I used conda instead. The full command is:

./nextflow run nf-core/hlatyping -profile conda --readPaths "/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam" --bam --outdir /home/lucis/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/

christopher-mohr commented 4 years ago

Could you please try it with --reads '/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam' instead of --readPaths "/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam"?
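
To make the flag layout explicit, here is a small Python sketch that assembles the invocation suggested above: the BAM path goes to --reads, while --bam is a bare flag with no value. It uses only options that appear in this thread (as of pipeline v1.1.5). The function name build_hlatyping_command and its defaults are hypothetical, and the sketch only builds the argument list without running Nextflow.

```python
import shlex


def build_hlatyping_command(bam_path, outdir, profile="docker",
                            single_end=False, nextflow="nextflow"):
    """Build the argument list for a BAM-input run (sketch, not executed)."""
    cmd = [
        nextflow, "run", "nf-core/hlatyping",
        "-profile", profile,
        "--reads", bam_path,   # the input file is given to --reads
        "--bam",               # boolean flag: takes no value
        "--outdir", outdir,
    ]
    if single_end:
        cmd.append("--singleEnd")  # only for single-end data
    return cmd


cmd = build_hlatyping_command(
    "/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam",
    "/home/lucis/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/",
    profile="conda",
    nextflow="./nextflow",
)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```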

NTNguyen13 commented 4 years ago

Hi, sorry for not being able to get back sooner.

I replaced it with --reads, and it seems to run smoothly until it hits the runtime limit. The command was:

./nextflow run nf-core/hlatyping -profile conda --reads "/home/lucis/IGSR_Project/1000GVN_aln/VN_01_00_0089_01_01.bam" --bam --outdir /home/lucis/IGSR_Project/1000GVN_result/VN_01_00_0089_01_01/hla/


[nf-core/hlatyping] Pipeline completed with errors
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'remap_to_hla (1)'

Caused by:
  Process exceeded running time limit (2h)

Command executed:

  samtools view -@ 1 -h -f 0x40 VN_01_00_0089_01_01.bam > output_1.bam
  samtools view -@ 1 -h -f 0x80 VN_01_00_0089_01_01.bam > output_2.bam
  samtools bam2fq output_1.bam > output_1.fastq
  samtools bam2fq output_2.bam > output_2.fastq
  yara_mapper -e 3 -t 1 -f bam /home/lucis/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna output_1.fastq output_2.fastq > output.bam
  samtools view -@ 1 -h -F 4 -f 0x40 -b1 output.bam > mapped_1.bam
  samtools view -@ 1 -h -F 4 -f 0x80 -b1 output.bam > mapped_2.bam

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /home/lucis/work/d9/0ace74a9560295002f8fa42d0ee14d

I checked htop and found that the samtools command runs quite slowly. Maybe my BAM file is too large? It's WGS at 30X.

christopher-mohr commented 4 years ago

At least it seems to run now ;). Could you please try the following:

NTNguyen13 commented 4 years ago

Hi, it's me again. I tried increasing the time limit to 5h, but it still didn't work.

Error executing process > 'remap_to_hla (1)'

Caused by:
  Process exceeded running time limit (5h)

Command executed:

  samtools view -@ 1 -h -f 0x40 VN_01_00_0089_01_01.bam > output_1.bam
  samtools view -@ 1 -h -f 0x80 VN_01_00_0089_01_01.bam > output_2.bam
  samtools bam2fq output_1.bam > output_1.fastq
  samtools bam2fq output_2.bam > output_2.fastq
  yara_mapper -e 3 -t 1 -f bam /home/lucis/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna output_1.fastq output_2.fastq > output.bam
  samtools view -@ 1 -h -F 4 -f 0x40 -b1 output.bam > mapped_1.bam
  samtools view -@ 1 -h -F 4 -f 0x80 -b1 output.bam > mapped_2.bam

Could this be solved by providing the FASTQ files instead? If I have paired-end reads from multiple lanes, how should I organize the input?

christopher-mohr commented 4 years ago

Hi, if you want to use FASTQ data, please specify it as follows:

--reads

Use this to specify the location of your input FastQ files. For example:

--reads 'path/to/data/sample_*_{1,2}.fastq'

Please note the following requirements:

- The path must be enclosed in quotes
- The path must have at least one * wildcard character
- When using the pipeline with paired end data, the path must use {1,2} notation to specify read pairs.
- If left unspecified, a default pattern is used: data/*{1,2}.fastq.gz 

The pattern has to match your paired-end files.
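
The requirements above can be checked before launching a run. The Python sketch below expands {a,b} groups by hand and then uses fnmatch, so it only approximates how Nextflow resolves the pattern. The helper names expand_braces and check_reads_pattern and the example file names are hypothetical.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into a list of plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def check_reads_pattern(pattern, filenames, paired=True):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if "*" not in pattern:
        problems.append("pattern has no * wildcard")
    if paired and "{1,2}" not in pattern:
        problems.append("paired-end pattern lacks {1,2}")
    globs = expand_braces(pattern)
    if not any(fnmatch.fnmatchcase(name, g) for name in filenames for g in globs):
        problems.append("pattern matches none of the given files")
    return problems


files = ["sample_A_1.fastq", "sample_A_2.fastq"]
ok = check_reads_pattern("sample_*_{1,2}.fastq", files)
bad = check_reads_pattern("sample_A_R{1,2}.fastq", files)

print(ok)
print(bad)
```

The second pattern fails twice: it has no * wildcard, and its R1/R2 naming does not match files that end in _1 and _2.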

christopher-mohr commented 4 years ago

VN_01_00_0089_01_01_S2_L004_R2_001.fastq VN_01_00_0089_01_01_S2_L004_R2_001.fastq VN_01_00_0089_01_01_S2_L004_R2_001.fastq

Did you post this accidentally? :)

NTNguyen13 commented 4 years ago

In case I have reads from 3 lanes, for example:

path/to/data/sample_L1_R1.fastq
path/to/data/sample_L1_R2.fastq
path/to/data/sample_L2_R1.fastq
path/to/data/sample_L2_R2.fastq
path/to/data/sample_L3_R1.fastq
path/to/data/sample_L3_R2.fastq

then I should pass them as --reads 'path/to/data/sample_L{1,2,3}_{1,2}.fastq'. Am I right?

P.S.: Yes, I have deleted it.

christopher-mohr commented 4 years ago

Reads from multiple lanes won't be merged automatically. You can either run each lane separately or merge the lanes yourself and provide the merged files as input.
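
Merging lanes amounts to concatenating the per-lane FASTQ files, once for the forward reads and once for the reverse reads, using the same lane order for both so that read pairs stay aligned. Here is a minimal Python sketch, assuming the R1/R2 lane naming from the question above. The helper merge_lanes, the sample_merged_* output names, and the tiny demo records are hypothetical.

```python
import os
import shutil
import tempfile


def merge_lanes(lane_files, merged_path):
    """Concatenate FASTQ files in the given order into merged_path."""
    with open(merged_path, "wb") as merged:
        for lane_file in lane_files:
            with open(lane_file, "rb") as source:
                shutil.copyfileobj(source, merged)


# Demo with tiny made-up FASTQ records; real lane files would be far larger.
with tempfile.TemporaryDirectory() as tmp:
    lanes = ["L1", "L2", "L3"]
    for mate in ("R1", "R2"):
        for lane in lanes:
            path = os.path.join(tmp, f"sample_{lane}_{mate}.fastq")
            with open(path, "w") as out:
                out.write(f"@read_{lane}/{mate}\nACGT\n+\nIIII\n")

    merged = {}
    for mate, suffix in (("R1", "1"), ("R2", "2")):
        # Same lane order for both mates keeps the pairs in sync.
        inputs = [os.path.join(tmp, f"sample_{lane}_{mate}.fastq") for lane in lanes]
        target = os.path.join(tmp, f"sample_merged_{suffix}.fastq")
        merge_lanes(inputs, target)
        with open(target) as handle:
            merged[suffix] = handle.read()

print(merged["1"].count("@read_"))
```

With output names like these, the merged pair would fit a pattern such as 'path/to/data/sample_*_{1,2}.fastq'.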

NTNguyen13 commented 4 years ago

Okay, I will try to use the fastq file

NTNguyen13 commented 4 years ago

Hi, thank you for your support. I have tried both ways, using bam and fastq files:

I'm curious about why using BAM files has such a drawback.

christopher-mohr commented 4 years ago

Hi, so did it run through in the end?

Thanks for your feedback. I will take a look at the code again to see if something is wrong there.