guppy2sam produced an empty BAM file

zhanxw commented 4 years ago

Describe the bug: guppy2sam produced an empty BAM file.

Logging: The relevant command line and its output are pasted below:

(medaka) [xzhan9@Nucleus005 nanopore]$ medaka methylation guppy2sam --reference ${REFERENCE} ${FAST5PATH} \
    --workers 32 --recursive \
    | samtools sort -@ 8 | samtools view -b -@ 8 > ${OUTBAM}
[05:09:48 - ModExtract] NOTE: Mod. base scores are output w.r.t the sequencing direction, not the aligned read orientation.
[05:09:48 - Extractor] Starting worker processes.
[05:09:48 - Extractor] Found 81 files to process.
[05:09:48 - ModExtract] Processing fast5_pass/FAL15598_359b3b9b63235513017a3ad8c726078fdd3febd6_2.fast5.
[05:09:48 - ModExtract] Processing fast5_pass/FAL15598_359b3b9b63235513017a3ad8c726078fdd3febd6_67.fast5.
[05:09:48 - ModExtract] Processing fast5_pass/FAL15598_359b3b9b63235513017a3ad8c726078fdd3febd6_9.fast5.
[05:09:48 - ModExtract] Processing fast5_pass/FAL15598_359b3b9b63235513017a3ad8c726078fdd3febd6_27.fast5.
[05:09:49 - Extractor] Extracted 1/81 files.
[05:09:49 - ModExtract] Processing fast5_pass/FAL15598_359b3b9b63235513017a3ad8c726078fdd3febd6_17.fast5.
[05:09:49 - Extractor] Extracted 2/81 files.
[05:09:49 - ModExtract] Processing fast5_pass/FAL15598_359b3b9b63235513017a3ad8c726078fdd3febd6_48.fast5.
...

Environment (if you do not have a GPU, write No GPU):

Installation method: conda
OS: Red Hat Enterprise Linux Server release 7.4 (Maipo)
GPU model: no GPU
Nvidia driver version: n/a
CUDA version: n/a
cuDNN version: n/a

Additional context: This problem looks similar to issue #129, but I don't think it has the same cause. I verified that the ont-fast5-api version is 2.0.1. Other versions are listed below:

ont-fast5-api==2.0.1
ont-tombo==1.5.1
medaka==0.11.5
samtools 1.9 (htslib 1.9)

The reference is GCF_000006765.1_ASM676v1_genomic.fna

mwykes commented 4 years ago

Hi @zhanxw, without having the data, it's quite hard to debug what is going wrong. Is it possible for you to share one of your fast5 files?

cjw85 commented 4 years ago

@zhanxw

Can you confirm that when running guppy you are consistently using the modified base model, so you are running something like:

guppy_basecaller \
    --save_path <output path> --input_path <input path> \
    --compress_fastq --fast5_out \
    --config dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac_prom.cfg

zhanxw commented 4 years ago

I did not use this config, as the data were generated by MinION.

Can you suggest a website for sharing large sequence files?


cjw85 commented 4 years ago

@zhanxw In order for medaka to make methylation predictions you will need to run the standalone guppy and use the modified base model as in the example command above.
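As a quick check of the symptom, here is a minimal sketch. It assumes, as the pipeline in the report suggests, that guppy2sam writes SAM text to stdout, so counting the non-header lines shows whether any reads were emitted at all. `count_sam_records` is a hypothetical helper name, not part of medaka or samtools.

```shell
# Hypothetical helper: count SAM alignment records on stdin.
# Header lines start with '@'; every other line is one record.
count_sam_records() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so '|| true' keeps the function from reporting a failure.
    grep -vc '^@' || true
}

# Example usage (paths are placeholders taken from the report above):
#   samtools view -h "${OUTBAM}" | count_sam_records
# A result of 0 means the BAM holds only a header, which would be consistent
# with fast5 files that were not basecalled with a modified base model.
```

If only a count is needed, `samtools view -c` on the BAM gives the same number without the helper.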

cjw85 commented 4 years ago

A warning has been added to the documentation to make the above point clear.

TaniaJes commented 4 years ago

Hi there,

When I try to run:

medaka consensus --save_features --check_output --model r941_min_high_g344 ./reads_minimap2.filtered.sorted.bam ./reads_minimap2.filtered.sorted.hdf

it seems that the program halts after reading the first few lines of the .bam file and then fails to read any further (see the error message below).

[22:25:18 - Predict] Setting tensorflow threads to 1.
[22:25:18 - Predict] Processing 248460 long region(s) with batching.
[22:25:18 - Predict] Using model: r941_min_high_g344_model.hdf5.
[22:25:18 - ModelLoad] Building model with cudnn optimization: False
[22:25:19 - DLoader] Initializing data loader
[22:25:19 - Sampler] Initializing sampler for consensus of region ENST00000434970.2:0-9.
[22:25:19 - Sampler] Initializing sampler for consensus of region ENST00000415118.1:0-8.
[22:25:19 - Sampler] Initializing sampler for consensus of region ENST00000448914.1:0-13.
[22:25:19 - Sampler] Initializing sampler for consensus of region ENST00000631435.1:0-12.
Failed to read .bam file './reads_minimap2.filtered.sorted.bam'.

How could I fix this?

Thanks, Tania

cjw85 commented 4 years ago

@TaniaJes

I suspect that your reads_minimap2.filtered.sorted.bam file is either invalid or that you do not have a corresponding bam index (reads_minimap2.filtered.sorted.bam.bai) file alongside your bam. If you have further questions please start a new issue.
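Both conditions can be checked from the shell. The sketch below is illustrative only: `check_bam_index` is a hypothetical helper that tests for a non-empty BAM and a .bai file next to it, and nothing more. The commented commands at the end use the standard samtools subcommands `quickcheck` and `index` to validate the file and create the index.

```shell
# Hypothetical helper: report the state of a BAM file and its index.
# Prints one of: missing-or-empty-bam, missing-index, ok
check_bam_index() {
    bam="$1"
    if [ ! -s "$bam" ]; then
        # File does not exist or has zero bytes.
        echo "missing-or-empty-bam"
    elif [ ! -f "$bam.bai" ]; then
        # No index alongside the BAM (expected name: <file>.bam.bai).
        echo "missing-index"
    else
        echo "ok"
    fi
}

# Example usage with the file name from the question:
#   check_bam_index reads_minimap2.filtered.sorted.bam
#   samtools quickcheck reads_minimap2.filtered.sorted.bam && echo "BAM looks intact"
#   samtools index reads_minimap2.filtered.sorted.bam   # requires a coordinate-sorted BAM
```

The helper does not inspect the BAM contents, so a file can pass it and still be truncated; `samtools quickcheck` is the more reliable test for that case.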

nanoporetech / medaka

guppy2sam produced an empty BAM file #139