google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Suggestions for analyzing ONT R9 data #907

Open SHuang-Broad opened 6 days ago

SHuang-Broad commented 6 days ago

Hi, we have some R9 data that turn out to be a bit challenging. So we'd appreciate any suggestions.

We know that, technically speaking, DV natively supports only R10 data, and that for R9 the recommended pipeline is PEPPER-Margin-DeepVariant.

The particular challenge we ran into with the PEPPER pipeline is actually in the DeepVariant stage (stage 5, after the Margin haplotagging), where make_examples failed. We used the Docker image kishwars/pepper_deepvariant:r0.8.

parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref /longreads/references/GRCh38_noalt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa --reads /cromwell_root/pepper_output/intermediate_files/PHASED.PEPPER_MARGIN.haplotagged.bam --examples /cromwell_root/pepper_output/dv_intermediate_outputs/make_examples.tfrecord@16.gz --add_hp_channel --alt_aligned_pileup rows --gvcf /cromwell_root/pepper_output/dv_intermediate_outputs/gvcf.tfrecord@16.gz --min_base_quality 1 --min_mapping_quality 5 --parse_sam_aux_fields --partition_size 10000 --proposed_variants /cromwell_root/pepper_output/intermediate_files/PEPPER_VARIANT_FULL.vcf.gz --norealign_reads --sample_name this_is_secret --sort_by_haplotypes --variant_caller vcf_candidate_importer --task 3

real 16m52.881s
user 3m21.141s
sys 0m10.403s

real 17m13.407s
user 3m24.106s
sys 0m12.253s
[11-14-2024 19:22:08] INFO: [6/8] RUNNING THE FOLLOWING COMMAND

[note that the pipeline didn't stop after that failure, which is a known issue]
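Because the pipeline keeps going after a failed shard, it can help to scan the log for failures explicitly. Below is a minimal sketch, assuming the log always uses the layout shown above: a `parallel: This job failed:` line followed directly by the failed command on the next line. The helper names `failed_commands` and `flag_value` are hypothetical and are not part of PEPPER or DeepVariant.

```python
import shlex

# Marker line as it appears in the log excerpt above (assumed stable).
FAILED_MARKER = "parallel: This job failed:"


def failed_commands(log_text):
    """Return the command line that follows each failure marker."""
    lines = log_text.splitlines()
    commands = []
    for i, line in enumerate(lines):
        # The failed command is assumed to sit on the very next line.
        if line.strip() == FAILED_MARKER and i + 1 < len(lines):
            commands.append(lines[i + 1].strip())
    return commands


def flag_value(command, flag):
    """Return the token that follows `flag` in a shell command, or None."""
    tokens = shlex.split(command)
    if flag in tokens:
        idx = tokens.index(flag)
        if idx + 1 < len(tokens):
            return tokens[idx + 1]
    return None
```

Applied to the log above, `flag_value(cmd, "--task")` would give `"3"` for the failed command, which identifies the shard to re-run and report.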

We tried the pipeline on a few samples and they all fail on a particular region (but it's not clear why).

Given that the support for the PEPPER pipeline has been wound down (or moved to DV), it's not clear to us what the best route forward is.

What we are thinking now is to

  1. ask the PEPPER pipeline to stop after the haplotagging step (stage 4), then
  2. invoke DV manually, using a more recent version (1.5, 1.6.1)
  3. follow up with the rest of PEPPER pipeline (stage 6+)

Is this even feasible? Or is there a simpler approach? [This question can be posted as a more general question about the recommended strategy for analyzing R9 data, in the context of wound-down support of PEPPER.]

Thanks, Steve

kishwarshafin commented 1 day ago

Hi @SHuang-Broad, as you can imagine, R9 is a pretty old, discontinued chemistry, so winding down support naturally made sense. DeepVariant natively works with R10.4 simplex and duplex data with internal haplotagging, which made PEPPER-Margin redundant for this use case. PEPPER has not been updated since we became able to support ONT natively with DeepVariant.

As for the model, unfortunately there is no plan to support R9 with the newer versions of DeepVariant.

I think the best course of action is to run each step separately. You can run up to version 1.5.0 with the older models. In version 1.6.1 we moved to Keras, so the older models are no longer supported.

Please run through each step separately and when DeepVariant fails, if you can provide the command and the input, I can possibly help you debug the issue.
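One way to isolate a failing step for a bug report is to re-run just the failed `make_examples` command on its own and capture its full output. The sketch below is an illustration under assumptions, not part of either tool: `with_flag` and `rerun` are hypothetical helpers, and the command string is expected to be copied verbatim from the pipeline log. Any flag passed in should be checked against the help output of the installed `make_examples` first.

```python
import shlex
import subprocess


def with_flag(command, flag, value):
    """Return argv for `command` with `flag` set to `value`.

    Replaces the existing value if the flag is present, otherwise
    appends the flag and value at the end.
    """
    tokens = shlex.split(command)
    if flag in tokens:
        idx = tokens.index(flag)
        if idx + 1 < len(tokens):
            tokens[idx + 1] = value
        else:
            tokens.append(value)
    else:
        tokens += [flag, value]
    return tokens


def rerun(argv, log_path):
    """Run argv once, writing stdout and stderr to log_path.

    Returns the process exit code so a failure is not silently skipped.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `rerun(with_flag(failed_cmd, "--task", "3"), "make_examples_task3.log")` would repeat shard 3 alone, where `failed_cmd` and the log file name are placeholders. The resulting log, together with the command and inputs, is the kind of material requested above for debugging.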