nanoporetech / megalodon

Megalodon is a research command line tool to extract high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome/transcriptome.

Variant phasing workflow generate 'CRF models' error #277

Closed liuyang2006 closed 2 years ago

liuyang2006 commented 2 years ago

Hi, I am following the variant phasing tutorial (https://nanoporetech.github.io/megalodon/variant_phasing.html), but the first step, Megalodon variant calling, fails with an error. This is the command I ran:

megalodon \
    $reads_dir --overwrite \
    --guppy-config dna_r9.4.1_450bps_sup.cfg \
    --outputs mappings per_read_variants variants variant_mappings \
    --reference $ref --variant-filename $variants_vcf \
    --output-directory $out_dir \
    --processes $nproc \
    --verbose-read-progress 3 \
    --guppy-server-path ${guppyDir}/bin/guppy_basecall_server

The error is below:

+ megalodon human_ci_test_fast5 --overwrite --guppy-config dna_r9.4.1_450bps_sup.cfg --outputs mappings per_read_variants variants variant_mappings --reference hg38_chr22/hg38_chr22.fasta --variant-filename variants.vcf.gz --output-directory megalodon_results --processes 16 --verbose-read-progress 3 --guppy-server-path /fastscratch/liuya/nanome/Phasing/ont-guppy-cpu/bin/guppy_basecall_server
[12:22:42] Running Megalodon version 2.5.0
[12:22:42] Loading guppy basecalling backend
[12:22:46] Loading reference
[12:22:48] Loaded model calls canonical alphabet ACGT and no modified bases
****************************************************************************************************
    ERROR: Sequence variant outputs is not implemented for CRF models.
****************************************************************************************************

The error message says the model is not appropriate, but if I remove the --guppy-config parameter, the command still does not work.

marcus1487 commented 2 years ago

Megalodon variant calling was built on the now-deprecated flip-flop model structure. The newer CRF-style model output is not compatible with that approach, and it is not feasible to adapt the approach to it, so only flip-flop models work for this purpose. Removing --guppy-config simply falls back to the default model (dna_r9.4.1_450bps_modbases_5mc_hac.cfg), which is a CRF canonical basecalling model with a Remora modified base model, so the same error occurs. I don't believe Guppy contains any flip-flop models any longer.

The variant calling/phasing functionality in Megalodon should be considered deprecated. I would recommend longphase for read phasing.
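
For readers who want to try the longphase route suggested above, here is a minimal dry-run sketch. It only assembles and prints a `longphase phase` command line; it does not run longphase. The helper name `build_longphase_cmd` and all file names are hypothetical placeholders, not taken from this thread, and the flags (`-s`, `-b`, `-r`, `-t`, `-o`, `--ont`) are written as recalled from the longphase README, so check them against `longphase phase --help` for your installed version.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build a longphase command line and print it instead of executing it.
set -euo pipefail

# Hypothetical helper (not part of longphase or Megalodon).
# Arguments: SNP VCF, sorted and indexed BAM, reference FASTA, thread count, output prefix.
build_longphase_cmd() {
    local snp_vcf=$1 bam=$2 ref=$3 threads=$4 out_prefix=$5
    # Flags are an assumption based on the longphase README; verify with `longphase phase --help`.
    printf 'longphase phase -s %s -b %s -r %s -t %s -o %s --ont\n' \
        "$snp_vcf" "$bam" "$ref" "$threads" "$out_prefix"
}

# Placeholder inputs; substitute your own variant calls, alignments, and reference.
build_longphase_cmd variants.vcf mappings.sorted.bam reference.fasta 16 phased
```

Printing the command first makes it easy to review the paths before committing to a long run. Note that this route takes aligned reads and an existing set of variant calls as input, so it does not depend on the basecalling model structure that Megalodon's variant calling required.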

liuyang2006 commented 2 years ago

Thank you for your confirmation!