nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
128 stars 7 forks source link

Trouble in modkit entropy #251

Closed DKS3713 closed 1 week ago

DKS3713 commented 2 weeks ago

Hi,

I planned to use modkit to calculate methylation entropy, my code is as follow:

modkit entropy \
 --in-bam ~/nanopore/test.bam \
 --base C \
 --ref ~/reference/hg38/hg38.fa

Well, I met the following error:

> window size is set to 50, motif (RegexMotif { forward_pattern: OverlappingRegex { inner: Regex("C") }, reverse_pattern: OverlappingRegex { inner: Regex("G") }, motif_info: MotifInfo { forward_offset: 0, reverse_offset: 0, length: 1, is_palendrome: false }, raw_motif: "C" }) length is 1
thread '<unnamed>' panicked at src/reads_sampler/sampling_schedule.rs:731:22:
reference sequences and names not in sync
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Bam file is generated by dorado v0.7.2 and with model V5(sup@v5.0.0_5mCG_5hmCG_6mA).

ArtRand commented 2 weeks ago

Hello @DKS3713,

Could you confirm that the modBAM you're using has been aligned to the genome you're passing to --ref? You can make sure the sequence names are the same by checking

samtools view -H test.bam | grep "@SQ"

against

samtools view hg38.fa | awk '{print $1, length($10)}'

You should see the same number of records and the lengths should be the same as well.

You can also confirm that you have mapped records with:

samtools view -F 2308 test.bam | wc -l # reports the number of mapped, primary reads

The link you have doesn't go to a file for me, could you update it? If you could tell me where you get the reference sequence that would also help debug. Sorry for the inconvenience.

DKS3713 commented 1 week ago

Hi @ArtRand,

Thanks for your reply.

I checked the reference file, and found I used different fasta files in dorado and modkit.

In dorado, I used GRCh38.d1.vd1.fa, while in modkit I used hg38.fa.

I got no error in this operation while using other modkit function like pileup, but got error in modkit entropy.

By the way, the modkit version I am using is 0.3.1.

ArtRand commented 1 week ago

Hello @DKS3713,

For entropy (or any command that uses a reference) you must use the same reference sequence FASTA that you used in alignment. I checked these two references, and the contig names are different, so that's not going to work, modkit will not assume that chr1 is the same as 1. In the latest release, the entropy command will check for this and return an error instead of the crash. Thanks for bringing this to my attention.

ArtRand commented 1 week ago

Feel free to re-open this issue if you encounter any additional problems.