nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/

Indexing error in dmr multi #285

Open Ge0rges opened 1 month ago

Ge0rges commented 1 month ago

Hi @ArtRand,

I got this error when running a command that works on many of my other samples on v0.4.1:

thread '<unnamed>' panicked at src/genome_positions.rs:100:16:
range end index 110770 out of range for slice of length 110715
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Rayon: detected unexpected panic; aborting

I ran something like the following, with multiple replicate samples:

./do_methylate.sh: line 44: 335042 Aborted modkit dmr multi $samples -r gene-coordinates.txt -o dmr_by_gene/ -t 20 --ref $genome --base C --base A --min-valid-coverage 5

Let me know what I can provide to make this reproducible.

ArtRand commented 1 month ago

Hello @Ge0rges,

Sorry about the error. This can happen if your regions -r address coordinates that are outside of the sequence passed to --ref. There should be a better check for this, so I'll add that. Could you check that all of the intervals in gene-coordinates.txt are inside the sequence lengths of the contigs in $genome?
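
In case it helps with that check, here is a minimal sketch in Python. It is not part of modkit. It assumes that gene-coordinates.txt is tab-separated with contig, start and end in the first three columns (BED-style, 0-based, end-exclusive) and that the reference is a plain-text FASTA. The function names and the toy contigs are hypothetical.

```python
import io


def fasta_lengths(handle):
    """Return {contig_name: sequence_length} for a FASTA file handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the contig name is the header text up to the first whitespace.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds(regions_handle, lengths):
    """Return (line_number, contig, start, end, reason) for each problem interval."""
    problems = []
    for lineno, line in enumerate(regions_handle, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if contig not in lengths:
            problems.append((lineno, contig, start, end, "contig not in reference"))
        elif start < 0 or start >= end or end > lengths[contig]:
            problems.append((lineno, contig, start, end, "interval outside contig"))
    return problems


# Toy data standing in for $genome and gene-coordinates.txt (hypothetical names).
reference = io.StringIO(">contig_1 example\nACGT\nACGT\n>contig_2\nACG\n")
regions = io.StringIO(
    "contig_1\t0\t8\tgene_a\n"
    "contig_1\t4\t12\tgene_b\n"
    "contig_3\t0\t5\tgene_c\n"
)

for problem in out_of_bounds(regions, fasta_lengths(reference)):
    print(problem)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the reference FASTA and the regions file.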

Ge0rges commented 1 month ago

That's interesting. I've actually been in the habit of creating the gene-coordinates.txt from a table of all gene calls I have for my entire metagenome. I figured modkit would take the ones that are present in my MAG (i.e. sequences that are a subset of the metagenome). That has worked fine so far, though I guess it is an unfair assumption to make on modkit.

So, as you suggest, there is most likely an interval that is wider than its contig, because my MAG sometimes crops a sequence. I'll check that right now.
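
One way to keep building gene-coordinates.txt from the full metagenome table would be to filter it against the MAG before passing it to -r. The sketch below is only an illustration under stated assumptions: contig lengths come from a samtools-style .fai index of the MAG (name and length in the first two tab-separated columns), the regions are BED-style, and intervals that do not fit are dropped rather than clipped. The thread does not say which of those modkit would prefer. The function names and contig names are hypothetical; the two numbers in the example are the ones from the panic message.

```python
def read_fai_lengths(fai_lines):
    """Parse .fai lines (name, length, ...) into {contig: length}."""
    lengths = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def keep_in_bounds(region_lines, lengths):
    """Split region lines into (kept, dropped) based on the contig lengths."""
    kept, dropped = [], []
    for line in region_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        # Keep only intervals on a known contig that lie fully inside it.
        if contig in lengths and 0 <= start < end <= lengths[contig]:
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


# Hypothetical contig of length 110715 with one gene ending at 110770.
fai = ["contig_1\t110715\t10\t60\t61\n"]
regions = [
    "contig_1\t100\t900\tgene_a\n",
    "contig_1\t110000\t110770\tgene_b\n",  # extends past the cropped contig
    "contig_9\t0\t500\tgene_c\n",          # contig not in the MAG
]

kept, dropped = keep_in_bounds(regions, read_fai_lengths(fai))
print(len(kept), "kept;", len(dropped), "dropped")
```

Writing `kept` to a new file, one line per interval, would give a regions file restricted to the MAG.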

Ge0rges commented 1 month ago

If it's useful to you, here is my gene-coordinates file and the reference file.

ArtRand commented 1 month ago

@Ge0rges thanks, I'll check. I agree modkit should do the check.