nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

DNA vs RNA from raw reads #364

Open Alannahh opened 3 years ago

Alannahh commented 3 years ago

Hi there,

I don't have an issue as such, more a (hopefully) quick question about how Tombo determines whether a sample is DNA or RNA from the raw reads in the scenario described here:

"If DNA or RNA sample type is not explicitly specified (via --dna or --rna options) the sample type will be detected automatically from the raw read files."

I have generated DNA samples that we would expect to have large amounts of replacement of T with U, and I noticed (initially accidentally) that Tombo is picking these samples up as RNA if I don't specify the type. Is there something specific that Tombo looks for in the raw reads to determine the read type?

Many thanks!

marcus1487 commented 3 years ago

There is no attribute in the FAST5 file indicating whether the sample is DNA or RNA, so Tombo has to guess. The function that does this is here: https://github.com/nanoporetech/tombo/blob/master/tombo/tombo_helper.py#L872

As you have noted, I would suggest using the --dna and --rna flags whenever possible, so that this function cannot guess incorrectly.
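For anyone scripting their runs, here is a minimal sketch of one way to make sure the sample type is always stated explicitly rather than left to auto-detection. The helper name `build_resquiggle_command` is hypothetical, and the `tombo resquiggle <fast5 dir> <reference>` argument layout is an assumption on my part; only the `--dna` and `--rna` flags come from this thread, so check `tombo resquiggle --help` for your installed version before relying on it.

```python
import shlex


def build_resquiggle_command(fast5_dir, reference, sample_type):
    """Build an argv list for `tombo resquiggle` with an explicit sample type.

    Hypothetical helper: the positional argument order is assumed, not
    taken from this thread. Only --dna / --rna are confirmed above.
    """
    flags = {"dna": "--dna", "rna": "--rna"}
    if sample_type not in flags:
        # Refuse to fall back to auto-detection when the type is unknown.
        raise ValueError("sample_type must be 'dna' or 'rna'")
    return ["tombo", "resquiggle", fast5_dir, reference, flags[sample_type]]


if __name__ == "__main__":
    # Placeholder paths; substitute your own FAST5 directory and reference.
    cmd = build_resquiggle_command("path/to/fast5s", "reference.fasta", "dna")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Raising on an unrecognised sample type is a deliberate choice here: for samples like the U-containing DNA described above, a silent fallback to auto-detection is exactly what you want to avoid.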