nanoporetech / remora

Methylation/modified base calling separated from basecalling.
https://nanoporetech.com

Question about data preparation for training an “All Context” modification model with Remora, and about the --refine-kmer-level-table input #177

Open sparkcyf opened 1 month ago

sparkcyf commented 1 month ago

I am currently working on training a model to detect DNA modifications in all contexts (all positions) using Remora. During the data preparation step, I encountered an issue where the --motif argument is required:

remora dataset prepare: error: the following arguments are required: --motif

I want to detect modifications at all positions, but I am unsure how to specify the --motif parameter to achieve this. In previous issues, such as https://github.com/nanoporetech/remora/issues/62, the training scripts do not appear to have required this parameter.

Here is the script I am using for data preparation:

remora \
  dataset prepare \
  converted.pod5 \
  basecalls.bam  \
  --output-path mod_chunks \
  --refine-kmer-level-table tombo_model_5hmc.tsv \
  --refine-rough-rescale \
  --focus-reference-positions 5hmc_sites.bed \
  --mod-base m 5hmC

Could you please provide guidance on how to properly declare the --motif parameter for detecting modifications at all positions?

Thanks in advance!

marcus1487 commented 1 month ago

For all-context models, the motif argument would be --motif C 0. The motif sequence is the single base C, and the zero indicates that position 0 of the motif (here the C itself) is the focus base.
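
As a minimal sketch, the command from the question might look like this with the suggested argument added. The file names and all other flags are copied unchanged from the original post and are assumed to be valid for the installed Remora version. The `prepare_cmd` variable and the "print first, run only if available" guard are illustrative additions, not part of Remora.

```shell
# Build the argument list from the question, adding --motif C 0
# (motif sequence "C", focus base at index 0 of the motif).
set -- remora dataset prepare \
  converted.pod5 \
  basecalls.bam \
  --output-path mod_chunks \
  --refine-kmer-level-table tombo_model_5hmc.tsv \
  --refine-rough-rescale \
  --focus-reference-positions 5hmc_sites.bed \
  --mod-base m 5hmC \
  --motif C 0

# Keep a printable copy of the command for review (hypothetical helper variable).
prepare_cmd="$*"
echo "$prepare_cmd"

# Run only when remora is installed and the input files are present.
if command -v remora >/dev/null 2>&1 && [ -f converted.pod5 ] && [ -f basecalls.bam ]; then
  "$@"
else
  echo "remora or input files not found; command printed only" >&2
fi
```

Printing the command before running it makes it easy to confirm that `--motif` receives two values, the motif sequence and the focus index.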