nanoporetech / remora

Methylation/modified base calling separated from basecalling.
https://nanoporetech.com

old models use in remora dataset prepare #108

Closed BioRB closed 1 year ago

BioRB commented 1 year ago

Hello, I'm trying to use remora for an ONT run performed using an old 9.4.1 flowcell (setting r9.4_180mv_450bps). I don't understand how I'm supposed to use the legacy model suitable for this experiment (https://github.com/nanoporetech/kmer_models/tree/master/legacy/legacy_r9.4_180mv_450bps_6mer). If I try to pass the model with the --refine-kmer-level-table flag, I get an error because old models have more columns. Can you explain how I'm supposed to pass it in the remora dataset prepare process? Or is there another way to do it?

marcus1487 commented 1 year ago

The legacy kmer models are ... legacy. Use the RNA models found in the kmer_models repository as direct input into the Remora argument. https://github.com/nanoporetech/kmer_models/tree/master/rna_r9.4_180mv_70bps

marcus1487 commented 1 year ago

You can also see the documentation for the format of the kmer table using the remora dataset prepare -h command. Help text attached here:

  --refine-kmer-level-table REFINE_KMER_LEVEL_TABLE
                        Tab-delimited file containing no header and two fields: 1. string k-mer sequence and 2. float expected normalized level. All k-mers must be
                        the same length and all combinations of the bases 'ACGT' must be present in the file. (default: None)
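
The rules in this help text can be checked before handing a file to Remora. Below is a minimal sketch, not part of Remora itself: the function name `check_kmer_level_table` is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behaviour.

```python
import itertools


def check_kmer_level_table(lines):
    """Check lines against the format in the help text; return (k, levels).

    Hypothetical helper, not a Remora API. Raises ValueError on a violation.
    """
    levels = {}
    for line_num, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption of this sketch: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_num}: expected 2 tab-delimited fields, "
                f"found {len(fields)}"
            )
        kmer, level = fields
        try:
            levels[kmer] = float(level)
        except ValueError:
            # A header row such as "kmer<TAB>level" ends up here.
            raise ValueError(
                f"line {line_num}: level {level!r} is not a float"
            ) from None

    # All k-mers must share one length.
    lengths = {len(kmer) for kmer in levels}
    if len(lengths) != 1:
        raise ValueError(f"expected one k-mer length, found {sorted(lengths)}")
    k = lengths.pop()

    # Every combination of the bases ACGT must be present.
    expected = {"".join(p) for p in itertools.product("ACGT", repeat=k)}
    missing = expected - set(levels)
    if missing:
        raise ValueError(
            f"{len(missing)} k-mers missing, e.g. {sorted(missing)[0]}"
        )
    return k, levels
```

Calling it as `check_kmer_level_table(open("levels.txt"))` (file name hypothetical) on a file with a header row or with extra columns should raise a `ValueError` naming the offending line.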

BioRB commented 1 year ago

Why RNA? I have DNA. Regarding the --refine-kmer-level-table option, I've read the manual but I still don't understand REFINE_KMER_LEVEL_TABLE. Am I supposed to create a "custom" file? If yes, why, and how? If I understand correctly, there is no correspondence between the kmer-level tables and the legacy models, right?

marcus1487 commented 1 year ago

Ah I misread your legacy model as an RNA model instead of a DNA one. We do not have a supported k-mer model released for the R9 DNA chemistry. The legacy tables can be used for Remora (they will be re-scaled when read into Remora) but need to be reformatted as described in the documentation. If you would like to use supported k-mer tables, I would recommend updating to the latest chemistry. If you require the R9 table then I would recommend converting the legacy R9 tables as the best option, but would suggest that improved results may be obtained with the latest chemistry.
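
As a rough illustration of the reformatting mentioned above, the sketch below reduces a multi-column legacy model file to the headerless two-column table that --refine-kmer-level-table expects. It rests on assumptions not confirmed in this thread: that the legacy file is tab-delimited with a header row, and that the relevant columns are named `kmer` and `level_mean`. Both names are parameters, so check them against the actual file. The function name `convert_legacy_model` is hypothetical.

```python
import csv


def convert_legacy_model(src_path, dst_path,
                         kmer_col="kmer", level_col="level_mean"):
    """Write a headerless (k-mer, level) table from a legacy model file.

    kmer_col and level_col are ASSUMED header names, not confirmed by the
    Remora or kmer_models documentation. Returns the number of rows written.
    """
    n_written = 0
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        for row in reader:
            kmer = row[kmer_col]  # KeyError if the assumed column is absent
            level = float(row[level_col])  # ValueError on a non-numeric level
            dst.write(f"{kmer}\t{level}\n")
            n_written += 1
    return n_written
```

The levels are written unchanged because, per the comment above, Remora re-scales the table when it is read in. For a 6-mer model the returned count should be 4096 (4^6) if every k-mer is present.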

BioRB commented 1 year ago

Can I perform the analysis without a model? Sorry for the dumb question, but I don't understand whether I can perform a basic analysis without training a model. I see there are 4 main steps: Data Preparation, Model Training, Model Inference and Raw Signal Analysis. Can you show me how to do a basic analysis to find modified bases? I have the 2 data types (non-modified vs modified sequences, fast5). Following your manual I used Dorado to generate bam and pod5. Now, what am I supposed to do? Thank you very much!

marcus1487 commented 1 year ago

Assuming this is a modified base for which there is not a released modified base model (via Dorado/Remora), then Remora does provide tools for the analysis of modified bases via raw signal, but I would not classify this type of analysis as basic. Each new modified base may present in signal space a bit differently and thus the analysis can quickly become quite complex.

For signal visualization, remora analyze plot ref_region is the easiest command for quick inspection of signal at known reference locations. For more in-depth analysis, the README provides a pointer to a notebook included in the repo that gives an example of identifying divergence in signal between two samples. For more advanced model training for specific modified bases of interest, I would suggest looking into the Betta developers program: https://community.nanoporetech.com/posts/betta-tool-release

drivenbyentropy commented 1 year ago

"The legacy tables can be used for Remora (they will be re-scaled when read into Remora) but need to be reformatted as described in the documentation."

@marcus1487 My apologies for opening an old issue. I was trying to locate the documentation you were referring to for converting the older kmer tables into the new format but was not able to find it. Could you please link to it here? Thanks!

marcus1487 commented 1 year ago

See the --refine-kmer-level-table option from the remora dataset prepare -h output.