Data preparation scripts for Remora models with random bases

nanoporetech / remora

Methylation/modified base calling separated from basecalling.

https://nanoporetech.com


Data preparation scripts for Remora models with random bases #38

Closed: AnWiercze closed this issue 1 year ago

AnWiercze commented 2 years ago

Hello Remora Team,

In this year's ONT update, Clive mentioned that the newer models, which perform better than BS-seq, are trained on sequences that contain a modified position flanked by ±30 random bases, if I understood correctly. Are the scripts for preparing training data from this kind of input publicly available? Currently, the data preparation scripts in this repository only support fully modified and fully unmodified reads, correct?

Thanks for your help!

Cheers, Anna
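To make the "modified position with ±30 random bases" construct in the question concrete, here is a minimal, hypothetical sketch that generates such a sequence and records the index of the putatively modified base. It is not ONT's script or design: the function name `random_context_sequence`, the default CpG motif, and the choice of which motif base is "modified" are all illustrative assumptions; only the 30-base flank length comes from the question.

```python
import random
from typing import Optional, Tuple

# Hypothetical illustration only, not an ONT/Remora API.
BASES = "ACGT"


def random_context_sequence(motif: str = "CG",
                            mod_offset: int = 0,
                            flank: int = 30,
                            rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Return (sequence, mod_index): `motif` placed between two random flanks.

    mod_offset selects which base of the motif is treated as modified
    (assumption: offset 0, i.e. the C of a CpG). mod_index is the 0-based
    position of that base in the returned sequence.
    """
    rng = rng or random.Random()
    # Draw each flank base uniformly at random from A/C/G/T.
    left = "".join(rng.choice(BASES) for _ in range(flank))
    right = "".join(rng.choice(BASES) for _ in range(flank))
    seq = left + motif + right
    return seq, flank + mod_offset


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    seq, idx = random_context_sequence(rng=rng)
    print(len(seq), idx, seq[idx:idx + 2])  # prints: 62 30 CG
```

One design point such a sketch exposes: the random flanks can contain extra copies of the motif by chance, so any real labelling scheme has to decide whether those sites count as modified, unmodified, or unknown.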

marcus1487 commented 2 years ago

These scripts are not currently publicly available. We are working to improve the robustness of this workflow and release this code at some point in the future.

We will be updating the data preparation scripts very soon to take POD5 and BAM input and create a Remora dataset directly. This will add much more flexibility to dataset generation beyond the "fully modified at a motif" type of datasets.

AnWiercze commented 2 years ago

Thanks a lot for sharing this information! I am looking forward to the next release. :)

marcus1487 commented 1 year ago

Betta is now available as a developer release. Please see the instructions for accessing the repository in the community note here (login required): https://community.nanoporetech.com/posts/betta-tool-release