nanoporetech / remora

Methylation/modified base calling separated from basecalling.
https://nanoporetech.com

documentation #130

Closed Jeff-Field closed 7 months ago

Jeff-Field commented 8 months ago

Is there a set of documentation for Remora to get us started? We have a set of data ready but are having trouble getting started. We are interested in damaged base sequencing. Also, are the oligo sequences used for training models available? Finally, how are the oligos used? In our own experience with oligos on nanopore, they could not be sequenced reliably because they were lost during AMPure XP (AMPX) bead cleanups.

marcus1487 commented 8 months ago

For randomer training we use the Betta code base. If you are interested in this please fill out the form here: https://community.nanoporetech.com/posts/betta-tool-release

For getting started with Remora alone, the README provides the basic usage commands. Is there a specific task that is missing from the Remora README?

The oligo sequences used for training models are not publicly available, but the general outline has been presented at our conferences and can be found in the video archives there. More details on sample prep and analysis of randomer training data can be found in the Betta repository after access has been granted.

Jeff-Field commented 8 months ago

Thank you, Marcus. We are not sure how to assemble the BAM training file. Also, as the sequences for training were not published, we made our own. We are interested in telomere damage. As a starting point, we made oligos with 8-oxo-dG in a telomere context and used them as PCR primers.

Also, the link that you sent leads to a page where the sign-up link is dead. Thanks,

Jeff


marcus1487 commented 8 months ago

> We are not sure how to assemble the BAM training file.

Output from Dorado with --reference and --emit-moves set should be sufficient, as stated in the README. If you have tried this and have additional questions, expanding upon your issues would be helpful.
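The Dorado invocation described above can be sketched as a small command builder. This is only an illustration: the flags --reference and --emit-moves come from the reply above, while the model name `sup`, the `pod5/` directory, the reference file name, and the output name `calls.bam` are placeholders assumed for the example. It also assumes Dorado writes its BAM output to standard output, so the printed command redirects it to a file.

```python
import shlex


def dorado_basecall_command(model, signal_dir, reference_fasta):
    """Build (but do not run) a Dorado basecalling command line.

    All three arguments are placeholders supplied by the caller.
    """
    return [
        "dorado", "basecaller",
        model,                           # placeholder model name or path
        signal_dir,                      # placeholder directory of raw signal files
        "--reference", reference_fasta,  # align reads to the expected sequence
        "--emit-moves",                  # keep the move table needed downstream
    ]


# Hypothetical inputs, for illustration only.
cmd = dorado_basecall_command("sup", "pod5/", "telomere_construct.fasta")

# Print the command with a redirect to a BAM file instead of executing it.
print(shlex.join(cmd) + " > calls.bam")
```

Printing rather than executing keeps the sketch runnable on a machine without Dorado installed; the printed line can be pasted into a shell once the placeholders are replaced.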

> Also, as the sequences for training were not published, we made our own. We are interested in telomere damage. As a starting point we made oligos with 8-oxo dG in a telomere context and used them as a PCR primer.

I'm not sure I completely understand this construct design. Are you aiming to sequence just the oligos with the 8-oxo-dG in them? Or the oligos as well as the extension from PCR? Are you using some reference material? If the modified bases are only in the oligo, then these are the only modified training chunks you will have. I might not be understanding the construct, though. Could you expand on your setup and the training chunks you are aiming to extract?

> Also, the link that you sent leads to a page where the sign up link is dead.

Will chase this up. Thanks for flagging up the dead link!

Jeff-Field commented 8 months ago

> We are not sure how to assemble the BAM training file. Output from Dorado with --reference and --emit-moves set should be sufficient as stated in the README. If you have tried this and have additional questions expanding upon your issues would be helpful.

Can we use the align function in MinKNOW? So far, our BAM files were made by aligning our sequences against the expected sequence with BWA and viewing them in IGV, which has some editing capability.

> Also, as the sequences for training were not published, we made our own. We are interested in telomere damage. As a starting point we made oligos with 8-oxo dG in a telomere context and used them as a pcr primer. I'm not sure I completely understand this construct design. Are you aiming to sequence just the oligos with the 8-oxo-dG in them? Or the oligos as well as the extension with PCR? Using some reference material? If the modified bases are only in the oligo then these are the only modified training chunks you will have. I might not be understanding the construct though. Could you expand on your setup and the training chunks you are aiming to extract?

We have 100 bp of telomere cloned into a pUC plasmid. We made oligos near one end, overlapping the cloning junction, both wild type (wt) and with a fixed 8-oxo in the middle G of the telomere repeats (GGGTTA). The reverse primer is 700 bp away, so PCR gives us a 700 bp fragment with 8-oxo at a fixed position. When we basecall, there is >25% misreading, with many indels, at the 8-oxo and surrounding bases, compared with <1% misreading of the wt oligo. I want to use these data to train a Remora model. I figure we can choose a chunk surrounding the 8-oxo. I know that we have only one of the three Gs oxidized, but this way we know the exact sequence. I plan to make the other two oligos later, perhaps eventually basecalling three times with the three models. We also have a dataset from randomly oxidized fragments for later testing.
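As a rough illustration of picking out the known modified position, the sketch below lists the 0-based coordinate of the middle G in every GGGTTA repeat of a reference and formats each as a BED-style line. The reference string, the contig name `construct`, and the label `8oxoG` are made up for the example, and only the repeat that actually carries the 8-oxo in the primer would be kept in practice. Whether and how Remora accepts such positions should be checked against its README; nothing here is taken from the Remora code.

```python
def middle_g_positions(reference, repeat="GGGTTA"):
    """Return 0-based positions of the middle G in each occurrence of `repeat`."""
    offset = 1  # the middle G is the second base of GGGTTA
    positions = []
    start = reference.find(repeat)
    while start != -1:
        positions.append(start + offset)
        start = reference.find(repeat, start + 1)
    return positions


def to_bed_lines(contig, positions, name="8oxoG", strand="+"):
    """Format single-base positions as six-column BED lines (0-based, half-open)."""
    return [f"{contig}\t{p}\t{p + 1}\t{name}\t0\t{strand}" for p in positions]


# Made-up reference: 4 flanking bases, three telomere repeats, 4 flanking bases.
example_reference = "ACGT" + "GGGTTA" * 3 + "TTCA"
candidates = middle_g_positions(example_reference)

for line in to_bed_lines("construct", candidates):
    print(line)
```

The strand column matters here because the 8-oxo sits on the primer strand only; reads from the complementary strand of the PCR product would not carry the modification.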

We tried a couple of times with oligos, one of which was a hairpin, but they were lost in library prep. Also, the hairpin did not sequence well. I later heard that you abandoned hairpinning as a double-stranded sequencing strategy because the hairpins did not go through the pores well. I also think that, with all of the repeats in the telomere sequence, most oligo strategies will be problematic because of misaligned base pairing.

AngieKou commented 8 months ago

Hello,

Thank you for highlighting the link issue. Please use this link to register your interest for our Betta tool:

https://register.nanoporetech.com/betta-protocol-and-software

All the best, Angie

Jeff-Field commented 5 months ago

Hi, we have progressed somewhat. We can get the --emit-moves function and an alignment to our reference using Dorado. But when we run the resulting BAM file through Remora, it reports an error saying that we are missing a reference with an MD tag. Could this be because we are not using the MD tag in our reference when we set up the calling and the alignment in Dorado? We are using a text file in FASTA format.

marcus1487 commented 4 months ago

Dorado adds the MD tag by default. This should not result in the MD tag message. It could be that some reads in your BAM are unmapped and therefore don't have the MD tag. This message could be better and I will look into that.
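One way to test the unmapped-reads explanation above is to count, in the SAM text of the BAM file, how many records are unmapped and how many mapped records lack an MD tag. The sketch below is a minimal standard-library check, not part of Remora. It assumes the records are available as SAM text (for example from `samtools view`), that the move table is stored in an `mv` tag, and it uses two made-up records as input.

```python
def summarize_sam_records(sam_lines):
    """Count unmapped reads and reads lacking MD or mv tags in SAM text lines."""
    counts = {"total": 0, "unmapped": 0, "mapped_no_md": 0, "no_moves": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])                   # second column is the bitwise FLAG
        tags = {f[:2] for f in fields[11:]}     # optional tags start at column 12
        counts["total"] += 1
        if flag & 0x4:                          # 0x4 marks an unmapped read
            counts["unmapped"] += 1
        elif "MD" not in tags:
            counts["mapped_no_md"] += 1
        if "mv" not in tags:
            counts["no_moves"] += 1
    return counts


# Two made-up records: one mapped with MD and mv tags, one unmapped.
example_lines = [
    "@SQ\tSN:construct\tLN:700",
    "read1\t0\tconstruct\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tMD:Z:4\tmv:B:c,5,1,0,1,1",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tmv:B:c,5,1,0,1,1",
]
summary = summarize_sam_records(example_lines)
print(summary)
```

If the unmapped count is nonzero, filtering those reads out (for example with `samtools view -b -F 4`) before running Remora would be a reasonable thing to try, though whether that clears the message in this particular case is an assumption.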