Closed ValWood closed 1 year ago
@manulera I will assign this one to you. I think these are just a case of standardizing the descriptions to align with what we use for nucleotides where possible.
Ok, I will dig into these, but shouldn't we describe them at the DNA level since we can?
I'm not sure.
CONS: We have quite a lot of pending changes (https://www.pombase.org/status/sequence-updates-pending) to make to the sequence, and gaps to fill. I need to do a genome revision (I have been waiting on gap-filling information). Feedback from the community indicates they would prefer as few genome sequence revisions as possible because they will need to remap all of their datasets.
When we do a sequence update, we would need to revise every allele that used DNA coordinates downstream of the sequence change. For this reason, I think it is more robust to describe alleles relative to the specific feature; then we only need to revise the allele naming if and when an individual feature's coordinates change. We can always automatically convert anything described at the feature's nt level into DNA coordinates.
PRO: It seems more aligned with the convention for variant data to describe at the nucleotide level (but most of the variation data for human/mouse is natural variation and so is identified via sequencing).
What do C. elegans/WormBase and other groups do here when describing alleles/genotypes from small-scale publications? Do they use DNA nucleotide coordinates?
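To illustrate the automatic conversion mentioned above, here is a minimal sketch of mapping a feature-relative position to a genomic coordinate. The `Feature` class and `feature_to_genomic` function are hypothetical names, not part of any PomBase code, and the sketch assumes a single-exon feature with 1-based inclusive coordinates. Upstream positions (such as negative offsets) are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical, simplified model of a single-exon feature."""
    start: int   # 1-based genomic coordinate of the leftmost base
    end: int     # 1-based genomic coordinate of the rightmost base (inclusive)
    strand: int  # +1 for the forward strand, -1 for the reverse strand


def feature_to_genomic(feature: Feature, pos: int) -> int:
    """Map a 1-based position within the feature to a 1-based genomic coordinate."""
    length = feature.end - feature.start + 1
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} is outside the feature (length {length})")
    if feature.strand == 1:
        # Forward strand: feature position 1 is the leftmost genomic base.
        return feature.start + pos - 1
    # Reverse strand: feature position 1 is the rightmost genomic base.
    return feature.end - pos + 1
```

With this approach only the stored `start`, `end` and `strand` of a feature need updating after a genome revision; the feature-relative allele descriptions stay unchanged.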
We agreed to use the first transcribed nucleotide as index 1 and to use T instead of U; this is now in the paper.
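As a rough sketch of that convention, the snippet below rewrites a simple RNA-style substitution such as `C180U` using T instead of U, with positions counted from the first transcribed nucleotide as index 1. The function name and the accepted input pattern are assumptions for illustration; insertions, deletions and multi-part descriptions are not handled.

```python
import re

# Matches simple substitutions like "C180U": reference base, position, new base.
SUBSTITUTION = re.compile(r"^([ACGUT])(\d+)([ACGUT])$")


def to_dna_substitution(description: str) -> str:
    """Rewrite a substitution description using T in place of U (hypothetical helper)."""
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    ref, pos, alt = match.groups()
    if int(pos) < 1:
        # Index 1 is the first transcribed nucleotide, so 0 is not valid here.
        raise ValueError(f"position must be >= 1: {description!r}")
    return f"{ref.replace('U', 'T')}{pos}{alt.replace('U', 'T')}"
```

For example, `to_dna_substitution("U88G")` gives `"T88G"`.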
These should be possible to describe as nucleotide alterations. We need to check that we have the precise ncRNA structure annotated first.
srp7-C180U,omega176U,SPNCRNA.98:allele-68,C180U,U inserted after nt 176,{}
sme2deltaB,SPNCRNA.103:allele-11,-1030-582,{e3c8313e9bd1dabc}
srp7-U88G,omegaUCGA,SPNCRNA.98:allele-57,U88G,UCGA inserted after nt84,{}
mrp1-LoopA,SPNCRNA.82:allele-8,U371A,C372A,Ins G(374-375),U377G,{8c45a7b8efd78a20}
snu1-5' extension,SPSNRNA.01:allele-18,AU inserted at 5' end,{}
sme2deltaA,SPNCRNA.103:allele-9,-1030-1586,{e3c8313e9bd1dabc}
srp7-D4E-3,SPNCRNA.98:allele-115,GC inserted after nt 164,{}
sme2-DSRless,SPNCRNA.103:allele-4,all hexanucleotide DSR motifs except the most upstream one disrupted by substituting TNAAAC with TNAAGC,{45395a087bd927e5}
srp7-U88G,delta84-88,omega8nt,SPNCRNA.98:allele-10,U88G,nt84 deletion,8 nucleotides inserted after nt83,{}
sme2-m,SPNCRNA.103:allele-5,TATA box mutant,{45395a087bd927e5,e3c8313e9bd1dabc}
srp7-D4E-1,SPNCRNA.98:allele-104,GC inserted after nt 158,GC inserted after nt 164,{}
srp7-A146U,G159U,delta148-150,SPNCRNA.98:allele-9,A146U,G159U,nt148-150 deletion,{}
mrp1-TOs,SPNCRNA.82:allele-17,replace 34-74 with RNase P RNA,{9745b103c44cf72d}
srp7-D4E-2,SPNCRNA.98:allele-105,G160C,GC inserted after nt 158,GC inserted after nt 164,{}
srp7-U88G,delta84-88,omega131nt,SPNCRNA.98:allele-1,U88G,nt84-88 deletion,131 nucleotides inserted after nt83,{}
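Because both the allele names and the descriptions above contain commas, a plain comma split will not recover the fields. A minimal sketch of one way to parse a single record is shown below. It anchors on the `SP...:allele-N` identifier. The field names (`name`, `allele_id`, `description`, `refs`) are my own labels, and the interpretation of the braces as a list of identifiers is an assumption based only on the records listed here.

```python
import re

# name (may contain commas), allele identifier, description (may contain
# commas), then a brace-delimited, comma-separated list of identifiers.
RECORD = re.compile(
    r"^(?P<name>.+?),"
    r"(?P<allele_id>SP[A-Z]+\.\d+:allele-\d+),"
    r"(?P<description>.*),"
    r"\{(?P<refs>[^{}]*)\}$"
)


def parse_allele_record(line: str) -> dict:
    """Split one allele record into its fields (hypothetical helper)."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised record: {line!r}")
    fields = match.groupdict()
    # An empty "{}" yields an empty list rather than [""].
    fields["refs"] = [r for r in fields["refs"].split(",") if r]
    return fields
```

For the `srp7-U88G,delta84-88,omega8nt` record, this would yield the name `srp7-U88G,delta84-88,omega8nt` and the description `U88G,nt84 deletion,8 nucleotides inserted after nt83`, which could then be split into individual changes for standardization.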