mah11 closed this 6 years ago
I guess this is from V1?
Currently in V2 we just have "II, 1500197-1502095 (1899nt)" with no CDS bit.
Would "start and end translation coordinates" be clearer?
Hmm, I still think CDS is the correct way to refer to the coding sequence genome coordinates.
The problem is that people (incorrectly, IMHO) expect the number 1899nt to represent the translation length. However, it doesn't if those coordinates are the start and end of a spliced gene in the genome.
The CDS length (or the number we report) is the entire length of the coding sequence in the genome, including introns, and I think that is correct.
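To make the two numbers concrete, here is a minimal Python sketch contrasting the genomic span of a CDS (first to last coding base, introns included, which is what "(1899nt)" reports) with the spliced coding length (the sum of exon lengths, which is what some users expect). The exon coordinates are hypothetical, chosen only so that the outer bounds match II, 1500197-1502095; they are not the real gene model.

```python
# Coordinates are 1-based and inclusive, as in the "II, 1500197-1502095" label.
Exon = tuple[int, int]  # (start, end)


def genomic_span(exons: list[Exon]) -> int:
    """Length from the first coding base to the last, introns included."""
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    return end - start + 1


def spliced_length(exons: list[Exon]) -> int:
    """Sum of exon lengths, i.e. the length of the edited (spliced) sequence."""
    return sum(e - s + 1 for s, e in exons)


# Hypothetical two-exon model with a 99 nt intron (1500801..1500899).
exons = [(1500197, 1500800), (1500900, 1502095)]
print(genomic_span(exons))    # 1899
print(spliced_length(exons))  # 1800 (604 + 1196)
```

The difference between the two values is exactly the total intron length, which is why the reported number only matches the translation length for genes without introns.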
The problem is how to explain this.....
Maybe it should be the translated length in nucleotides. I think the meaning has changed over time. It used to be "from coding DNA sequence", which to me would make the CDS length the start and end of the CDS in the DNA, not the start and end of the edited sequence.
The edited sequence seems to be what people expect. So we could report the nucleotide length of the translation in this case....
Genomic location II, 1500197-1502095 (1899nt) coding start to stop
1500197-1502095 (1899nt) including UTRs
If we just stick with the current single set of coordinates, its current label ("genomic location") is fine. If you want to show with and without UTRs, the version from Jun 1 above (https://github.com/pombase/website/issues/59#issuecomment-305459606) would do.
What text should we have for RNA genes and pseudogenes?
III, 2111204-2116520 (5317nt)
Sorry, hit submit too soon. For now I've implemented it like "III, 2111204-2116520 (5317nt)" for genes without a translation, which is what we have at the moment. Is that enough in that case?
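A minimal sketch of the label format described here, assuming the length shown is simply end - start + 1 on 1-based inclusive coordinates (this reproduces both examples quoted in this thread). The function name `format_location` is hypothetical and is not taken from the PomBase code.

```python
def format_location(chromosome: str, start: int, end: int) -> str:
    """Format a 1-based, inclusive range like 'III, 2111204-2116520 (5317nt)'."""
    if end < start:
        raise ValueError("end must not be before start")
    length = end - start + 1  # inclusive: both endpoints are counted
    return f"{chromosome}, {start}-{end} ({length}nt)"


print(format_location("III", 2111204, 2116520))  # III, 2111204-2116520 (5317nt)
print(format_location("II", 1500197, 1502095))   # II, 1500197-1502095 (1899nt)
```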
For now I've implemented it like ...
It's on the main site now. Is it OK?
It looks a bit odd, but I think it is clear what it means.
Maybe we could say "(CDS start/end)" and "(+UTRs)" to make the text shorter.
Is my suggestion naff?
I have no idea. I tried to look at these recommendations (http://www.hgvs.org/mutnomen/refseq.html), but they're pretty confusing.
I think it would be clear to users. They all (should) know what a UTR and a CDS are...
Well, isn't it the CDS that's confusing? You said it is the "coding sequence genome coordinates", but according to HGVS, +1 is assigned to the translation initiation codon in the CDS. Isn't what you are talking about the "genomic reference sequence"?
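To illustrate the HGVS-style numbering mentioned here, a minimal sketch that maps a genomic position to a CDS-relative position, where position 1 is the first base of the start codon. It assumes a forward-strand gene and handles exonic positions only (HGVS intronic notation such as c.604+1 and upstream c.-N positions are out of scope). The function name and exon coordinates are hypothetical.

```python
def genomic_to_cds(pos: int, exons: list[tuple[int, int]]) -> int:
    """Map a genomic position to a 1-based CDS position (forward strand only).

    CDS position 1 is the first base of the start codon, as in HGVS c. numbering.
    Positions in introns or outside the CDS raise ValueError.
    """
    offset = 0  # coding bases in the exons already passed
    for start, end in sorted(exons):
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1
    raise ValueError(f"{pos} is not inside a coding exon")


exons = [(1500197, 1500800), (1500900, 1502095)]  # hypothetical gene model
print(genomic_to_cds(1500197, exons))  # 1
print(genomic_to_cds(1500900, exons))  # 605
print(genomic_to_cds(1502095, exons))  # 1800
```

In this made-up model the last coding base sits at genomic position 1502095 but at CDS position 1800, which is the gap between the "genomic location" and the translated length that this thread is about.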
It's the genomic location of the CDS, though?
I still maintain that CDS was "invented" to describe the genomic location in an EMBL/GenBank file, so it's the coordinates of the coding sequence in whichever sequence you are referring to, i.e. the genome here:
FT CDS complement(23589..23978)
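For reference, `complement(23589..23978)` is an INSDC feature location: the numbers are forward-strand coordinates and `complement` marks the reverse strand. The sketch below handles only plain ranges, `join(...)` of plain ranges, and an outer `complement(...)` wrapper; it is an illustrative subset, not a parser for the full INSDC location grammar (no `<`/`>` partial markers, remote references, `order`, or per-range `complement`). The function name is hypothetical.

```python
import re

RANGE = re.compile(r"(\d+)\.\.(\d+)")


def parse_location(loc: str) -> tuple[list[tuple[int, int]], str]:
    """Return (ranges, strand) for a simple INSDC location string.

    Supports 'a..b', 'join(a..b,c..d)' and an outer 'complement(...)' only.
    """
    strand = "+"
    loc = loc.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = "-"
        loc = loc[len("complement("):-1]
    ranges = [(int(a), int(b)) for a, b in RANGE.findall(loc)]
    if not ranges:
        raise ValueError(f"unsupported location: {loc!r}")
    return ranges, strand


ranges, strand = parse_location("complement(23589..23978)")
span = max(e for _, e in ranges) - min(s for s, _ in ranges) + 1
coding = sum(e - s + 1 for s, e in ranges)
print(strand, span, coding)  # - 390 390
```

For an unspliced CDS like this one the span and the coding length coincide; with a `join(...)` of several exons they differ by the total intron length.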
Well, yes, but according to the site I linked to, different people use the term differently, which is why I find it confusing.
But what else could it mean in the context we are using it in above?
https://en.wikipedia.org/wiki/Coding_region CDS = coding DNA sequence
I think it's OK to close?
To avoid confusion about the original meaning (and any other interpretations, correct or incorrect) of "CDS", change the label to "start and end coordinates". Even if "CDS" would be technically correct, enough people don't realize it so changing to something ploddingly unambiguous won't go wrong.