pombase / website

PomBase website v2
MIT License

change label for CDS coordinates #59

Closed mah11 closed 6 years ago

mah11 commented 8 years ago

To avoid confusion about the original meaning (and any other interpretations, correct or incorrect) of "CDS", change the label to "start and end coordinates". Even if "CDS" would be technically correct, enough people don't realize it so changing to something ploddingly unambiguous won't go wrong.

kimrutherford commented 7 years ago

I guess this is from V1?

Currently in V2 we just have "II, 1500197-1502095 (1899nt)" with no CDS bit.

Would "start and end translation coordinates" be clearer?

ValWood commented 7 years ago

Hmm, I still think CDS is the correct way to refer to the coding sequence genome coordinates.

The problem is that people (incorrectly IMHO) are expecting the number 1899nt to represent the translation length. However, it doesn't if it's the start and end of a spliced gene in the genome.

The CDS length (or the number we report) is the entire length of the coding sequence in the genome, introns included, and I think that is correct.

The problem is how to explain this.....
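To make the two possible readings of "1899nt" concrete, here is a minimal sketch. The start and end coordinates are the ones quoted in this thread; the exon boundaries are hypothetical, invented purely for illustration, and do not describe the real structure of any gene.

```python
# Coordinates are 1-based and inclusive, as in the "II, 1500197-1502095 (1899nt)" display.

def span_length(start, end):
    """Length of an inclusive genomic interval."""
    return end - start + 1

def spliced_length(exons):
    """Summed length of the coding exons, i.e. the translated length in nucleotides."""
    return sum(span_length(s, e) for s, e in exons)

cds_start, cds_end = 1500197, 1502095

# HYPOTHETICAL exon structure: two coding exons separated by one 99 nt intron.
exons = [(1500197, 1500496), (1500596, 1502095)]

genomic_len = span_length(cds_start, cds_end)   # start-to-stop span, introns included
translated_len = spliced_length(exons)          # introns removed

print(genomic_len)      # 1899
print(translated_len)   # 1800
```

With these made-up exons, the number shown on the page (1899) is the genomic span, while a reader expecting the translation length would expect 1800.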

ValWood commented 7 years ago

Maybe it should be the translated length in nucleotides. I think the meaning has changed over time. It used to be "from coding DNA sequence", which to me would make the CDS length the distance between the start and end of the CDS in the DNA, not the length of the edited sequence.

The edited sequence seems to be what people expect. So we could report the nucleotide length of the translation in this case....

ValWood commented 7 years ago

Genomic location   II, 1500197-1502095 (1899nt) coding start to stop
                       1500197-1502095 (1899nt) including UTRs

mah11 commented 6 years ago

If we just stick with the current single set of coordinates, its current label ("genomic location") is fine. If you want to show with and without UTRs, the version from Jun 1 above (https://github.com/pombase/website/issues/59#issuecomment-305459606) would do.

kimrutherford commented 6 years ago

What text should we have for RNA genes and pseudogenes?

III, 2111204-2116520 (5317nt)

kimrutherford commented 6 years ago

III, 2111204-2116520 (5317nt)

Sorry, hit submit too soon. For now I've implemented it like "III, 2111204-2116520 (5317nt)" for genes without a translation, which is what we have at the moment. Is that enough in that case?
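The display string discussed here can be sketched as a small helper. This is an illustration only, not the website's actual code; the function name `format_location` and its `note` parameter are hypothetical.

```python
def format_location(chromosome, start, end, note=None):
    """Build a 'chromosome, start-end (length nt)' string from 1-based inclusive coordinates.

    `note` is an optional trailing label such as "coding start to stop";
    genes without a translation would simply omit it.
    """
    length = end - start + 1
    text = f"{chromosome}, {start}-{end} ({length}nt)"
    return f"{text} {note}" if note else text

# Gene without a translation (RNA gene or pseudogene): no trailing label.
print(format_location("III", 2111204, 2116520))

# Protein-coding gene, using the wording proposed earlier in the thread.
print(format_location("II", 1500197, 1502095, note="coding start to stop"))
```

Computing the length from the coordinates rather than storing it separately keeps the number in parentheses consistent with the displayed range.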

kimrutherford commented 6 years ago

For now I've implemented it like ...

It's on the main site now. Is it OK?

ValWood commented 6 years ago

it looks a bit odd, but I think it is clear what it means

Maybe we could say

(CDS start/end)

and (+UTRs)

to make the text shorter.

ValWood commented 6 years ago

Is my suggestion naff?

Antonialock commented 6 years ago

I have no idea. I tried to look at these recommendations (http://www.hgvs.org/mutnomen/refseq.html), but they're pretty confusing.

ValWood commented 6 years ago

I think it would be clear to users. They all (should) know what a UTR and a CDS are...

Antonialock commented 6 years ago

well it is the CDS that's confusing? You said it is the "coding sequence genome coordinates" but according to HGVS, +1 is assigned to the translation initiation codon in the CDS. What you are talking about is the "genomic reference sequence"?

ValWood commented 6 years ago

It's the genomic location of the CDS though?

I still maintain that CDS was "invented" to describe the genomic location in an EMBL/GenBank file, so it's the coordinates of the coding sequence in whichever sequence you are referring to, i.e. the genome here:

FT CDS complement(23589..23978)
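As a rough sketch of how a location like the one above maps to start, end, and strand, the following handles only the two simplest forms, `start..end` and `complement(start..end)`. Spliced locations written with `join(...)` are deliberately not handled, and the function name `parse_simple_location` is hypothetical.

```python
import re

# Matches either "complement(123..456)" or plain "123..456".
_SIMPLE_LOCATION = re.compile(
    r"^(?:complement\((\d+)\.\.(\d+)\)|(\d+)\.\.(\d+))$"
)

def parse_simple_location(location):
    """Return (start, end, strand) for a simple feature location string.

    strand is -1 for complement(...) and 1 otherwise.
    Raises ValueError for anything more complex, such as join(...).
    """
    m = _SIMPLE_LOCATION.match(location.strip())
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    if m.group(1) is not None:
        return int(m.group(1)), int(m.group(2)), -1
    return int(m.group(3)), int(m.group(4)), 1

start, end, strand = parse_simple_location("complement(23589..23978)")
print(start, end, strand)   # 23589 23978 -1
print(end - start + 1)      # 390
```

In this reading, the CDS coordinates are positions in whichever sequence the entry describes, which is the sense of "CDS" argued for above.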

Antonialock commented 6 years ago

well yeah, but according to the site I linked to, different people use the term differently, which is why I find it confusing.

ValWood commented 6 years ago

but what else could it mean in the context we are using it above?

https://en.wikipedia.org/wiki/Coding_region CDS = coding DNA sequence

ValWood commented 6 years ago

I think it's OK to close?