pombase / genome_changelog

MIT License

Which features to keep in the modification-only changelog #2

Closed manulera closed 1 year ago

manulera commented 1 year ago

Hi @ValWood

I have added the pre-svn changes to the repository. I thought it would be nice to have a list where only modifications are shown, to double-check against gene warnings (modifications meaning changes to gene coordinates, rather than additions or removals of features). To make the list easier to read, I have excluded features of the types '5'UTR', '3'UTR', 'intron', 'promoter' and 'LTR' (introns are always implied in the CDS), but I wonder if others should be excluded as well. The feature types that are left are 'CDS', 'ncRNA', 'snRNA', 'repeat_region', 'rRNA', 'tRNA', 'snoRNA', 'misc_feature' and 'misc_RNA'.

You can have a look at the file here:

https://github.com/pombase/genome_changelog/blob/pre_svn/only_modified_coordinates.tsv

ValWood commented 1 year ago

Looks good!

I think you could exclude the misc_features (these are a bit random, and we did not assign identifiers properly until after 2010; often they were basically just notes).

I think it is a good idea to exclude the other feature types you excluded. For the UTRs, we will change the entire data set soon, so there would be an entry for every single one. Introns would be covered by changes to the CDS anyway. We have very few promoters annotated. For LTRs, I don't think I have changed any, but it is difficult to be precise about their beginnings and ends because they are often 'degraded'.

ValWood commented 1 year ago

OMG these are all pre-svn! There are tonnes!

manulera commented 1 year ago

Ok, good. Closing with #3. I have excluded the misc_feature ones. The updated list is in this file:

https://github.com/pombase/genome_changelog/blob/master/only_modified_coordinates.tsv

I'm not a huge fan of GitHub's TSV view, so you can also see the raw file here:

https://raw.githubusercontent.com/pombase/genome_changelog/master/only_modified_coordinates.tsv
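
For anyone who wants to reproduce the filtering discussed in this thread, here is a minimal sketch of the idea in Python. It is not the script used in the repository. The column names (`systematic_id`, `feature_type`, `old_coordinates`, `new_coordinates`) and the sample rows are made up for illustration, so check them against the real header of only_modified_coordinates.tsv before reusing it. Only the set of excluded feature types comes from the discussion above.

```python
import csv
import io

# Feature types excluded in the thread: the five originally excluded ones,
# plus misc_feature, which was dropped after the discussion.
EXCLUDED_TYPES = {"5'UTR", "3'UTR", "intron", "promoter", "LTR", "misc_feature"}


def keep_modifications(rows, type_column="feature_type"):
    """Return only the rows whose feature type is not in EXCLUDED_TYPES.

    `type_column` is a hypothetical column name; adjust it to the real file.
    """
    return [row for row in rows if row[type_column] not in EXCLUDED_TYPES]


# Made-up sample data standing in for the real TSV (hypothetical columns and IDs).
SAMPLE_TSV = (
    "systematic_id\tfeature_type\told_coordinates\tnew_coordinates\n"
    "example_1\tCDS\t100..400\t100..430\n"
    "example_2\t5'UTR\t50..99\t40..99\n"
    "example_3\ttRNA\t900..972\t901..972\n"
    "example_4\tmisc_feature\t10..20\t10..25\n"
    "example_5\tintron\t200..250\t200..260\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
kept = keep_modifications(rows)

for row in kept:
    print(row["systematic_id"], row["feature_type"], row["new_coordinates"])
```

To run this against the real file, replace `io.StringIO(SAMPLE_TSV)` with an open file handle for a downloaded copy of the raw TSV linked above.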