Open ValWood opened 1 week ago
Add discuss to this if it doesn't make sense. I think it will be quick. I'm worried that people are using this link in queries which would be totally wrong...
Just to double check: is transmembrane_helix (SO:0001812) the term to use for TM domains?
Yep.
Thanks. I'm going to create SO:0001812 annotations in Chado for the 912 genes with TM domains. Should we make a new PB_REF to associate with the annotations? Or we could just use the PMID of the TMHMM paper.
Let's use the PMID of the TMHMM paper, since it covers all of them.
There were 5 diffs, but I checked and removed them, so it ended up a complete subset.
I don't understand that bit. What diffs are you talking about?
I had some annotated as TMM that did not have a prediction, but they seemed to be false positives, so I removed them yesterday. So not all the manually annotated ones had a prediction, but we checked that anyway.
I've created a new file for SO annotations:
pombe-embl/supporting_files/manual_so_term_annotations.tsv
It's a TSV file with these columns:
I've removed the transmembrane helix annotations from the contigs and added the TMHMM predictions to that new file.
I forgot to say that I committed these changes after the load finished. I'll check on Thursday morning.
I've saved the current TMM genes as a query so we can compare on Thursday: https://www.pombase.org/results/from/id/81197dff-b941-4c4d-86b2-e37c940319ae
This query should contain the new list on Thursday. It's currently the same as the list above: https://www.pombase.org/results/from/id/a62366ca-a920-4421-94ee-03289f360d45
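For the Thursday comparison, something like this could diff the two exported gene lists. It's only a sketch: the function name and the gene IDs in the usage note are made up for illustration, and it assumes each query result has been exported as a plain list of systematic IDs.

```python
def compare_gene_lists(old_ids, new_ids):
    """Report which gene IDs differ between two query results."""
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        # genes that had a TM annotation before the load but not after
        "only_in_old": sorted(old_set - new_set),
        # genes that gained a TM annotation from the new file
        "only_in_new": sorted(new_set - old_set),
        # True when every previously annotated gene is still in the new list
        "old_is_subset": old_set <= new_set,
    }
```

Calling `compare_gene_lists(["geneA", "geneB"], ["geneA", "geneB", "geneC"])` (placeholder IDs) would list `geneC` under `only_in_new` and report the old list as a subset of the new one.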
Looks good to me. Anything else to do here?
Perhaps we should include the "dubious" so that the numbers align. It is possible that a few of these are real.
For most of them, I am pretty sure that they aren't. They are not conserved in Schizos, and the DNA sequence is usually low complexity, like this:
SPAC343.21.1 length:477 includes:exons ATGGTTAGCTACAATGTGCTAACTAAACTGTTTTTTATTTTCTCCGGAGGTTTGGTTTTTTTTTTTTTTGAATTTTTTTTAAATCATTTCAATTACTATCCTACCAAACTTTTGTATTACATTACATTCTATTTCATTAAAAATCATCCTTCTCTTTTTCTTTTATTTAATTTTTTTTTGTCCACAGCATCTTTTTCCTATTCTTTTCCCTCAAAGTCTCATTTAACTTTTTACTCAAAAGGTGCTCCCTCTGTCTTTTTTCTATCTCTCAAATCTTCCCCATGTCCCGGGTACTGCTCCTCTACTCTACTCTACTCTACCTCTAACTTGCTCCCTTCCCTCCCCTCCCCCCTCCATGCTCCTCCTCCACTCGGCTCACGTTTGCTTCACGTTTTTTTTTACCGTAGATCAAACGCATCGGCGTATCCTTCTTTTACGCCCCGCTATTCTTTTTTCCCTTCTTTCACTTTACGATGA
which tends to translate into runs of hydrophobic amino acids. Some of them have predicted runs of TMMs which are almost fully adjacent.
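A rough way to put numbers on "low complexity" for sequences like the one above is to look at base composition and the longest single-base run. This is a minimal sketch, not how TMHMM or PomBase flags these genes; the function names are hypothetical and no threshold is implied.

```python
from collections import Counter
from itertools import groupby


def base_fractions(seq):
    """Return the fraction of A, C, G and T in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def longest_homopolymer(seq):
    """Return (base, length) of the longest run of a single base."""
    best_base, best_len = "", 0
    for base, run in groupby(seq.upper()):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len
```

A strongly skewed composition or a long poly-T run would be consistent with the low-complexity pattern described here, but it would still need checking by eye.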
We have an ‘annotation’ in the ‘domain section’ of the gene page which displays the number of transmembrane domains. This information is from a qualifier in the contig files. As you can see it's very incomplete. Suggest