pombase / website

PomBase website v2
MIT License

TM domains on gene pages #2190

Open ValWood opened 1 week ago

ValWood commented 1 week ago

We have an ‘annotation’ in the ‘domain section’ of the gene page which displays the number of transmembrane domains. This information is from a qualifier in the contig files. As you can see, it's very incomplete. Suggest:

  1. Remove these annotations from the contig files
  2. Add a similar annotation using the results of the query (it would be good to be able to access all TM domains from the gene pages and send them to the query builder, but the correct set!)

[Screenshot 2024-06-30 at 07 07 09]

ValWood commented 1 week ago

Add "discuss" to this if it doesn't make sense. I think it will be quick. I'm worried that people are using this link in queries, which would be totally wrong...

kimrutherford commented 1 week ago

Just to double check: is transmembrane_helix (SO:0001812) the term to use for TM domains?

ValWood commented 1 week ago

> Just to double check: is transmembrane_helix (SO:0001812) the term to use for TM domains?

Yep.

kimrutherford commented 1 week ago

Thanks. I'm going to create SO:0001812 annotations in Chado for the 912 genes with TM domains. Should we make a new PB_REF to associate with the annotations? Or we could just use the PMID of the TMHMM paper.

ValWood commented 1 week ago

Let's use the PMID of the TMHMM paper since it covers all.

There were 5 diffs, but I checked and removed so it ended up a complete subset.

kimrutherford commented 1 week ago

> There were 5 diffs, but I checked and removed so it ended up a complete subset.

I don't understand that bit. What diffs are you talking about?

ValWood commented 1 week ago

I had some annotated as TMM that did not have a prediction, but they seemed to be false positives, so I removed them yesterday. So not all the manually annotated ones have a prediction, but we checked that anyway.
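
For reference, the kind of check described here (which manually annotated genes lack a prediction, and whether the manual set is a complete subset of the predicted set) can be sketched in a few lines of Python. This is only an illustration: the function name and the gene identifiers are hypothetical placeholders, not real PomBase data or tooling.

```python
def compare_gene_sets(manual, predicted):
    """Report the differences between two collections of gene IDs."""
    manual, predicted = set(manual), set(predicted)
    return {
        # Annotated by hand but with no prediction (the "diffs" to review)
        "manual_only": sorted(manual - predicted),
        # Predicted but never annotated by hand
        "predicted_only": sorted(predicted - manual),
        # True when every manual annotation also has a prediction
        "manual_is_subset": manual <= predicted,
    }


# Hypothetical placeholder identifiers, for illustration only
manual = ["gene1", "gene2", "gene3"]
predicted = ["gene1", "gene2", "gene4", "gene5"]

report = compare_gene_sets(manual, predicted)
print(report)
```

Once the entries in `manual_only` have been reviewed and removed, `manual_is_subset` becomes true.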

kimrutherford commented 1 week ago

I've created a new file for SO annotations:

pombe-embl/supporting_files/manual_so_term_annotations.tsv

It's a TSV file with these columns:

I've removed the transmembrane helix annotations from the contigs and added the TMHMM predictions to that new file.

kimrutherford commented 1 week ago

I forgot to say that I committed these changes after the load finished. I'll check on Thursday morning.

I've saved the current TMM genes as a query so we can compare on Thursday: https://www.pombase.org/results/from/id/81197dff-b941-4c4d-86b2-e37c940319ae

This query should contain the new list on Thursday. It's currently the same as the list above: https://www.pombase.org/results/from/id/a62366ca-a920-4421-94ee-03289f360d45

kimrutherford commented 1 week ago

> I forgot to say that I committed these changes after the load finished. I'll check on Thursday morning.

Looks good to me. Anything else to do here?

ValWood commented 1 week ago

Perhaps we should include the "dubious" genes so that the numbers align. It is possible that a few of these are real.

For most, I am pretty sure that they aren't. They are not conserved in Schizos, and the DNA sequence is usually low complexity, like this:

SPAC343.21.1 length:477 includes:exons ATGGTTAGCTACAATGTGCTAACTAAACTGTTTTTTATTTTCTCCGGAGGTTTGGTTTTTTTTTTTTTTGAATTTTTTTTAAATCATTTCAATTACTATCCTACCAAACTTTTGTATTACATTACATTCTATTTCATTAAAAATCATCCTTCTCTTTTTCTTTTATTTAATTTTTTTTTGTCCACAGCATCTTTTTCCTATTCTTTTCCCTCAAAGTCTCATTTAACTTTTTACTCAAAAGGTGCTCCCTCTGTCTTTTTTCTATCTCTCAAATCTTCCCCATGTCCCGGGTACTGCTCCTCTACTCTACTCTACTCTACCTCTAACTTGCTCCCTTCCCTCCCCTCCCCCCTCCATGCTCCTCCTCCACTCGGCTCACGTTTGCTTCACGTTTTTTTTTACCGTAGATCAAACGCATCGGCGTATCCTTCTTTTACGCCCCGCTATTCTTTTTTCCCTTCTTTCACTTTACGATGA

which tends to translate into runs of hydrophobic amino acids. Some of them have predicted runs of TMMs which are almost fully adjacent.
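
To illustrate the point, here is a rough Python sketch that translates a coding sequence with the standard genetic code and reports the longest run of hydrophobic residues, plus the fraction of one base as a crude low-complexity indicator. It is not how TMHMM works. The choice of hydrophobic residues, the function names, and the assumption that the input is an in-frame coding sequence starting at its first base are all mine, and the example sequence is made up.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: a simple, commonly used set of hydrophobic residues
HYDROPHOBIC = set("AVILMFWC")


def translate(dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_hydrophobic_run(protein):
    """Length of the longest stretch of consecutive hydrophobic residues."""
    best = current = 0
    for aa in protein:
        current = current + 1 if aa in HYDROPHOBIC else 0
        best = max(best, current)
    return best


def base_fraction(dna, base="T"):
    """Fraction of the sequence made up of one base."""
    dna = dna.upper()
    return dna.count(base) / len(dna) if dna else 0.0


# Made-up T-rich example: ATG TTT TTT TTA TTG TGA
example = "ATG" "TTT" "TTT" "TTA" "TTG" "TGA"
protein = translate(example)
print(protein, longest_hydrophobic_run(protein), round(base_fraction(example), 2))
```

T-rich codons such as TTT (F), TTA and TTG (L), and ATT (I) all encode hydrophobic residues, which is why a low-complexity, T-rich sequence tends to produce long hydrophobic stretches.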