ValWood opened 2 years ago
This is another example: P-factorDeltaLeu, SPCC1795.06:allele-2, terminal Leu residue deleted, {4acb2f022c79e2b7}. The precursor protein is 201 aa, but I think the processed pheromone is only around 10-20 aa.
Possibly
We need to chat about this... @manulera @kimrutherford
Do the different protein products have different functions?
So we have 2 situations. The first is where the proteins are fused and (I assume, usually) post-translationally cleaved. We have a number of these (https://www.pombase.org/term_genes/PBO:0000229), and I should add nup189 to this list. These proteins do different things. Examples include a cohesin acetyltransferase fused to a DNA polymerase: Eso1, the mitotic cohesin N-acetyltransferase/DNA polymerase eta fusion protein.
The second is alternative isoforms, where the proteins are often similar, or even identical, but are expressed at different times (for example).
I think what we plan to do is largely OK, but we need to think about alleles. We also need to make it clearer which GO annotations belong to which forms of the fusion proteins (these are not currently classed as alternative forms, but I think the way we model them should be similar).
The main problem is that we can't even get PRO forms for most of them, because we don't know the coordinates of the subparts. This is one of the reasons I have been parking this. But if we could get one example working, that would give us a template to work with.
I don't think we addressed this case in the end, given that it is a bit of an edge case. Happy to join a call to discuss this eventually.
nup189-deltaC, SPAC1486.05:allele-4, C-terminal deleted, {6ad90d4a01aee5f2}
nup189-deltaN, SPAC1486.05:allele-3, N-terminal deleted, {6ad90d4a01aee5f2}
nup98-tail(uncleavable), SPAC1486.05:allele-6, Nup98 portion of Nup189, plus 33aa tail, with S965A mutation, {8b8d3cc2e5f5c371}
'An alternative method was used for spNup189n and spNup189c because they are encoded by a single nup189+ gene transcribed as one transcript (HA and TH, unpublished result) and consequently endogenous nup189+ gene deletion results in deletion of both spNup189c and spNup189n. Therefore, nup189n was integrated into the lys1+ locus when analyzing spNup189c and nup189c was integrated into the aur1+ locus when analyzing spNup189n (see Materials and Methods for details). '
This is a special case because a single gene (nup189) codes for 2 nucleoporins (Nup96 and Nup98).
Ideally, we need to represent these as 2 separate proteins. To do this, I need the coordinates of both (the precursor must be post-translationally cleaved, and this seems to be the same in S. cerevisiae and human). Each product would then need its own identifier. How should we do this? Because the split happens post-translationally, should we use PR IDs for the proteoforms? What do others do?
Once this is done, we can create the correct alleles (a deletion of the whole gene, with the coding sequence for the non-deleted product reintegrated at another locus).
Note: we might need new syntax to differentiate 2 different proteins from the same gene from isoforms (maybe this doesn't matter as long as the forms are correctly described and the annotations are made correctly). The difference is that for the alternative forms we only want to represent one instance in the protein count, but in the tandem fusion cases they are 2 completely independent proteins.
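A minimal Python sketch of the protein-count rule in the note above, assuming a hypothetical data model in which each gene product is tagged as either an alternative isoform or a cleavage product. `ProductKind`, `GeneProduct`, `protein_count` and the gene ID `GENE_X` are illustrative names only, not part of any existing PomBase schema; the two nup189 (SPAC1486.05) products are taken from the discussion above.

```python
from dataclasses import dataclass
from enum import Enum


class ProductKind(Enum):
    """Hypothetical classification of gene products (not an existing PomBase schema)."""
    ISOFORM = "alternative isoform"        # alternative forms of one protein
    CLEAVAGE_PRODUCT = "cleavage product"  # independent proteins cut from one precursor


@dataclass(frozen=True)
class GeneProduct:
    gene_id: str       # systematic ID of the encoding gene
    name: str          # product or proteoform label
    kind: ProductKind


def protein_count(products: list[GeneProduct]) -> int:
    """Isoforms of a gene count once; each cleavage product counts separately."""
    isoform_genes: set[str] = set()
    count = 0
    for product in products:
        if product.kind is ProductKind.ISOFORM:
            # Only the first isoform seen for a gene adds to the count.
            if product.gene_id not in isoform_genes:
                isoform_genes.add(product.gene_id)
                count += 1
        else:
            # Each cleavage product is an independent protein.
            count += 1
    return count


products = [
    GeneProduct("SPAC1486.05", "Nup98", ProductKind.CLEAVAGE_PRODUCT),
    GeneProduct("SPAC1486.05", "Nup96", ProductKind.CLEAVAGE_PRODUCT),
    # Hypothetical gene with two alternative isoforms.
    GeneProduct("GENE_X", "isoform 1", ProductKind.ISOFORM),
    GeneProduct("GENE_X", "isoform 2", ProductKind.ISOFORM),
]
print(protein_count(products))  # 3
```

Here nup189 contributes 2 proteins while the hypothetical two-isoform gene contributes 1, which is the distinction the note asks the syntax to capture.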