Closed ValWood closed 8 months ago
@manulera @kimrutherford
Hi @ValWood sorry for the late reply.
If you follow a special naming pattern for those alleles (for instance CTD:S2, like you do in GO annotations), I could skip those from the validation so they would not be returned as an error. The problem is how these can be used by people who download the dataset, since they don't follow the pattern of everything else.
We could translate that notation CTD:S2 into the actual residues. This is better for people using the dataset, and for showing the modifications in the map. The problem is then that the display in the gene page would be too crowded. @kimrutherford we could maybe revert to the concise notation at the front-end level? This might be a pain.
I like the idea of displaying the shorthand notation and unfolding to the full residue description in datasets.
Decision needed: where to store the full string (at the moment we only have CTD_S2). In a config file? This should also be documented at https://www.pombase.org/documentation/gene-page-modifications
If we add the abbreviations and the full residue descriptions to the main config file we'll be able to write out the residues and positions in the TSV file. If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.
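A minimal sketch of the expansion step described above, assuming the abbreviations are held as a gene-to-abbreviation mapping. The function name `expand_residues`, the dictionary name, and the shortened residue list are hypothetical and for illustration only; this is not the actual PomBase export code.

```python
# Hypothetical in-memory form of the config: gene ID -> abbreviation -> full residues.
# The residue list is deliberately shortened here for illustration.
MODIFICATION_ABBREVIATIONS = {
    "SPBC28F2.12": {
        "CTD_S2": "S1579,S1586,S1593",
    },
}


def expand_residues(gene_id, residue_field, abbreviations=MODIFICATION_ABBREVIATIONS):
    """Replace any configured abbreviation in a residue field with its full list.

    Ordinary residues such as "S220" and unknown tokens pass through unchanged.
    """
    per_gene = abbreviations.get(gene_id, {})
    tokens = [token.strip() for token in residue_field.split(",")]
    return ",".join(per_gene.get(token, token) for token in tokens)


if __name__ == "__main__":
    # Value that would go in the residue column of the TSV export.
    print(expand_residues("SPBC28F2.12", "CTD_S2"))
    print(expand_residues("SPBC28F2.12", "S225,S157"))
```

The gene pages could keep showing the short form and call the same lookup only for the mouse-over text.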
I like the idea of displaying the shorthand notation and unfolding to the full residue description in datasets.
Could you let me know what the full versions are?
S2 is shorthand for 1579 1586 1593 1600 1607 1614 1621 1628 1635 1642 1649 1656 1663 1670 1677 1684 1691 1698 1705 1712 1719 1726 1733 1740 1747
T4 is shorthand for 1584 1591 1598 1605 1612 1619 1626 1615 1640 1647 1654 1661 1663 1675 1682 1689 1696 1703 1710 1717 1723 1731 1738 1745 1752
S5 is shorthand for 1582 1589 1596 1603 1610 1617 1624 1613 1638 1645 1652 1659 1666 1673 1680 1687 1694 1701 1708 1715 1722 1729 1736 1743 1750
S7 is shorthand for 1584 1591 1598 1605 1612 1619 1626 1615 1640 1647 1654 1661 1668 1675 1682 1689 1696 1703 1710 1717 1724 1731 1738 1745 1752
Thanks. Is that what should appear in the modifications file? https://www.pombase.org/data/annotations/modifications/pombase-chado.modifications.gz
The other modifications are of the form "S225,S157,S15,S62" or "S220".
Sorry, I only put the numbers. What I should have said was:
CTD_S2 is shorthand for S1579,S1586,S1593,S1600,S1607,S1614,S1621,S1628,S1635,S1642,S1649, S1656,S1663,S1670,S1677,S1684,S1691,S1698,S1705,S1712,S1719,S1726,S1733,S1740,S1747
CTD_T4 is shorthand for T1584,T1591,T1598,T1605,T1612,T1619,T1626,T1615,T1640,T1647,T1654,T1661,T1663,T1675,T1682,T1689,T1696,T1703,T1710,T1717,T1723,T1731,T1738,T1745,T1752
CTD_S5 is shorthand for T1582,T1589,T1596,T1603,T1610,T1617,T1624,T1613,T1638,T1645,T1652, T1659,T1666,T1673,T1680,T1687,T1694,T1701,T1708,T1715,T1722,T1729,T1736,T1743,T1750
CTD_S7 is shorthand for S1584,S1591,S1598,S1605,S1612,S1619,S1626,S1615,S1640,S1647,S1654,S1661,S1668,S1675,S1682 S1689,S1696,S1703,S1710,S1717,S1724,S1731,S1738,S1745,S1752
The other thing we will be able to do with this detail is to display the rpb1-CTD modifications here:
And the alleles too, but let's get modifications first ;)
OK, thanks. I'm working on this now.
The CTD abbreviations are implemented for Friday night's load.
The configuration is here: https://github.com/pombase/pombase-config/blob/e90178606cbe95c7161e61b40db2abe9e762da2e/website/pombase_v2_config.json#L7009-L7016
  "modification_abbreviations": {
    "SPBC28F2.12": {
      "CTD_S2": "S1579,S1586,S1593,S1600,S1607,S1614,S1621,S1628,S1635,S1642,S1649, S1656,S1663,S1670,S1677,S1684,S1691,S1698,S1705,S1712,S1719,S1726,S1733,S1740,S1747",
      "CTD_T4": "T1584,T1591,T1598,T1605,T1612,T1619,T1626,T1615,T1640,T1647,T1654,T1661,T1663,T1675,T1682,T1689,T1696,T1703,T1710,T1717,T1723,T1731,T1738,T1745,T1752",
      "CTD_S5": "T1582,T1589,T1596,T1603,T1610,T1617,T1624,T1613,T1638,T1645,T1652, T1659,T1666,T1673,T1680,T1687,T1694,T1701,T1708,T1715,T1722,T1729,T1736,T1743,T1750",
      "CTD_S7": "S1584,S1591,S1598,S1605,S1612,S1619,S1626,S1615,S1640,S1647,S1654,S1661,S1668,S1675,S1682 S1689,S1696,S1703,S1710,S1717,S1724,S1731,S1738,S1745,S1752"
    }
  }
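Since these lists were typed by hand, a small consistency check may be worth running before a load. The sketch below rests on two assumptions that are not stated anywhere in the config: that the residue letter should match the letter in the abbreviation name (CTD_S5 implies S), and that consecutive positions should be 7 apart, as they are throughout the CTD_S2 list. The function name `check_abbreviation` is hypothetical. It only reports entries that break the pattern; whether a reported entry is a typo is for the curators to decide.

```python
import re


def check_abbreviation(name, expansion, step=7):
    """Return a list of human-readable issues found in one abbreviation entry."""
    issues = []
    # Assumed naming rule: "CTD_S5" -> expected residue letter "S".
    expected_letter = name.split("_")[-1][0]
    # Split on commas and/or whitespace so "S1649, S1656" and "S1682 S1689" both parse.
    tokens = [t for t in re.split(r"[,\s]+", expansion.strip()) if t]
    positions = []
    for token in tokens:
        match = re.fullmatch(r"([A-Z])(\d+)", token)
        if match is None:
            issues.append(f"{name}: cannot parse {token!r}")
            continue
        if match.group(1) != expected_letter:
            issues.append(f"{name}: {token} is not residue {expected_letter}")
        positions.append(int(match.group(2)))
    # Assumed spacing rule: each position is `step` after the previous one.
    for previous, current in zip(positions, positions[1:]):
        if current - previous != step:
            issues.append(f"{name}: step from {previous} to {current} is not {step}")
    return issues


if __name__ == "__main__":
    # Short excerpt of the CTD_S5 entry, copied from the config fragment above.
    for issue in check_abbreviation("CTD_S5", "T1617,T1624,T1613,T1638"):
        print(issue)
```

Running the check over all four entries would show whether the few out-of-sequence positions and the T prefixes in CTD_S5 are intentional.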
If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.
Would this be useful?
The CTD abbreviations are implemented for Friday night's load.
These changes broke the JaponicusDB update. There are two modifications annotated for SJAG_02763 but the corresponding configuration was missing. I've added the config now so it should be OK on Monday morning and the SJAG_02763 page should have two modifications.
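A failure like this could in principle be caught before the load by comparing annotated residues against the configuration. The following is a rough sketch only: `missing_abbreviations` is a hypothetical helper, the residue pattern is an assumption based on the "S225" style seen in the modifications file, and the SJAG_02763 annotation value in the example is illustrative rather than taken from JaponicusDB.

```python
import re

# Assumed pattern for an ordinary residue such as "S225" or "T1584".
RESIDUE_RE = re.compile(r"^[A-Z]\d+$")


def missing_abbreviations(annotations, abbreviations):
    """Return sorted (gene_id, token) pairs that look like abbreviations but have no config.

    annotations: iterable of (gene_id, residue_field) pairs.
    abbreviations: dict of gene_id -> {abbreviation: full residue string}.
    """
    missing = set()
    for gene_id, residue_field in annotations:
        for token in residue_field.split(","):
            token = token.strip()
            if not token or RESIDUE_RE.match(token):
                continue  # empty or ordinary residue, nothing to look up
            if token not in abbreviations.get(gene_id, {}):
                missing.add((gene_id, token))
    return sorted(missing)


if __name__ == "__main__":
    config = {"SPBC28F2.12": {"CTD_S2": "S1579,S1586"}}  # shortened for illustration
    annotations = [
        ("SPBC28F2.12", "CTD_S2"),
        ("SJAG_02763", "CTD_S5"),  # illustrative value, not real JaponicusDB data
        ("SPAC10F6.16", "S64"),
    ]
    print(missing_abbreviations(annotations, config))
```

Failing the build with a clear message when this list is non-empty would make a missing per-database config easier to spot.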
If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.
Would this be useful?
We discussed this on the call - it is useful.
I've added the config now so it should be OK on Monday morning and the SJAG_02763 page should have two modifications.
Fixed!
If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.
That's done now and will be on the main site in a little while.
nice!
The other thing we will be able to do with this detail is to display the rpb1-CTD modifications here:
This is mostly done. I'm just trying to track down a bug: the mouse-over is missing "removed by ssu72" in the screenshot below.
I hope to have it fixed and committed in time for tonight's load.
Fixed!
Turns out there was a bug in the modification display code that's been there since I first implemented it. Some of the extension details are missing from the current feature viewer. Here's an example:
Tomorrow it will look like this:
While testing I noticed that igo1 / SPAC10F6.16 has this extension on one of the modifications: modified residue S64 level fluctuates during mitotic cell cycle
Should we add "level fluctuates during" to the list of extension types that are shown in the mouse-over? The current list of types is: "added_during", "added_by", "affected_by", "removed_by"
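To make the question concrete, here is a hedged sketch of how such a list could act as a filter on what the mouse-over shows. It is not the feature viewer code: the function name, the (relation, target) tuple shape, and the snake_case spelling `level_fluctuates_during` are all assumptions for illustration.

```python
# The four relation types quoted above as currently shown in the mouse-over.
MOUSEOVER_EXTENSION_TYPES = ["added_during", "added_by", "affected_by", "removed_by"]


def mouseover_extensions(extensions, shown_types=MOUSEOVER_EXTENSION_TYPES):
    """Return display strings for the extensions whose relation type is in shown_types.

    extensions: list of (relation, target) tuples, e.g. ("removed_by", "ssu72").
    """
    return [
        f"{relation.replace('_', ' ')} {target}"
        for relation, target in extensions
        if relation in shown_types
    ]


if __name__ == "__main__":
    extensions = [
        ("removed_by", "ssu72"),
        ("level_fluctuates_during", "mitotic cell cycle"),  # assumed relation spelling
    ]
    # With the current list, only the first extension is displayed.
    print(mouseover_extensions(extensions))
    # Adding the extra type would display both.
    print(mouseover_extensions(extensions, MOUSEOVER_EXTENSION_TYPES + ["level_fluctuates_during"]))
```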
This is great!
How many times have we used "fluctuates during"? It's a bit "woolly".
338 times, but only in three publications:
I removed the ones from PMID:19279143 (only once) and PMID:29079657 (3 times).
I think we should disable this relationship in Canto. It isn't necessary. Could you remove it from the config?
Done for the morning.
I think this is done now?
OK! great!
I fixed the entries listed in https://github.com/pombase/allele_qc/tree/master/results (see https://github.com/pombase/allele_qc/blob/master/results/protein_modification_cannot_fix_other_errors.tsv), except:
SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S2 added_by(PomBase:SPAC2F3.15) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S2 added_by(PomBase:SPBC32H8.10) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S5 added_by(PomBase:SPBC19F8.07) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S5 added_by(PomBase:SPBC32H8.10) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00046 Inferred from Direct Assay CTD_S5 added_by(PomBase:SPAC24B11.06c) PMID:33410907 4896 2021-01-12 pattern_error
SPBC28F2.12 rpb1 MOD:00046 Inferred from Direct Assay CTD_S5 removed_by(PomBase:SPAC3G9.04) PMID:33410907 4896 2021-01-22 pattern_error
These are a special case. I wonder if we can a) accept these without an error, and b) interpret this shorthand for display purposes as follows:
S2 is shorthand for 1579 1586 1593 1600 1607 1614 1621 1628 1635 1642 1649 1656 1663 1670 1677 1684 1691 1698 1705 1712 1719 1726 1733 1740 1747
S5 is shorthand for 1582 1589 1596 1603 1610 1617 1624 1613 1638 1645 1652 1659 1666 1673 1680 1687 1694 1701 1708 1715 1722 1729 1736 1743 1750
(S for each)