pombase / pombase-chado

PomBase code for accessing Chado
MIT License

protein modification errors #1141

Closed ValWood closed 8 months ago

ValWood commented 9 months ago

I fixed the errors in https://github.com/pombase/allele_qc/blob/master/results/protein_modification_cannot_fix_other_errors.tsv (from https://github.com/pombase/allele_qc/tree/master/results), except:

SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S2 added_by(PomBase:SPAC2F3.15) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S2 added_by(PomBase:SPBC32H8.10) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S5 added_by(PomBase:SPBC19F8.07) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00696 Inferred from Direct Assay CTD_S5 added_by(PomBase:SPBC32H8.10) PMID:19328067 4896 2009-07-12 pattern_error
SPBC28F2.12 rpb1 MOD:00046 Inferred from Direct Assay CTD_S5 added_by(PomBase:SPAC24B11.06c) PMID:33410907 4896 2021-01-12 pattern_error
SPBC28F2.12 rpb1 MOD:00046 Inferred from Direct Assay CTD_S5 removed_by(PomBase:SPAC3G9.04) PMID:33410907 4896 2021-01-22 pattern_error

These are a special case. I wonder if we can a) accept these without an error, and b) interpret this shorthand for display purposes as follows:

S2 is shorthand for 1579 1586 1593 1600 1607 1614 1621 1628 1635 1642 1649 1656 1663 1670 1677 1684 1691 1698 1705 1712 1719 1726 1733 1740 1747

S5 is shorthand for 1582 1589 1596 1603 1610 1617 1624 1613 1638 1645 1652 1659 1666 1673 1680 1687 1694 1701 1708 1715 1722 1729 1736 1743 1750

(S for each)
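
Position lists like the two above are easy to mistype, so a quick spacing check may be worth running before they are stored anywhere. The sketch below assumes that consecutive positions are expected to be exactly 7 apart (one heptad repeat); that expectation, and the function name `irregular_steps`, are assumptions for illustration and not part of any PomBase code. It only flags spans for a curator to look at and does not say which value is right.

```python
def irregular_steps(positions, step=7):
    """Return adjacent (previous, current) pairs whose spacing is not `step`."""
    return [(a, b) for a, b in zip(positions, positions[1:]) if b - a != step]


# Position lists copied verbatim from the comment above.
S2 = [int(p) for p in (
    "1579 1586 1593 1600 1607 1614 1621 1628 1635 1642 1649 1656 1663 "
    "1670 1677 1684 1691 1698 1705 1712 1719 1726 1733 1740 1747"
).split()]

S5 = [int(p) for p in (
    "1582 1589 1596 1603 1610 1617 1624 1613 1638 1645 1652 1659 1666 "
    "1673 1680 1687 1694 1701 1708 1715 1722 1729 1736 1743 1750"
).split()]

if __name__ == "__main__":
    print("S2:", irregular_steps(S2))
    print("S5:", irregular_steps(S5))
```

With these inputs the S2 list passes cleanly, while the S5 list is flagged at the two pairs around 1613 (1624 to 1613, and 1613 to 1638).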

ValWood commented 9 months ago

@manulera @kimrutherford

manulera commented 9 months ago

Hi @ValWood, sorry for the late reply.

Option 1

If you follow a special naming pattern for those alleles (for instance CTD:S2 like you do in GO annotations), I could skip them during validation so they would not be reported as errors. The problem is how people who download the dataset would use them, since they don't follow the pattern of everything else.

Option 2

We could translate that notation CTD:S2 into the actual residues. This is better for people using the dataset, and for showing the modifications in the map. The problem is then that the display in the gene page would be too crowded. @kimrutherford we could maybe revert to the concise notation at the front-end level? This might be a pain.
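
For reference, Option 1 could look roughly like the sketch below: a position field is accepted if it is either an ordinary comma-separated residue list or one of a small set of known shorthand names. The regular expression, the allow-list contents, and the name `check_position` are all assumptions made for illustration; this is not the actual allele_qc validation code.

```python
import re

# Assumed pattern for the usual notation, e.g. "S220" or "S225,S157,S15,S62":
# a one-letter residue code followed by a position, repeated with commas.
RESIDUE_LIST = re.compile(r"^[A-Z]\d+(,[A-Z]\d+)*$")

# Hypothetical allow-list of shorthand names that bypass the pattern check.
KNOWN_ABBREVIATIONS = {"CTD_S2", "CTD_S5"}


def check_position(value):
    """Classify a position field as "ok", "abbreviation" or "pattern_error"."""
    if value in KNOWN_ABBREVIATIONS:
        return "abbreviation"
    if RESIDUE_LIST.match(value):
        return "ok"
    return "pattern_error"


if __name__ == "__main__":
    for value in ("S220", "S225,S157,S15,S62", "CTD_S2", "CTD_S9"):
        print(value, "->", check_position(value))
```

Returning a separate "abbreviation" status, instead of plain "ok", keeps the shorthand rows easy to find later if they need to be expanded for download files.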

ValWood commented 9 months ago

I like the idea of displaying the shorthand notation and unfolding to the full residue description in datasets.

ValWood commented 9 months ago

Decision needed: where to store the full string (at the moment we only have CTD_S2). A config file? This should also be documented at https://www.pombase.org/documentation/gene-page-modifications

kimrutherford commented 9 months ago

Decision needed: where to store the full string (at the moment we only have CTD_S2). A config file?

If we add the abbreviations and the full residue descriptions to the main config file we'll be able to write out the residues and positions in the TSV file. If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.

kimrutherford commented 9 months ago

I like the idea of displaying the shorthand notation and unfolding to the full residue description in datasets.

Could you let me know what the full versions are?

ValWood commented 9 months ago

S2 is shorthand for 1579 1586 1593 1600 1607 1614 1621 1628 1635 1642 1649 1656 1663 1670 1677 1684 1691 1698 1705 1712 1719 1726 1733 1740 1747

T4 is shorthand for 1584 1591 1598 1605 1612 1619 1626 1615 1640 1647 1654 1661 1663 1675 1682 1689 1696 1703 1710 1717 1723 1731 1738 1745 1752

S5 is shorthand for 1582 1589 1596 1603 1610 1617 1624 1613 1638 1645 1652 1659 1666 1673 1680 1687 1694 1701 1708 1715 1722 1729 1736 1743 1750

S7 is shorthand for 1584 1591 1598 1605 1612 1619 1626 1615 1640 1647 1654 1661 1668 1675 1682 1689 1696 1703 1710 1717 1724 1731 1738 1745 1752

kimrutherford commented 9 months ago

S2 is shorthand for ...

Thanks. Is that what should appear in the modifications file?: https://www.pombase.org/data/annotations/modifications/pombase-chado.modifications.gz

The other modifications are of the form "S225,S157,S15,S62" or "S220".

ValWood commented 9 months ago

Sorry, I only put the numbers. What I should have said was:

CTD_S2 is shorthand for S1579,S1586,S1593,S1600,S1607,S1614,S1621,S1628,S1635,S1642,S1649,S1656,S1663,S1670,S1677,S1684,S1691,S1698,S1705,S1712,S1719,S1726,S1733,S1740,S1747

CTD_T4 is shorthand for T1584,T1591,T1598,T1605,T1612,T1619,T1626,T1615,T1640,T1647,T1654,T1661,T1663,T1675,T1682,T1689,T1696,T1703,T1710,T1717,T1723,T1731,T1738,T1745,T1752

CTD_S5 is shorthand for T1582,T1589,T1596,T1603,T1610,T1617,T1624,T1613,T1638,T1645,T1652,T1659,T1666,T1673,T1680,T1687,T1694,T1701,T1708,T1715,T1722,T1729,T1736,T1743,T1750

CTD_S7 is shorthand for S1584,S1591,S1598,S1605,S1612,S1619,S1626,S1615,S1640,S1647,S1654,S1661,S1668,S1675,S1682,S1689,S1696,S1703,S1710,S1717,S1724,S1731,S1738,S1745,S1752

The other thing we will be able to do with this detail is to display the rpb1-CTD modifications here:

Screenshot 2024-02-29 at 08 06 36

ValWood commented 9 months ago

And the alleles too, but let's get modifications first ;)

kimrutherford commented 9 months ago

OK, thanks. I'm working on this now.

kimrutherford commented 9 months ago

The CTD abbreviations are implemented for Friday night's load.

The configuration is here: https://github.com/pombase/pombase-config/blob/e90178606cbe95c7161e61b40db2abe9e762da2e/website/pombase_v2_config.json#L7009-L7016

"modification_abbreviations": {
   "SPBC28F2.12": {
      "CTD_S2": "S1579,S1586,S1593,S1600,S1607,S1614,S1621,S1628,S1635,S1642,S1649, S1656,S1663,S1670,S1677,S1684,S1691,S1698,S1705,S1712,S1719,S1726,S1733,S1740,S1747",
      "CTD_T4": "T1584,T1591,T1598,T1605,T1612,T1619,T1626,T1615,T1640,T1647,T1654,T1661,T1663,T1675,T1682,T1689,T1696,T1703,T1710,T1717,T1723,T1731,T1738,T1745,T1752",
      "CTD_S5": "T1582,T1589,T1596,T1603,T1610,T1617,T1624,T1613,T1638,T1645,T1652, T1659,T1666,T1673,T1680,T1687,T1694,T1701,T1708,T1715,T1722,T1729,T1736,T1743,T1750",
      "CTD_S7": "S1584,S1591,S1598,S1605,S1612,S1619,S1626,S1615,S1640,S1647,S1654,S1661,S1668,S1675,S1682 S1689,S1696,S1703,S1710,S1717,S1724,S1731,S1738,S1745,S1752"
   }
}
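
To illustrate how a config of this shape could be used when writing the TSV file, here is a minimal sketch that expands an abbreviation into its residue list and leaves ordinary positions untouched. The function names `expand_position` and `tsv_position` are hypothetical, and the dictionary holds only a short five-residue slice of the quoted CTD_S7 string; the real loader is not shown in this thread. The split tolerates a stray space or a missing comma, since both occur in the quoted strings.

```python
import re

# Excerpt in the shape of the "modification_abbreviations" config quoted above.
# The CTD_S7 value is a five-residue slice of the quoted string, kept short
# for illustration; it includes the missing comma seen in the quoted config.
MODIFICATION_ABBREVIATIONS = {
    "SPBC28F2.12": {
        "CTD_S7": "S1668,S1675,S1682 S1689,S1696",
    },
}


def expand_position(gene_id, position, abbreviations=MODIFICATION_ABBREVIATIONS):
    """Return the list of residues for a position field.

    An abbreviation configured for the gene is replaced by its full string;
    anything else is treated as an ordinary comma-separated residue list.
    """
    full = abbreviations.get(gene_id, {}).get(position, position)
    # Split on commas and/or whitespace so that "S1649, S1656" and
    # "S1682 S1689" both yield separate residues.
    return [residue for residue in re.split(r"[,\s]+", full.strip()) if residue]


def tsv_position(gene_id, position):
    """Join the expanded residues in the "S225,S157" style used by other rows."""
    return ",".join(expand_position(gene_id, position))


if __name__ == "__main__":
    print(tsv_position("SPBC28F2.12", "CTD_S7"))
    print(tsv_position("SPAC10F6.16", "S64"))
```

Keying the lookup by gene ID first, as the config does, means the same shorthand name can map to different residues in different genes or organisms.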
kimrutherford commented 9 months ago

If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.

Would this be useful?

kimrutherford commented 9 months ago

The CTD abbreviations are implemented for Friday night's load.

These changes broke the JaponicusDB update. There are two modifications annotated for SJAG_02763 but the corresponding configuration was missing. I've added the config now so it should be OK on Monday morning and the SJAG_02763 page should have two modifications.
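
A failure like this one could in principle be caught before the load with a check along the following lines. It is only a sketch: the function name, the residue pattern, and the example data are assumptions, and the abbreviation shown for SJAG_02763 is hypothetical because the thread does not say which shorthand names that gene uses.

```python
import re

# Assumed pattern for ordinary positions such as "S64" or "S225,S157".
RESIDUE_LIST = re.compile(r"^[A-Z]\d+(,[A-Z]\d+)*$")


def missing_abbreviations(annotations, abbreviations):
    """Return sorted (gene_id, position) pairs that have no config entry.

    `annotations` is an iterable of (gene_id, position) pairs. Positions that
    look like ordinary residue lists are ignored; everything else must be
    configured for that gene in `abbreviations`.
    """
    missing = set()
    for gene_id, position in annotations:
        if RESIDUE_LIST.match(position):
            continue
        if position not in abbreviations.get(gene_id, {}):
            missing.add((gene_id, position))
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical inputs: config exists for the S. pombe gene only.
    config = {"SPBC28F2.12": {"CTD_S2": "S1579,S1586"}}
    rows = [
        ("SPBC28F2.12", "CTD_S2"),
        ("SJAG_02763", "CTD_S5"),  # hypothetical abbreviation for this gene
        ("SPAC10F6.16", "S64"),
    ]
    print(missing_abbreviations(rows, config))
```

Reporting every missing pair in one pass, instead of stopping at the first, would let all the config gaps for an organism be filled before the next load.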

kimrutherford commented 9 months ago

If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.

Would this be useful?

We discussed this on the call - it is useful.

kimrutherford commented 9 months ago

I've added the config now so it should be OK on Monday morning and the SJAG_02763 page should have two modifications.

Fixed!

image

kimrutherford commented 9 months ago

If it helps we could also show the residues and positions as a mouse-over for "CTD_S2" etc. on the gene pages.

That's done now and will be on the main site in a little while.

image

ValWood commented 9 months ago

nice!

kimrutherford commented 9 months ago

The other thing we will be able to do with this detail is to display the rpb1-CTD modifications here:

This is mostly done. I'm just trying to track down a bug: the mouse-over in the screenshot below is missing "removed by ssu72".

I hope to have it fixed and committed in time for tonight's load.

https://desktop.kmr.nz/gene_protein_features/SPBC28F2.12

image

kimrutherford commented 9 months ago

Fixed!

image

Turns out there was a bug in the modification display code that's been there since I first implemented it. Some of the extension details are missing from the current feature viewer. Here's an example:

image

Tomorrow it will look like this:

image

kimrutherford commented 9 months ago

While testing I noticed that igo1 / SPAC10F6.16 has this extension on one of the modifications: modified residue S64 level fluctuates during mitotic cell cycle

Should we add "level fluctuates during" to the list of extension types that are shown in the mouse-over? The current list of types is: "added_during", "added_by", "affected_by", "removed_by".
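
The filtering described here amounts to an allow-list of relation types. The sketch below shows the idea under some assumptions: extensions are represented as (relation, value) pairs, the function name `mouse_over_text` is made up, and the underscore form `level_fluctuates_during` is a guess at how that relation would be spelled. It is not the website's display code.

```python
# The four relation types listed in the comment above.
MOUSE_OVER_RELATIONS = ("added_during", "added_by", "affected_by", "removed_by")


def mouse_over_text(position, extensions, relations=MOUSE_OVER_RELATIONS):
    """Build tooltip text from (relation, value) pairs.

    Only relations in `relations` are kept; underscores are replaced by
    spaces for display.
    """
    parts = [position]
    for relation, value in extensions:
        if relation in relations:
            parts.append(f"{relation.replace('_', ' ')} {value}")
    return ", ".join(parts)


if __name__ == "__main__":
    print(mouse_over_text("CTD_S5", [("removed_by", "ssu72")]))
    # Assumed spelling of the relation; it is dropped because it is not listed.
    print(mouse_over_text("S64", [("level_fluctuates_during", "mitotic cell cycle")]))
```

Showing another relation type would then be a one-line change to the tuple.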

ValWood commented 9 months ago

This is great!

How many times have we used "fluctuates during"? It's a bit "woolly".

kimrutherford commented 9 months ago

How many times have we used "fluctuates during"? It's a bit "woolly".

338 times, but only in three publications:

ValWood commented 9 months ago

I removed the ones from PMID:19279143 (used only once) and PMID:29079657 (3 times).

I think we should disable this relationship in Canto. It isn't necessary. Could you remove it from the config?

kimrutherford commented 9 months ago

Could you remove it from the config?

Done for the morning.

kimrutherford commented 8 months ago

I think this is done now?

ValWood commented 8 months ago

OK! great!