pombase / website

PomBase website v2

GPI residues #2189

Closed ValWood closed 1 month ago

ValWood commented 3 months ago

We have https://www.pombase.org/term/MOD:00818

I wanted to check this using https://services.healthtech.dtu.dk/services/NetGPI-1.1/

but I haven't been able to run the search to completion. Probably need to do smaller chunks.

I also found https://mendel.imp.ac.at/gpi/gpi_server.html

We only need to run these once because they are not going to change.

I want to put them as "target proteins" in a GPI model

kimrutherford commented 3 months ago

I wanted to check this using https://www.pombase.org/term/MOD:00818 but I haven't been able to run the search to completion. Probably need to do smaller chunks.

Is that the wrong link?

ValWood commented 3 months ago

yep, fixed

PCarme commented 3 months ago

I ran the list of proteins through https://services.healthtech.dtu.dk/services/NetGPI-1.1/ and obtained these results: output_protein_type.txt. Out of the 36 proteins in the list, 8 are predicted as not GPI-anchored: mug191, cbm1, dfg501, pho1, mde5, eng1, gas2 and SPAC212.08c (although the product of this last gene is "S. pombe specific GPI anchored protein family 1" in PomBase, so if it is a pombe-specific family, maybe the tool is unsuitable for predicting it?)

ValWood commented 3 months ago

We could try this too: https://mendel.imp.ac.at/gpi/gpi_server.html I am guessing there will often be false negatives...

ValWood commented 3 months ago

It would be good to run all of these through https://www.pombase.org/results/from/id/561e4908-db55-4c8b-b5f1-1d6a65c41581 to check that we are not missing any

ValWood commented 3 months ago

In the end, I think we can remove the unsupported ones (we might keep the ones predicted by Groot). Those which are ISS with a PomBase ref we can delete (in some cases it is possible that I was getting confused between a GPI anchor and N-glycosylation!)

Once we decide which to include (we can include both methods), we can add them to the modification file pombe-embl/supporting_files/legacy_modifications_from_contigs.tsv with the publication for the method as the reference.

PCarme commented 3 months ago

Running the list of 36 genes through https://mendel.imp.ac.at/gpi/gpi_server.html gave different predictions compared to the first tool (the second seems more stringent, returning more negatives). Here is a table summarizing the predictions from both tools (for the second one I tried both the Metazoa and Protozoa predictors, since pombe is neither of these) and the supporting evidence for the annotation in PomBase: GPIanchored_genes.txt

kimrutherford commented 2 months ago

I wanted to check this using https://services.healthtech.dtu.dk/services/NetGPI-1.1/

I tried submitting 500 proteins at once to see if we could submit all the proteins in batches of that size. It timed out after 10 hours, so I don't think that's going to work.

If we could download the software we could run it in the background but I couldn't find a way to do that.
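The batching idea mentioned in this thread (splitting the protein set into smaller submissions) could be sketched roughly as follows. This is only a minimal sketch, assuming the protein sequences are available as FASTA text; the function names `parse_fasta`, `chunk_records` and `to_fasta` are hypothetical, and the code does not talk to the NetGPI server at all. It only prepares smaller FASTA chunks that could then be submitted by hand.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    seq_parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header = line[1:]
            seq_parts = []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def chunk_records(records: List[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def to_fasta(records: List[Record]) -> str:
    """Serialise records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)
```

Each chunk returned by `chunk_records` can be written out with `to_fasta` and submitted separately; the right chunk size would have to be found by trial, since the thread only establishes that 500 proteins at once timed out.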

ValWood commented 2 months ago

OK, this is a bit of a pain.

We could try to run this reduced subset through both servers: https://www.pombase.org/results/from/id/561e4908-db55-4c8b-b5f1-1d6a65c41581

If this doesn't work, let's just use the ones we already had annotated that are supported by Groot or confirmed by either server.

If we use NetGPI (https://www.sciencedirect.com/science/article/pii/S2590262821000010?via%3Dihub), it isn't PubMed indexed, so we would still need to write a PomBase reference linking to this paper?

PCarme commented 2 months ago

Well, I got it to work with the whole set of 1259 proteins using the short output option. Here is the result file: output_protein_type-2.txt And here is a file summarizing the results on this protein set with the 2 tools: GPIanchored_full_list.txt

PCarme commented 2 months ago

And here is the same file as a table, with the proteins predicted as GPI-anchored highlighted: GPIanchored_full_list.xls

ValWood commented 2 months ago

Great, quite a mixture of results. We can include all the ones from all sources. Then we will look more closely at the ones only from the first server (they may be correct since the other server does not appear to be trained on yeast). We might be able to establish if they are likely to be true... some seem a little strange but not impossible. I am leaning towards including all the positive ones. We can discuss when we next chat...
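The triage described here (include positives from any source, then look more closely at the ones predicted only by the first server) amounts to a few set operations. A minimal sketch, assuming the positive predictions from each tool have already been collected into sets of gene identifiers; the function name `triage_predictions` and the result keys are hypothetical, and the sketch does not parse the attached result files.

```python
from typing import Dict, Set


def triage_predictions(
    first_server_positive: Set[str],
    second_server_positive: Set[str],
) -> Dict[str, Set[str]]:
    """Group genes by which prediction server(s) called them GPI-anchored."""
    return {
        # Everything predicted by at least one server.
        "include": first_server_positive | second_server_positive,
        # Predicted by both servers.
        "both": first_server_positive & second_server_positive,
        # Predicted only by the first server: candidates for a closer look.
        "first_only": first_server_positive - second_server_positive,
        # Predicted only by the second server.
        "second_only": second_server_positive - first_server_positive,
    }
```

The `first_only` group corresponds to the predictions that need a closer look; whether to keep them is a curation decision the code cannot make.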

ValWood commented 2 months ago

  1. Because the publication for this method does not have a PMID, you need to create a PomBase reference that points to this paper/method. This is under svn in Repos/pombe-embl/supporting_files/PB_references.txt

  2. Create a modification-style file from this list: https://www.pombase.org/results/from/id/5ea7392e-283e-49db-b974-c86e84bc4450 plus yam8. (I updated the product of this to "regulatory subunit"; it's associated with the cch1 channel and clearly has extracellular domains in the AlphaFold structure. I don't see any of the others likely being GPI.)

    I think it is better to create a new file and check it into the directory: pombe-embl/external_data/modification_files/modification_files/

The file format is documented here: https://www.pombase.org/documentation/modification-data-bulk-upload-format

Just shout if you have any questions about svn or anything.

PCarme commented 2 months ago

It all seems pretty clear to me. I'm just wondering about the evidence code to use in the modification file. I guess it should be ECO:0008024 (neural network method evidence used in manual assertion)?

ValWood commented 2 months ago

That sounds good to me!

PCarme commented 2 months ago

Okay then, here is the modification file if you want to have a look at it: PB_REF_0000007_modifications.tsv.txt I added a '.txt' extension since GitHub wouldn't let me send it with the '.tsv'.

ValWood commented 2 months ago

Looks good. v

PCarme commented 2 months ago

OK, I added it to the modification files on SVN

kimrutherford commented 2 months ago

Thanks Pascal.

I've added an empty Residue column and an empty Extension column to the new file because they are needed by the modification file parser.

I've also fixed the load script because it was only looking for modification data file names that started with "PMID_". It now looks for any file in the pombe-embl/external_data/modification_files/ directory that ends with a ".tsv"
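The change in file selection (any ".tsv" file in the directory rather than only names starting with "PMID_") can be illustrated like this. It is a minimal sketch and not the actual load script, whose language and code are not shown in this thread; the function name `find_modification_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_modification_files(directory: Path) -> List[Path]:
    """Return every regular file in directory whose name ends with '.tsv'."""
    return sorted(
        path
        for path in directory.iterdir()
        # Select on the suffix only, so names without a "PMID_" prefix
        # (e.g. PB_REF_0000007_modifications.tsv) are also picked up.
        if path.is_file() and path.name.endswith(".tsv")
    )
```

Sorting the result is a choice made here so that the processing order is reproducible; it is not something the thread specifies.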

I didn't make these changes in time for the Thursday night load so we'll need to check on Friday morning.

PCarme commented 2 months ago

Alright, thanks Kim! For the Residue and Extension columns, I didn't include those since they were marked as not mandatory in the documentation page. It might be worth making it clearer that all columns are required, even the ones that are not mandatory to fill?
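The distinction between a column that must be present and a value that may be left empty can be shown with a small padding step. This is a minimal sketch, assuming the TSV has a header row and that the full column list is known; apart from Residue and Extension, the column names in the usage note are placeholders, and `add_missing_columns` is a hypothetical helper rather than part of the PomBase parser.

```python
import csv
import io
from typing import List


def add_missing_columns(tsv_text: str, required_columns: List[str]) -> str:
    """Rewrite TSV text so every required column exists, empty where absent."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    output = io.StringIO()
    writer = csv.DictWriter(
        output,
        fieldnames=required_columns,
        delimiter="\t",
        lineterminator="\n",
        restval="",            # value written for columns missing from a row
        extrasaction="raise",  # fail loudly on columns not in required_columns
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return output.getvalue()
```

For example, calling it on a file with only `Gene` and `Evidence` columns and the required list `["Gene", "Evidence", "Residue", "Extension"]` yields rows with two trailing empty fields, which is the shape a fixed-column parser expects.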

kimrutherford commented 2 months ago

Good point. I've made the documentation clearer. The change will be on the main site in the morning.

PCarme commented 1 month ago

I had forgotten to go and check this one. It all seems good to me: https://www.pombase.org/reference/PB_REF:0000007 Should we close this issue?

ValWood commented 1 month ago

I think so; any other issues can go in a new ticket.