Closed ValWood closed 5 days ago
Which genes are signal peptides?
Actually, I thought they came through the InterPro pipeline, but now that I look, they don't.
It seems that we must have run SignalP at some point, and then assigned SO terms to the proteins with signal peptides.
https://www.pombase.org/term/SO:0000418
Are there any signal peptides in the IP-scan file that we ignore? If so, that might be a better way to do it, as we could get the features on the protein display.
At least here they used to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1160203/
I haven't been able to find anything in the XML yet.
I've noticed that some InterPro pombe pages have a feature for "phobius: SIGNAL_PEPTIDE": https://www.ebi.ac.uk/interpro/protein/UniProt/O13640/
I can't see anything about phobius in the XML.
OK, do we get the TMM domains from InterPro? I can't remember.
We run TMHMM when we process the InterPro XML file to make a file with domains for the load. Unfortunately InterPro don't provide the TMMs.
Regarding TMM and SignalP, this is already supported. However, the software/models required cannot be distributed with InterProScan because they contain licensed components. The InterProScan documentation includes instructions for activating these analyses: https://interproscan-docs.readthedocs.io/en/latest/ActivatingLicensedAnalyses.html. The InterProScan web service provides TMM/SignalP predictions, e.g. https://www.ebi.ac.uk/interpro/result/InterProScan/iprscan5-R20231127-103348-0178-16749718-p1m/
I made this low priority, but if it is quick (i.e. largely running the pipeline and configuration) it can be re-prioritized.
Do we just need a "Commonly used query" that returns these genes?: https://www.pombase.org/term/SO:0000418
We could, but my only worry is that it isn't comprehensive. I must have added these ad hoc as I saw them referred to, or in a protein feature model.
However, the software/models required cannot be distributed with InterProScan because they contain licensed components.
I hate that. Such an unnecessary pain for users. It tends to be older tools that do that. Tool authors seem to be more sensible these days.
The InterProScan web service provides TMM/SignalP predictions, e.g.
The search results page includes a protein feature diagram with the signal peptides marked, but there's no easy way to download that data.
I've had a look at installing and running SignalP-6.0. That doesn't look easy on oliver1 because the version of the operating system is very old. I can have a go though. The other problem is that we'd need to interpret the output. It gives a score rather than a yes/no answer for each protein.
# SignalP-6.0 Organism: Other Timestamp: 20240724114903
# ID Prediction OTHER SP(Sec/SPI) LIPO(Sec/SPII) TAT(Tat/SPI) TATLIPO(Tat/SPII) PILIN(Sec/SPIII) CS Position
SPCC757.12.1 length_625 SP 0.000225 0.999116 0.000156 0.000169 0.000148 0.000141 CS pos: 22-23. Pr: 0.9807
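A minimal sketch of how the score could be turned into a yes/no call for each protein. It assumes the real SignalP-6.0 results file is tab-separated with the columns named in the header above, and that the ID column carries the whole FASTA header (e.g. "SPCC757.12.1 length_625"). The function name `parse_signalp` and the 0.9 probability cutoff are hypothetical choices for illustration, not values taken from SignalP.

```python
# Column order assumed from the "# ID Prediction ..." header line above.
COLUMNS = ["id", "prediction", "other", "sp", "lipo",
           "tat", "tatlipo", "pilin", "cs_position"]


def parse_signalp(text, min_sp_probability=0.9):
    """Map each protein ID to (has_signal_peptide, SP probability).

    A protein counts as a hit only if SignalP's own call is "SP" AND the
    Sec/SPI probability reaches the (hypothetical) cutoff.
    """
    calls = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the two comment/header lines
        record = dict(zip(COLUMNS, line.split("\t")))
        sp_probability = float(record["sp"])
        is_hit = (record["prediction"] == "SP"
                  and sp_probability >= min_sp_probability)
        # Keep only the first token of the FASTA header as the identifier.
        protein_id = record["id"].split()[0]
        calls[protein_id] = (is_hit, sp_probability)
    return calls


if __name__ == "__main__":
    example = ("SPCC757.12.1 length_625\tSP\t0.000225\t0.999116\t0.000156"
               "\t0.000169\t0.000148\t0.000141\tCS pos: 22-23. Pr: 0.9807")
    print(parse_signalp(example))
```

Requiring both the "SP" call and a cutoff keeps the decision conservative; lowering `min_sp_probability` to 0.0 would simply trust the Prediction column.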
I've also looked at installing and running Phobius, but the instructions are very minimal and I couldn't work it out from a quick attempt.
It looks like we can get the signal peptide details from the UniProt API with a URL like:
https://rest.uniprot.org/uniprotkb/stream?compressed=true&fields=accession,id,ft_signal&format=tsv&query=((accession:O74922)+OR+(accession:O94565))
which returns very helpful TSV results like this:
Entry Entry Name Signal peptide
O74922 AMY1_SCHPO SIGNAL 1..22; /evidence="ECO:0000255"
O94565 OMH4_SCHPO
O13770 YE98_SCHPO SIGNAL 1..20; /evidence="ECO:0000255"
...
I just tried submitting 1000 accessions at once, which worked. More than that didn't work but that still means we only need 5-10 API calls.
If this method sounds OK, I can download all the data we need tomorrow for loading on Thursday night.
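A rough sketch of the batching described above, assuming the same `stream` endpoint, `fields` and `query` syntax as the example URL. The helper names (`batches`, `build_url`, `fetch_tsv`) are hypothetical, and the `compressed=true` parameter is left out here so the response can be read as plain text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

STREAM_URL = "https://rest.uniprot.org/uniprotkb/stream"
BATCH_SIZE = 1000  # the largest batch reported to work in this thread


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_url(accessions, fields=("accession", "id", "ft_signal")):
    """Build one stream URL that ORs together a batch of accessions."""
    query = " OR ".join(f"(accession:{acc})" for acc in accessions)
    params = {
        "fields": ",".join(fields),
        "format": "tsv",
        "query": f"({query})",
    }
    return f"{STREAM_URL}?{urlencode(params)}"


def fetch_tsv(accessions):
    """Download the TSV for one batch (makes a network request)."""
    with urlopen(build_url(accessions)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for batch in batches(["O74922", "O94565", "O13770"], size=2):
        print(build_url(batch))
```

Each batch returns its own header line, so the header needs to be dropped from all but the first response when the results are concatenated.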
Yes, go ahead. Do we have a reference to use for the method (algorithm)? The ECO is "match to sequence model evidence used in manual assertion".
Maybe they manually review them; in UniProt it says "Manual assertion according to sequence analysis", so maybe we should create a specific reference to say exactly where they are from.
@Antonialock How do you call signal peptides?
Also can we get cleavage sites @kimrutherford ?
We run a pipeline for prediction of various sequence features, this rule is for signal peptides: https://fisheye.sib.swiss/browse/~raw,r=HEAD/SIB/unirules/anarules/ANA00006.uru
Sequence features (metal or substrate binding sites, TM domains, signal peptides...) may be added to or modified based on experimental data as part of manual curation. It looks like only 75 entries have papers associated with the signal peptide feature (none for pombe): https://www.uniprot.org/uniprotkb?query=%28scope%3A%22signal+peptide%22%29
there should be 52 proteins with annotated propeptide sequences in pombe
https://www.uniprot.org/uniprotkb?query=%28ft_propep%3A*%29+AND+%28taxonomy_id%3A4896%29
there should be 52 proteins with annotated propeptide sequences in pombe
Hi Antonia!
I got 214 proteins by using the Proteins with: Signal peptide filter in the left hand column?
https://www.pombase.org/results/from/id/cf1af35a-18c5-4d42-9024-16f3e0409433
Also can we get cleavage sites @kimrutherford ?
Hi Val. I couldn't find cleavage sites on the results page. Is there a synonym for cleavage site I could look for?
I got 214 proteins by using the Proteins with: Signal peptide filter in the left hand column?
I was expecting a better overlap between the genes currently annotated with SO:0000418 and the list from UniProt. This makes me a bit suspicious:
https://www.pombase.org/results/from/id/c0918f7f-d9a2-40b9-9fb6-45f4c3305c4a https://www.pombase.org/results/from/id/cf1af35a-18c5-4d42-9024-16f3e0409433 https://www.pombase.org/results/from/id/cb5b1538-adb6-443d-a42c-680b8e4c565d
We have some genes annotated with "signal_anchor" which is_a signal_peptide. The list from UniProt doesn't include any of them.
I guess these are harder to locate if they don't include a cleavage site...
I was expecting a better overlap between the genes currently annotated with SO:0000418 and the list from UniProt. This makes me a bit suspicious:
I think none of the methods are optimal. All of the current annotations from both look OK to me (or at least probable).
I expect most ER/Golgi/cell surface and most membrane transporters will have a signal peptide and we are nowhere near that. So there are few false positives but a lot of false negatives.
~The obvious true FP that I see is https://www.pombase.org/gene/SPAC3A11.03 (elongation factor 3) Where are these? I will delete it.~ deleted
there should be 52 proteins with annotated propeptide sequences in pombe
Hi Antonia!
I got 214 proteins by using the Proteins with: Signal peptide filter in the left hand column?
https://www.pombase.org/results/from/id/cf1af35a-18c5-4d42-9024-16f3e0409433
Yes propeptide is a different filter from signal peptide. I thought Val might have meant propeptide when she asked for cleavage sites. A propeptide is an extra bit of peptide that is cleaved off the protein as part of maturation.
I think none of the methods are optimal. All of the current annotations from both look OK to me (or at least probable).
So do you think we should add the UniProt list to our existing signal peptide annotations? That would give 357 annotations.
The decision is to add the UniProt annotations and remove any existing annotations that are covered by UniProt.
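As a sketch of that rule (not the actual PomBase load code), the merge amounts to a set difference followed by a union. The function name `merge_annotations` and the gene IDs in the usage example are placeholders.

```python
def merge_annotations(manual_genes, uniprot_genes):
    """Keep every UniProt annotation; keep a manual one only if UniProt lacks it."""
    manual = set(manual_genes)
    uniprot = set(uniprot_genes)
    kept_manual = manual - uniprot  # manual annotations not covered by UniProt
    return {
        "uniprot": uniprot,
        "manual": kept_manual,
        "all": uniprot | kept_manual,
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real annotation lists.
    merged = merge_annotations(["geneA", "geneB"], ["geneB", "geneC"])
    print(sorted(merged["manual"]), sorted(merged["all"]))
```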
That's done now and checked in, but not in time for the load. I'll check on Thursday morning. I have a test load on my desktop: https://desktop.kmr.nz/term/SO:0000418
I've moved the existing manual annotations out of the contig files and into pombe-embl/supporting_files/manual_so_term_annotations.tsv.
I've left the signal_anchor annotations in the contig files for now: https://desktop.kmr.nz/term/SO:0001809
Next step: the UniProt data file now gets processed so we can show the signal peptides in the feature viewer:
All the code and script changes are in place now to process the other columns with coordinates from the data file if they are useful:
I'll check the load on Thursday morning to make sure the signal peptides are displayed correctly.
We should do these too:
- Transit peptide (I don't know how a transit peptide differs from a signal peptide!)
- Binding site
- Active site
- Modified residue (yes, but we will need to map to PRO and filter redundancy for experimental ones)
I never knew this!
Transit peptides and signal peptides are both short amino acid sequences that direct the transport of proteins to specific locations within a cell, but they have different roles and target different cellular destinations:
Transit Peptides:
- Function: Transit peptides direct proteins to organelles within the cell, such as mitochondria or chloroplasts.
- Location of Target: These peptides typically target intracellular organelles.
- Example: A protein destined for the mitochondria will have a mitochondrial transit peptide that directs it to the mitochondrion. Similarly, a protein bound for the chloroplast will have a chloroplast transit peptide.
- Cleavage: After the protein reaches its destination (e.g., mitochondria or chloroplast), the transit peptide is usually cleaved off by specific peptidases.

Signal Peptides:
- Function: Signal peptides direct the nascent protein to the secretory pathway, which includes the endoplasmic reticulum (ER) and, eventually, the extracellular space or plasma membrane.
- Location of Target: These peptides target the ER for proteins that are secreted from the cell, inserted into the plasma membrane, or directed to lysosomes.
- Example: A protein destined for secretion outside the cell will have an ER signal peptide that directs it to the ER.
- Cleavage: The signal peptide is typically cleaved off once the protein enters the ER lumen by signal peptidase enzymes.

Summary of Key Differences:
- Target Destination: Transit peptides direct proteins to mitochondria or chloroplasts. Signal peptides direct proteins to the endoplasmic reticulum and the secretory pathway.
- Function: Transit peptides ensure proteins are correctly localized within specific organelles. Signal peptides ensure proteins are processed through the secretory pathway and directed either to the cell membrane, outside the cell, or to lysosomes.
- Cleavage: Both types of peptides are typically cleaved off once the protein reaches its destination.
It sounds like it makes sense to add SO:0000725 annotations for the transit peptides, as well as showing them in the feature viewer?
There are 267 genes with transit peptides in the UniProt data, compared to 214 signal peptides.
The signal peptide annotation is updated: https://www.pombase.org/term/SO:0000418 (357) but I forgot to check in the config change for the protein viewer. Whoops. So the signal peptides don't appear there yet. They will tomorrow.
I'll add the transit peptides today since most of the work is done.
Yes, this will pick up a lot of the mitochondrial ones hopefully!
There are 267 genes with transit peptides in the UniProt data
I've only just noticed that there are quite a few genes (78 of 267) where the transit peptide is annotated but the location isn't fully specified (so 1..? instead of something like 1..26). In those cases we'll be able to add a SO:0000725 annotation but the transit peptide won't appear in the protein feature viewer.
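A small sketch of how the two cases could be told apart when reading the UniProt TSV. It assumes transit peptide cells look like the signal peptide ones shown earlier, with `TRANSIT` in place of `SIGNAL` and `?` for an unknown position. The names `parse_feature` and `has_coordinates` are hypothetical, and this is not the actual PomBase processing code.

```python
import re

# Matches e.g. "SIGNAL 1..22" or "TRANSIT 1..?" at the start of a feature cell.
FEATURE_RE = re.compile(r"(SIGNAL|TRANSIT)\s+([<>?\d]+)\.\.([<>?\d]+)")


def parse_feature(cell):
    """Return the feature type and coordinates, with None for unknown positions."""
    match = FEATURE_RE.search(cell)
    if match is None:
        return None  # empty cell: no feature annotated for this protein
    kind, start, end = match.groups()

    def to_int(value):
        return int(value) if value.isdigit() else None

    return {"type": kind, "start": to_int(start), "end": to_int(end)}


def has_coordinates(feature):
    """True only when both ends are known, so the viewer can draw the feature."""
    return (feature is not None
            and feature["start"] is not None
            and feature["end"] is not None)


if __name__ == "__main__":
    for cell in ('SIGNAL 1..22; /evidence="ECO:0000255"', "TRANSIT 1..?", ""):
        feature = parse_feature(cell)
        print(repr(cell), feature, has_coordinates(feature))
```

With this split, any parsed feature would get an SO term annotation, and only those passing `has_coordinates` would be sent to the feature viewer.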
Yep that's OK, we will be able to see which ones have no coordinates. Hopefully the coordinates will eventually get picked up by another method and we can suppress these. It's useful to know that there is one.
The transit peptides are now added as annotations in pombe-embl/supporting_files/manual_so_term_annotations.tsv and added to the protein feature viewer for tomorrow.
fab! Will they also display in the protein domains and properties section of the gene pages, like the signal peptides?
Yep! Here's an example from my desktop:
https://desktop.kmr.nz/gene/SPAC12G12.04
I need to add some configuration for the new ECO evidence codes from UniProt.
Out of curiosity, I ran the SignalP v.6 software on all the proteins annotated to ER, Golgi and plasma membrane on PomBase. The majority of the positives I got are already annotated on PomBase, but I still got a few predictions for proteins that are not covered yet on PomBase. Here is the list of these genes: New_signal_peptides_SignalP.txt And here are the complete predictions from the software: prediction_results-2.txt
Excellent, we should add those. Could you run it on the remainder to see if anything gets picked up that was excluded from the query (i.e. unknowns)?
I can, the software actually runs surprisingly quickly !
I got even fewer hits that are not already annotated on PomBase with this list of genes. A couple of them are on unknowns, which might be useful. Here is the list : New_signal_peptides_SignalP6-2.txt And the full prediction file : prediction_results-3.txt
Thanks Pascal. That's great.
I'll add the new predictions on Monday.
I've done that. They'll be on pombase.org on Tuesday.
I added them here: pombe-embl/supporting_files/manual_so_term_annotations.tsv
Thanks Kim !
UniProt docs on signal peptides: https://www.uniprot.org/help/signal
Can we close this?
Add a search to retrieve all signal peptides. Not urgent.