ValWood closed this issue 3 years ago
It would be better to use the PomBase S. cerevisiae orthologs, as these will give much more coverage.
For example, Compara misses the one in https://github.com/japonicusdb/japonicus-config/issues/19 which makes it look like the gene has a human ortholog but was lost from S. cerevisiae.
We should use PomBase by default (citing PomBase paper), plus Compara, which will pull in the orthologs for those genes that are lost from pombe but present in japonicus and S. cerevisiae.
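To make the proposal concrete, here is a minimal sketch of combining the two sources with PomBase taking priority. It is only an illustration, not the actual JaponicusDB loader: the function name `merge_orthologs`, the dictionary layout and the gene IDs are all placeholders invented for the example.

```python
def merge_orthologs(via_pombase, compara):
    """Combine two {japonicus_gene: [cerevisiae_genes]} mappings.

    Returns {japonicus_gene: {cerevisiae_gene: source}}. PomBase is processed
    first, so a pair found in both sources is attributed to PomBase, and
    Compara only contributes pairs that PomBase does not have.
    """
    merged = {}
    for source, mapping in (("PomBase", via_pombase), ("Compara", compara)):
        for japonicus_gene, cerevisiae_genes in mapping.items():
            for cerevisiae_gene in cerevisiae_genes:
                # setdefault keeps the first source seen for each pair
                merged.setdefault(japonicus_gene, {}).setdefault(cerevisiae_gene, source)
    return merged


# Placeholder IDs, not real gene identifiers
via_pombase = {"jap_gene_1": ["cer_gene_A"], "jap_gene_2": ["cer_gene_B"]}
compara = {"jap_gene_1": ["cer_gene_A"], "jap_gene_3": ["cer_gene_C"]}

merged = merge_orthologs(via_pombase, compara)
for japonicus_gene, orthologs in sorted(merged.items()):
    print(japonicus_gene, orthologs)
```

In this sketch a gene that has no ortholog via PomBase (for example because the pombe gene was lost) still picks up its Compara ortholog, which is the case described above.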
citing PomBase paper
The 2018 paper?
citing PomBase paper
Should we use the same citation for the human orthologs in JaponicusDB?
Done! I think.
There are now a bunch of orthologs for cerevisiae that come via PomBase: https://www.japonicusdb.org/reference/PMID:30321395
There are some oddities. For example usp105(SJAG_00058) has:
The log file for the Compara cerevisiae loading warns about every ortholog that was already added to Chado via the PomBase orthologs: https://curation.pombase.org/japonicus_nightly/latest_build/logs/log.2021-08-15-04-25-03.compara_cerevisiae_orthologs
Maybe that's not useful?
A list of all the "odd cases" like usp105 would be useful. I could do a check to make sure we have them correctly. In this case there are 2 paralogs in S. cerevisiae and I missed PRP42 as it is more divergent.
The best publication to cite for orthologs really is "Schizosaccharomyces pombe comparative genomics; from sequence to systems", because that is where the process is described. Unfortunately, it does not have a PMID, only a DOI? We don't actually talk about manual ortholog assignment in any PomBase papers.
Actually let's use this: PMID: 29761456. It has a section on manual ortholog curation and refers back to the book chapter, which has more detail.
Actually let's use this: PMID: 29761456
Should we use that reference for the human orthologs loaded via PomBase too?
Yes please.
Yes please.
I guessed you'd say that, so that's what I did: https://www.japonicusdb.org/reference/PMID:29761456
A list of all the "odd cases" like usp105 would be useful.
So cases where mapping via pombe gives a different ortholog than Compara gives us? Are there any other odd cases to check for at the same time?
A list of all the "odd cases" like usp105 would be useful.
Here's a first attempt. It's a table with japonicus IDs, then orthologs from Compara and orthologs via PomBase. It's a quick first attempt so I haven't double checked the results.
This looks promising. Is this only differences? I.e. if PomBase had curated an ortholog but Compara missed it, would that get reported as a difference?
if PomBase had curated an ortholog but Compara missed it, would that get reported as a difference?
Sorry I don't follow that.
The file only contains cases where a japonicus gene has one or more orthologs from Compara and also one or more different orthologs via PomBase.
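For anyone wanting to reproduce that kind of table, here is a rough sketch of the selection rule. It assumes "different" means the two ortholog sets for a gene are not identical; the real script may use another rule. The function name `differing_orthologs` and all IDs are placeholders.

```python
def differing_orthologs(compara, via_pombase):
    """List japonicus genes that have orthologs from both sources
    where the two ortholog sets are not the same.

    Both arguments are {japonicus_gene: [cerevisiae_genes]} mappings.
    Returns rows of (japonicus_gene, compara_orthologs, pombase_orthologs).
    """
    rows = []
    # Only genes present in both mappings can qualify
    for gene in sorted(compara.keys() & via_pombase.keys()):
        from_compara = set(compara[gene])
        from_pombase = set(via_pombase[gene])
        if from_compara and from_pombase and from_compara != from_pombase:
            rows.append((gene, sorted(from_compara), sorted(from_pombase)))
    return rows


# Placeholder IDs, not real gene identifiers
compara = {
    "jap_gene_1": ["cer_gene_A"],
    "jap_gene_2": ["cer_gene_B"],
    "jap_gene_3": ["cer_gene_C"],
}
via_pombase = {
    "jap_gene_1": ["cer_gene_A", "cer_gene_D"],
    "jap_gene_2": ["cer_gene_B"],
    "jap_gene_4": ["cer_gene_E"],
}

rows = differing_orthologs(compara, via_pombase)
for gene, from_compara, from_pombase in rows:
    print(gene, ",".join(from_compara), ",".join(from_pombase), sep="\t")
```

Genes that appear in only one of the two sources (`jap_gene_3` and `jap_gene_4` here) are deliberately left out, matching the description of the file.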
OK I am working through this list. https://github.com/japonicusdb/japonicus-curation/issues/40
I was wondering if it included cases where pombase had an ortholog and japonicus didn't. Looking at the file properly told me those aren't included. I don't even need this for checking, but we could report the extra coverage the PomBase orthologs give us, so it is quite a useful number if it is easy to get.
(i.e. which genes have (any) PomBase ortholog but no Compara ortholog)
which genes have (any) PomBase ortholog but no Compara ortholog
Sorry I'm still unclear. When you say "PomBase ortholog" do you mean the japonicus gene has a pombe ortholog or that the japonicus gene has a cerevisiae gene inferred via PomBase?
If what you are asking for is the japonicus genes that have a cerevisiae ortholog via PomBase but no cerevisiae ortholog from Compara, there are 1438. There are 67 japonicus genes that have a Compara cerevisiae ortholog but no ortholog via PomBase.
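The two counts are just set differences over the genes that have at least one ortholog in each source. A minimal sketch, with a made-up function name `coverage_difference` and placeholder IDs (the real numbers come from the Chado data, not from this toy input):

```python
def coverage_difference(compara, via_pombase):
    """Return (only_via_pombase, only_compara) as sets of japonicus genes.

    Both arguments are {japonicus_gene: [cerevisiae_genes]} mappings.
    A gene counts as covered by a source if it has at least one ortholog there.
    """
    with_compara = {gene for gene, orthologs in compara.items() if orthologs}
    with_pombase = {gene for gene, orthologs in via_pombase.items() if orthologs}
    return with_pombase - with_compara, with_compara - with_pombase


# Placeholder IDs, not real gene identifiers
compara = {"jap_gene_1": ["cer_gene_A"], "jap_gene_3": ["cer_gene_C"]}
via_pombase = {"jap_gene_1": ["cer_gene_A"], "jap_gene_4": ["cer_gene_E"]}

only_via_pombase, only_compara = coverage_difference(compara, via_pombase)
print(len(only_via_pombase), "genes with an ortholog via PomBase but none from Compara")
print(len(only_compara), "genes with a Compara ortholog but none via PomBase")
```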
Yes this is exactly what I wanted:
There are 67 japonicus genes that have a Compara cerevisiae ortholog but no ortholog via PomBase.
I just want to check I am not missing anything. These are probably also positives though (just best hits to non-orthologous family members), but I want to check them sometime to be sure I haven't missed anything.
No hurry though, as I won't be looking at it for a while.
Here's the list of genes while I have it available: compara_cerevisiae_ortholog_with_no_orth_via_pombase.txt
The checking tasks spawned from this ticket are now on the curation tracker.
I decided to go ahead and add the cerevisiae orthologs from Compara. It's easy to edit or remove them later.
Here is the file: https://github.com/japonicusdb/japonicus-curation/blob/main/cerevisiae_orthologs.tsv
The cerevisiae orthologs should be active everywhere. Here's an example: http://japonicusdb.kmr.nz/vis/from/id/283fe14f-8122-498e-9ff1-85b4381c28f3
Originally posted by @kimrutherford in https://github.com/japonicusdb/japonicus-config/issues/19#issuecomment-846391369