japonicusdb / japonicus-config

Configuration for JaponicusDB

S. cerevisiae/S. japonicus orthologs #60

Closed ValWood closed 3 years ago

ValWood commented 3 years ago

I decided to go ahead and add the cerevisiae orthologs from Compara. It's easy to edit or remove them later.

Here is the file: https://github.com/japonicusdb/japonicus-curation/blob/main/cerevisiae_orthologs.tsv

The cerevisiae orthologs should be active everywhere. Here's an example: http://japonicusdb.kmr.nz/vis/from/id/283fe14f-8122-498e-9ff1-85b4381c28f3

Originally posted by @kimrutherford in https://github.com/japonicusdb/japonicus-config/issues/19#issuecomment-846391369

ValWood commented 3 years ago

It will be better to use PomBase S. cerevisiae orthologs as these will give much more coverage.

For example Compara misses this one https://github.com/japonicusdb/japonicus-config/issues/19 which makes it look like it has a human ortholog but is lost from S. cerevisiae.

We should use PomBase by default (citing PomBase paper) and Compara, which will pull in the orthologs for those things that are lost from pombe but present in japonicus and S. cerevisiae.

kimrutherford commented 3 years ago

citing PomBase paper

The 2018 paper?

kimrutherford commented 3 years ago

citing PomBase paper

Should we use the same citation for the human orthologs in JaponicusDB?

kimrutherford commented 3 years ago

Done! I think.

There are now a bunch of orthologs for cerevisiae that come via PomBase: https://www.japonicusdb.org/reference/PMID:30321395

There are some oddities. For example usp105(SJAG_00058) has:

The log file for the Compara cerevisiae loading warns about every ortholog that was already added to Chado via the PomBase orthologs: https://curation.pombase.org/japonicus_nightly/latest_build/logs/log.2021-08-15-04-25-03.compara_cerevisiae_orthologs Maybe that's not useful?

ValWood commented 3 years ago

A list of all the "odd cases" like usp105 would be useful. I could do a check to make sure we have them correct. In this case there are 2 paralogs in S. cerevisiae and I missed PRP42 as it is more divergent.

ValWood commented 3 years ago

The best publication to cite for orthologs really is "Schizosaccharomyces pombe comparative genomics; from sequence to systems", because that is where the process is described. Unfortunately, it does not have a PMID, only a DOI? We don't actually talk about manual ortholog assignment in any PomBase papers.

ValWood commented 3 years ago

Actually let's use this: PMID: 29761456. It has a section on manual ortholog curation and refers back to the book chapter, which has more detail.

kimrutherford commented 3 years ago

Actually let's use this: PMID: 29761456

Should we use that reference for the human orthologs loaded via PomBase too?

ValWood commented 3 years ago

Yes please.

kimrutherford commented 3 years ago

Yes please.

I guessed you'd say that, so that's what I did: https://www.japonicusdb.org/reference/PMID:29761456

A list of all the "odd cases" like usp105 would be useful.

So cases where mapping via pombe gives a different ortholog than Compara gives us? Are there any other odd cases to check for at the same time?

kimrutherford commented 3 years ago

A list of all the "odd cases" like usp105 would be useful.

Here's a first attempt. It's a table with japonicus IDs, then orthologs from Compara and orthologs via PomBase. It's a quick first attempt so I haven't double checked the results.

japonicus_genes_with_two_orth_sources.tsv.txt

ValWood commented 3 years ago

This looks promising. Is this only differences? i.e. if pombase had curated an ortholog but Compara missed it would that get reported as a difference?

kimrutherford commented 3 years ago

if pombase had curated an ortholog but Compara missed it would that get reported as a difference?

Sorry I don't follow that.

The file only contains cases where a japonicus gene has one or more orthologs from Compara and also one or more different orthologs via PomBase.
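As a rough illustration of that selection rule (a minimal sketch, not the script that produced the file), the comparison can be written as a filter over two gene-to-ortholog maps. The function name and the gene IDs in the usage example are hypothetical placeholders, and "different" is assumed here to mean that the two ortholog sets are not identical.

```python
from typing import Dict, List, Set, Tuple

# Maps a japonicus gene ID to the set of cerevisiae ortholog IDs from one source.
OrthologMap = Dict[str, Set[str]]


def genes_with_two_orth_sources(
    compara: OrthologMap, via_pombase: OrthologMap
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return genes that have orthologs from both sources where the sets differ.

    Assumption: "different" means the two ortholog sets are not identical.
    """
    conflicts = {}
    # Only genes present in both maps can have two sources.
    for gene in sorted(compara.keys() & via_pombase.keys()):
        from_compara = compara[gene]
        from_pombase = via_pombase[gene]
        # Skip genes with an empty set in either source, and exact agreements.
        if from_compara and from_pombase and from_compara != from_pombase:
            conflicts[gene] = (sorted(from_compara), sorted(from_pombase))
    return conflicts


if __name__ == "__main__":
    # Hypothetical placeholder IDs, not real database content.
    compara = {"GENE_A": {"SC_1"}, "GENE_B": {"SC_2"}, "GENE_C": {"SC_3"}}
    via_pombase = {"GENE_A": {"SC_1", "SC_9"}, "GENE_B": {"SC_2"}, "GENE_D": {"SC_4"}}
    for gene, (c, p) in genes_with_two_orth_sources(compara, via_pombase).items():
        print(gene, ",".join(c), ",".join(p), sep="\t")
```

Genes that appear in only one source are deliberately left out, which matches the description of the file above.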

ValWood commented 3 years ago

OK I am working through this list. https://github.com/japonicusdb/japonicus-curation/issues/40

I was wondering if it included cases where pombase had an ortholog and japonicus didn't. Looking at the file properly told me those aren't included. I don't even need this for checking, but we could report the extra coverage the pombase orthologs give us, so it would be quite a useful number if it is easy to get.

(i.e. which genes have (any) PomBase ortholog but no Compara ortholog)

kimrutherford commented 3 years ago

which genes have (any) PomBase ortholog but no Compara ortholog

Sorry I'm still unclear. When you say "PomBase ortholog" do you mean the japonicus gene has a pombe ortholog or that the japonicus gene has a cerevisiae gene inferred via PomBase?

If what you are asking for is the japonicus genes that have a cerevisiae ortholog via PomBase, but no cerevisiae ortholog from Compara, there are 1438. There are 67 japonicus genes that have a Compara cerevisiae ortholog but no ortholog via PomBase.
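For reference, counts like these are plain set differences over the genes covered by each source. The sketch below is only an illustration with made-up placeholder IDs; the function name is hypothetical and the real numbers come from the database, not from this code.

```python
from typing import Dict, Set


def coverage_counts(
    compara: Dict[str, Set[str]], via_pombase: Dict[str, Set[str]]
) -> Dict[str, int]:
    """Count japonicus genes covered by only one ortholog source, or by both."""
    # A gene counts as covered only if it has at least one ortholog.
    compara_genes = {gene for gene, orths in compara.items() if orths}
    pombase_genes = {gene for gene, orths in via_pombase.items() if orths}
    return {
        "via_pombase_only": len(pombase_genes - compara_genes),
        "compara_only": len(compara_genes - pombase_genes),
        "both": len(compara_genes & pombase_genes),
    }


if __name__ == "__main__":
    # Hypothetical placeholder IDs, not real database content.
    compara = {"GENE_A": {"SC_1"}, "GENE_C": {"SC_3"}}
    via_pombase = {"GENE_A": {"SC_1"}, "GENE_B": {"SC_2"}, "GENE_D": {"SC_4"}}
    print(coverage_counts(compara, via_pombase))
```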

ValWood commented 3 years ago

Yes this is exactly what I wanted:

There are 67 japonicus genes that have a Compara cerevisiae ortholog but no ortholog via PomBase.

I just want to check I am not missing anything. These are probably also positives though (just best hits to non-orthologous family members), but I want to check them sometime to be sure I haven't missed anything.

ValWood commented 3 years ago

No hurry though, as I won't be looking at it for a while.

kimrutherford commented 3 years ago

Here's the list of genes while I have it available: compara_cerevisiae_ortholog_with_no_orth_via_pombase.txt

ValWood commented 3 years ago

The checking tasks spawned from this ticket are now on the curation tracker.