Closed ValWood closed 3 years ago
Does it make sense to have a process that automatically pulls in pombe names and products via 1-1 orthologs, then have a TSV file of "overrides" to fix names and products that aren't correctly inferred? It could also contain the names and products that can't be inferred, like the histones (japonicusdb/japonicus-curation#3).
eg.
systematic_id name product
SJAG_04351 guf1 GTPase
...
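A rough sketch of what the automatic transfer step could look like, for discussion only. The function name, the dictionary shapes and the pombe IDs ("pombe_A" etc.) are made up for illustration; the real loader and the orthologs file format may differ. It copies a name and product only when the ortholog relation is strictly 1-1 in both directions.

```python
from collections import Counter


def transfer_by_ortholog(orthologs, pombe_genes):
    """Return {japonicus_id: {"name": ..., "product": ...}} for 1-1 orthologs.

    orthologs:   {japonicus_id: [pombe_id, ...]}  (hypothetical input shape)
    pombe_genes: {pombe_id: {"name": ..., "product": ...}}
    """
    # Count how many japonicus genes point at each pombe gene, so that
    # many-to-one relations can be skipped as well as one-to-many.
    pombe_usage = Counter(
        pombe_id for ids in orthologs.values() for pombe_id in ids
    )

    inferred = {}
    for japonicus_id, pombe_ids in orthologs.items():
        if len(pombe_ids) != 1:
            continue  # one-to-many: nothing is inferred
        pombe_id = pombe_ids[0]
        if pombe_usage[pombe_id] != 1:
            continue  # many-to-one: nothing is inferred
        details = pombe_genes.get(pombe_id)
        if details is None:
            continue  # ortholog not present in the pombe data
        inferred[japonicus_id] = dict(details)
    return inferred


# Illustrative data only: the pombe IDs and the product text are placeholders.
example = transfer_by_ortholog(
    {"SJAG_03048": ["pombe_A"]},
    {"pombe_A": {"name": "cdc2", "product": "placeholder product"}},
)
```

Genes with no 1-1 ortholog (the histones, for example) get nothing from this step, which is where the overrides file would come in.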
I've done a first implementation of this. So japonicus now has a cdc2: http://japonicusdb.kmr.nz/gene/SJAG_03048
This is the bit that's not done:
then have a TSV file of "overrides" to fix names and products that aren't correctly inferred.
For now, if you need to override the inferred name or product, you need to edit the contig files with /primary_name and /product on the CDS features.
I've removed all the /primary_name and /product qualifiers from the contig files.
OK, I'm sure we can hold off editing names until the file is available ;)
Looks fab. Can you let me know how many gene names were assigned? We can report that in the paper since we will be the 'official' naming body.
Can you let me know how many gene names were assigned?
Currently 3550 names and 4071 products but that will change if you edit the orthologs file.
How many genes had names already? You said ~100; do you have the exact number for the paper?
There were 101 gene names from UniProt.
This is implemented now.
I've started a file of names and products here: https://github.com/japonicusdb/japonicus-curation/blob/main/names_and_products.tsv
It's just these columns, tab separated:
I'm doing a load to test it.
will probably also require a 'synonyms' column. and a defined separator for >1
will probably also require a 'synonyms' column.
OK, I'll work on that.
I'm doing a load to test it.
That worked so the 6 genes from the TSV file are updated in JaponicusDB: https://github.com/japonicusdb/japonicus-curation/blob/main/names_and_products.tsv http://japonicusdb.kmr.nz/gene/SJAG_06382
The names and products from the TSV file take priority over the names and products transferred by ortholog from pombe. So we can now correct problems from the transfer.
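To make the priority rule concrete, here is a minimal sketch of how the TSV could be layered over the inferred values. It is not the actual loading code: the function names are hypothetical, the header is assumed to be `systematic_id`, `primary_name`, `product`, and treating an empty cell as "keep the inferred value" is an assumption rather than confirmed behaviour.

```python
import csv
import io


def load_overrides(tsv_text):
    """Parse the tab-separated overrides into {systematic_id: row_dict}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["systematic_id"]: row for row in reader}


def apply_overrides(inferred, overrides):
    """Return a copy of `inferred` with TSV values taking priority.

    inferred: {systematic_id: {"name": ..., "product": ...}}
    """
    merged = {gene_id: dict(values) for gene_id, values in inferred.items()}
    for gene_id, row in overrides.items():
        # Genes with no ortholog-derived entry can still be named here.
        entry = merged.setdefault(gene_id, {"name": None, "product": None})
        # Assumption: an empty cell leaves the inferred value untouched.
        if row.get("primary_name"):
            entry["name"] = row["primary_name"]
        if row.get("product"):
            entry["product"] = row["product"]
    return merged
```

With this shape, a row such as `SJAG_04351 guf1 GTPase` replaces whatever the ortholog transfer produced for that gene, and genes not listed in the file are left alone.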
The mitochondrial genome contig still has the gene names from the ENA entry. Should we:
transfers from pombe should work.....
Now I've looked more carefully, that's what's happening. The mitochondrial contig file had "/gene=" qualifiers but we ignore them. Our code only looks for "/primary_name=".
The mito genes with names are getting the names via pombe orthologs.
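For anyone wondering why the "/gene=" values never showed up, this is roughly the behaviour, written as a small standalone sketch rather than the real code. The regular expression, the function names and the example values ("cox1", "abc1") are illustrative assumptions; only the rule "read /primary_name, ignore /gene" comes from this thread.

```python
import re

# Matches qualifier lines such as /gene="cox1" or /codon_start=1.
QUALIFIER_RE = re.compile(r'^\s*/(\w+)=(?:"([^"]*)"|(\S+))\s*$')


def parse_qualifiers(lines):
    """Collect qualifier values from feature lines into {key: [values]}."""
    qualifiers = {}
    for line in lines:
        match = QUALIFIER_RE.match(line)
        if not match:
            continue
        key = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        value = quoted if quoted is not None else bare
        qualifiers.setdefault(key, []).append(value)
    return qualifiers


def name_from_contig(qualifiers):
    """Return the contig-supplied name, looking only at /primary_name."""
    values = qualifiers.get("primary_name")
    return values[0] if values else None


# A feature carrying only /gene yields no name from the contig file,
# so any name it ends up with has to come from the ortholog transfer.
mito_feature = parse_qualifiers(['/gene="cox1"', "/codon_start=1"])
contig_name = name_from_contig(mito_feature)
```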
As a reminder to myself, I think everything is done here except for:
will probably also require a 'synonyms' column.
That's done now.
I've added an empty "synonyms" column to japonicus-curation/names_and_products.tsv
and I've changed the code to load the synonyms column.
Val, do you have any synonyms you can add to test that everything works?
There is one existing synonym ("SJAG_05310" for gene SJAG_05309) from the contig files.
and a defined separator for >1
I forgot to say that you can separate synonyms with a comma: "abc1,xyz2". It's easy to change to a different separator if needed.
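As a sketch of the comma convention, splitting a synonyms cell could look like the following. The function and constant names are hypothetical, and stripping whitespace and dropping empty pieces are assumptions about how stray spaces or trailing commas should be handled.

```python
SYNONYM_SEPARATOR = ","  # assumed default; kept in one place so it is easy to change


def parse_synonyms(cell, separator=SYNONYM_SEPARATOR):
    """Split a synonyms cell such as "abc1,xyz2" into a list of names."""
    if not cell:
        return []  # empty or missing column means no synonyms
    # Strip whitespace and drop empty pieces left by stray separators.
    return [part.strip() for part in cell.split(separator) if part.strip()]
```

Passing a different `separator` is all that would be needed if commas ever turn out to clash with real synonym text.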
You'll definitely need to do a git pull before your next edit of japonicus-curation/names_and_products.tsv because I've changed every line to add the synonyms column.
The column order is now:
Right, I forgot synonyms was implemented.
The header line still says:
systematic_id primary_name product
Is there a column 4 for synonyms?
OK, I now see in my mailbox that you only JUST did it. That will be why I didn't test it yet.... I can easily find some examples to test...
In fact, we should import all of the pombe synonyms for the 1:1 orthologs....
the header line still says
systematic_id primary_name product
It should change if you "git pull".
In fact, we should import all of the pombe synonyms for the 1:1 orthologs....
I'll make a separate issue for that. Does it make sense for japonicus to have synonyms like "SPCC18B5.12" (from cab5)?
We also need to think about how to keep aligned with PomBase as new names are added and gene products are tweaked. In the early stages I can just list changes to be included on the japonicus tracker.
moved to japonicusdb/japonicus-curation#3: ~Other naming tasks, which should be a ticket on the curation tracker once it exists~