japonicusdb / japonicus-config

Configuration for JaponicusDB

maintain japonicus names and products outside of contig files #12

Closed ValWood closed 3 years ago

ValWood commented 3 years ago

We also need to think about how to stay aligned with PomBase as new names are added and gene products are tweaked. In the early stages I can just list the changes to be included on the japonicus tracker.

moved to japonicusdb/japonicus-curation#3: ~Other naming tasks, which should be a ticket on the curation tracker once it exists~

kimrutherford commented 3 years ago

Does it make sense to have a process that automatically pulls in pombe names and products via 1-1 orthologs, then have a TSV file of "overrides" to fix names and products that aren't correctly inferred? It could also contain the names and products that can't be inferred, like the histones (japonicusdb/japonicus-curation#3).

eg.

systematic_id         name         product
SJAG_04351            guf1         GTPase
...
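A minimal sketch of the ortholog-based transfer proposed above, not the actual JaponicusDB loader. The function names, the pair-list input and the `POMBE_*` / `SJAG_A`-style identifiers are all hypothetical; the only idea taken from the thread is that names and products are copied across when a japonicus gene has exactly one pombe ortholog and vice versa.

```python
from collections import defaultdict


def one_to_one_pairs(ortholog_pairs):
    """Return {japonicus_id: pombe_id} for genes that pair up exactly 1:1."""
    by_japonicus = defaultdict(set)
    by_pombe = defaultdict(set)
    for jap_id, pombe_id in ortholog_pairs:
        by_japonicus[jap_id].add(pombe_id)
        by_pombe[pombe_id].add(jap_id)

    result = {}
    for jap_id, pombe_ids in by_japonicus.items():
        if len(pombe_ids) != 1:
            continue  # one japonicus gene, several pombe orthologs
        (pombe_id,) = pombe_ids
        if len(by_pombe[pombe_id]) == 1:
            result[jap_id] = pombe_id
    return result


def infer_names_and_products(ortholog_pairs, pombe_genes):
    """Copy the pombe name and product onto each 1:1 japonicus ortholog."""
    inferred = {}
    for jap_id, pombe_id in one_to_one_pairs(ortholog_pairs).items():
        gene = pombe_genes.get(pombe_id)
        if gene is not None:
            inferred[jap_id] = {
                "name": gene.get("name"),
                "product": gene.get("product"),
            }
    return inferred


# Hypothetical input: only SJAG_A has a clean 1:1 relationship.
example_pairs = [
    ("SJAG_A", "POMBE_A"),
    ("SJAG_B", "POMBE_B"),
    ("SJAG_C", "POMBE_B"),  # POMBE_B has two japonicus orthologs
]
example_pombe = {"POMBE_A": {"name": "guf1", "product": "GTPase"}}
example_inferred = infer_names_and_products(example_pairs, example_pombe)
```

Genes that fall outside the 1:1 set get nothing here, which is where the "overrides" file would come in.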

kimrutherford commented 3 years ago

I've done a first implementation of this. So japonicus now has a cdc2: http://japonicusdb.kmr.nz/gene/SJAG_03048

This is the bit that's not done:

then have a TSV file of "overrides" to fix names and products that aren't correctly inferred.

For now, if you need to override the inferred name or product, you'll need to edit the contig files and add /primary_name and /product to the CDS features.

I've removed all the /primary_name and /product qualifiers from the contig files.

ValWood commented 3 years ago

OK, I'm sure we can hold off editing names until the file is available ;)

ValWood commented 3 years ago

Looks fab. Can you let me know how many gene names were assigned? We can report that in the paper since we will be the 'official' naming body.

kimrutherford commented 3 years ago

Can you let me know how many gene names were assigned?

Currently 3550 names and 4071 products but that will change if you edit the orthologs file.

ValWood commented 3 years ago

How many genes had names already? You said ~100; do you have the exact number for the paper?

kimrutherford commented 3 years ago

There were 101 gene names from UniProt.

kimrutherford commented 3 years ago

This is implemented now.

I've started a file of names and products here: https://github.com/japonicusdb/japonicus-curation/blob/main/names_and_products.tsv

It's just these columns, tab separated:

systematic_id primary_name product

I'm doing a load to test it.

ValWood commented 3 years ago

will probably also require a 'synonyms' column. and a defined separator for >1

kimrutherford commented 3 years ago

will probably also require a 'synonyms' column.

OK, I'll work on that.

I'm doing a load to test it.

That worked, so the 6 genes from the TSV file are updated in JaponicusDB:
https://github.com/japonicusdb/japonicus-curation/blob/main/names_and_products.tsv
http://japonicusdb.kmr.nz/gene/SJAG_06382

The names and products from the TSV file take priority over the names and products transferred by ortholog from pombe. So we can now correct problems from the transfer.
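The priority rule can be sketched roughly as follows. This is an illustration only, not the loader code: the function names are hypothetical, the column names follow the header line quoted in this thread, and treating an empty cell as "keep the transferred value" is an assumption that the thread does not confirm.

```python
import csv
import io


def read_overrides(tsv_text):
    """Parse the tab-separated overrides file into {systematic_id: row}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["systematic_id"]: row for row in reader}


def apply_overrides(inferred, overrides):
    """Let TSV values win over values transferred from pombe orthologs."""
    merged = {gene_id: dict(values) for gene_id, values in inferred.items()}
    for gene_id, row in overrides.items():
        entry = merged.setdefault(gene_id, {"name": None, "product": None})
        # Assumption: an empty cell leaves the transferred value untouched.
        if row.get("primary_name"):
            entry["name"] = row["primary_name"]
        if row.get("product"):
            entry["product"] = row["product"]
    return merged


# Hypothetical data: the override replaces the name but keeps the product.
example_tsv = "systematic_id\tprimary_name\tproduct\nSJAG_A\tabc1\t\n"
example_merged = apply_overrides(
    {"SJAG_A": {"name": "old1", "product": "GTPase"}},
    read_overrides(example_tsv),
)
```

Because the overrides are applied last, a gene listed only in the TSV file (with no pombe ortholog) still ends up with a name and product.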

The mitochondrial genome contig still has the gene names from the ENA entry. Should we:

ValWood commented 3 years ago

The mitochondrial genome contig still has the gene names from the ENA entry. Should we:

transfers from pombe should work.....

kimrutherford commented 3 years ago

transfers from pombe should work.....

Now I've looked more carefully, that's what's happening. The mitochondrial contig file had "/gene=" qualifiers but we ignore them. Our code only looks for "/primary_name=".

The mito genes with names are getting the names via pombe orthologs.

kimrutherford commented 3 years ago

As a reminder to myself, I think everything is done here except for:

will probably also require a 'synonyms' column.

kimrutherford commented 3 years ago

As a reminder to myself, I think everything is done here except for:

will probably also require a 'synonyms' column.

That's done now.

I've added an empty "synonyms" column to japonicus-curation/names_and_products.tsv and I've changed the code to load the synonyms column.

Val, do you have any synonyms you can add to test that everything works?

There is one existing synonym ("SJAG_05310" for gene SJAG_05309) from the contig files.

kimrutherford commented 3 years ago

and a defined separator for >1

I forgot to say that you can separate synonyms with a comma: "abc1,xyz2" It's easy to change to a different separator if needed.
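A sketch of how a synonyms cell might be split, with the separator as a parameter so it stays easy to change. The function name is hypothetical, and trimming whitespace and dropping empty entries are assumptions rather than documented loader behaviour.

```python
def parse_synonyms(cell, separator=","):
    """Split a synonyms cell such as "abc1,xyz2" into a list of names."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


example_synonyms = parse_synonyms("abc1,xyz2")  # ['abc1', 'xyz2']
example_empty = parse_synonyms("")  # [] for the currently empty column
```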

You'll definitely need to do a git pull before your next edit of japonicus-curation/names_and_products.tsv because I've changed every line to add the synonyms column.

The column order is now:

ValWood commented 3 years ago

Right, I forgot synonyms was implemented.

the header line still says

systematic_id primary_name product

is there a column 4 for synonyms?

ValWood commented 3 years ago

OK, I now see in my mailbox that you only JUST did it. That will be why I didn't test it yet.... I can easily find some examples to test...

In fact, we should import all of the pombe synonyms for the 1:1 orthologs....

kimrutherford commented 3 years ago

the header line still says

systematic_id primary_name product

It should change if you "git pull".

In fact, we should import all of the pombe synonyms for the 1:1 orthologs....

I'll make a separate issue for that. Does it make sense for japonicus to have synonyms like "SPCC18B5.12" (from cab5)?