luispedro / waldo

Waldo Project
MIT License

Load MGI Pubmed references #3

Closed: magsol closed this issue 14 years ago

magsol commented 14 years ago

No MGI references are currently found for any entries, possibly due to the use of BIB_Pubmed instead of MRK_Reference (ftp://ftp.informatics.jax.org/pub/reports/index.html#refs).

magsol commented 14 years ago

Confirmed: MGI IDs within MRK_Ensembl.rpt are referenced in MRK_Reference.rpt, not in BIB_Pubmed.rpt as previously believed. We will need to alter the statistics code to reflect this.

Evidence codes: http://www.candidagenome.org/cgi-bin/GO/goEvidence.pl

luispedro commented 14 years ago

I have no clue what this is talking about, so I cannot judge whether it still applies.

magsol commented 14 years ago

The "evidence" for each MGI entry in MRK_Ensembl.rpt is listed in MRK_Reference.rpt. So if you want to cross-reference each MGI entry with its PubMed evidence you will need to use MRK_Reference.rpt.

magsol commented 14 years ago

Here it is:

ftp://ftp.informatics.jax.org/pub/reports/index.html#refs

In MRK_Reference.rpt, the first column is the MGI:ID that needs to be matched with whatever entry we're inserting into SQLite, and the 5th column holds the "|"-separated PubMed IDs that provide the "evidence" for that MGI entry.
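A minimal sketch of reading one line of MRK_Reference.rpt, assuming the file is tab-delimited, that the MGI:ID is column 1 and the PubMed IDs are column 5 as described above, and that header lines do not start with "MGI:". The function name `parse_reference_line` is hypothetical, not part of waldo.

```python
from typing import List, Optional, Tuple

# Assumed layout: tab-delimited, 0-based indices for columns 1 and 5.
MGI_ID_COL = 0
PUBMED_COL = 4


def parse_reference_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (mgi_id, pubmed_ids) for a data line, or None otherwise."""
    fields = line.rstrip("\n").split("\t")
    # Skip headers, blank lines, and lines too short to hold column 5.
    if not fields[0].startswith("MGI:") or len(fields) <= PUBMED_COL:
        return None
    mgi_id = fields[MGI_ID_COL]
    # Split the "|"-separated list and drop empty entries.
    pubmed_ids = [p.strip() for p in fields[PUBMED_COL].split("|") if p.strip()]
    return mgi_id, pubmed_ids
```

For a made-up line such as `"MGI:1234\tSym\tName\tsyn1|syn2\t111|222"`, this returns `("MGI:1234", ["111", "222"])`.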

I don't know of a more efficient way of doing this than either searching the MRK_Reference file every single time we insert an MGI entry, or inserting all the PubMed IDs into SQLite first and then matching them to MGI entries as we read them in.
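The second option (load the PubMed IDs into SQLite once, then look them up per MGI entry) might look roughly like the sketch below. The table `mgi_pubmed` and both helper names are hypothetical; waldo's actual schema and models may differ. The `rows` argument is any iterable of `(mgi_id, pubmed_ids)` pairs, such as the output of a line parser over MRK_Reference.rpt.

```python
import sqlite3
from typing import Iterable, List, Tuple


def load_references(conn: sqlite3.Connection,
                    rows: Iterable[Tuple[str, List[str]]]) -> None:
    """Store (mgi_id, pubmed_ids) pairs, one table row per PubMed ID."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mgi_pubmed (mgi_id TEXT, pubmed_id TEXT)")
    # Index on mgi_id so each per-entry lookup avoids a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_mgi_pubmed ON mgi_pubmed (mgi_id)")
    conn.executemany(
        "INSERT INTO mgi_pubmed (mgi_id, pubmed_id) VALUES (?, ?)",
        ((mgi_id, pmid) for mgi_id, pmids in rows for pmid in pmids))
    conn.commit()


def pubmed_ids_for(conn: sqlite3.Connection, mgi_id: str) -> List[str]:
    """Return the PubMed IDs recorded as evidence for one MGI entry."""
    cur = conn.execute(
        "SELECT pubmed_id FROM mgi_pubmed WHERE mgi_id = ? ORDER BY rowid",
        (mgi_id,))
    return [row[0] for row in cur]
```

Because the index is on `mgi_id`, each lookup stays cheap no matter how many entries are inserted, unlike rescanning the file on every insert. If the reference data fits in memory, a plain Python dict keyed by MGI:ID and built once from the file would serve the same purpose.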

We'll need to rebuild the database once this is done, since the "pubmedid" field of the MGI Entry model currently stores nothing more than a database name (e.g. "Uniprot" or "MGI"), which is obviously not a PubMed ID.