monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

add pub fetcher as post-processing utility #239

Open · opened by nlwashington 8 years ago

nlwashington commented 8 years ago

In order to display the publications that any source contains with readable labels, it would be prudent to fetch the publication details from PubMed, where available. This could be done as a post-processing step, like the following:

  1. After the entire graph is built for a source, run a query to get all nodes that are publications.
  2. Query eutils in batches for basic publication information on any PMIDs: the authors, title, short citation, year, and other publication metadata. Consider also adding the abstract, if we want to show it on hover.
  3. Either insert the publication metadata back into the graph, or dump it into a separate, new graph file.
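A minimal sketch of steps 1 and 2 using only the Python standard library. It assumes publication nodes are identified by `PMID:` CURIEs, queries NCBI's `esummary` E-utility with JSON output, and uses an assumed batch size of 200. All function names and the short-citation format are hypothetical, not part of dipper; step 3 is reduced to writing a separate JSON dump rather than inserting triples into the graph.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Iterator, List

# NCBI E-utilities esummary endpoint.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
# Assumed batch size; tune it to stay within NCBI's request limits.
BATCH_SIZE = 200


def pmids_from_nodes(node_ids: Iterable[str]) -> List[str]:
    """Step 1: keep publication nodes written as 'PMID:<number>' (assumed form)."""
    pmids = {n.split(":", 1)[1] for n in node_ids if n.startswith("PMID:")}
    return sorted(pmids, key=int)


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Split the PMID list into fixed-size chunks for batched requests."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_esummary_url(pmids: List[str]) -> str:
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    )
    return f"{ESUMMARY_URL}?{query}"


def parse_esummary(payload: dict) -> Dict[str, dict]:
    """Step 2: extract basic citation metadata from an esummary JSON response."""
    result = payload.get("result", {})
    records: Dict[str, dict] = {}
    for uid in result.get("uids", []):
        doc = result[uid]
        authors = [a["name"] for a in doc.get("authors", [])]
        year = doc.get("pubdate", "")[:4]  # pubdate looks like "2015 Mar 3"
        first = authors[0] if authors else "Anonymous"
        suffix = " et al." if len(authors) > 1 else ""
        records[uid] = {
            "title": doc.get("title", ""),
            "authors": authors,
            "journal": doc.get("source", ""),
            "year": year,
            "short_citation": f"{first}{suffix}, {year}",  # hypothetical format
        }
    return records


def fetch_metadata(pmids: List[str]) -> Dict[str, dict]:
    """Step 2 over the network: one esummary request per batch."""
    records: Dict[str, dict] = {}
    for batch in batches(pmids):
        with urllib.request.urlopen(build_esummary_url(batch)) as resp:
            records.update(parse_esummary(json.load(resp)))
        time.sleep(0.34)  # stay under ~3 requests/second without an API key
    return records


def write_dump(records: Dict[str, dict], path: str) -> None:
    """Step 3 (simplified): write the metadata to a separate file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, sort_keys=True)
```

Keeping the network call (`fetch_metadata`) separate from parsing (`parse_esummary`) lets the parsing be tested offline and would make it easy to put a cache in front of the fetch.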

Also, this step could be made optional via a command-line flag, because the time spent querying eutils might be quite high, depending on the source.
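One way to make the step opt-in, sketched with `argparse`. The runner and the `--sources` and `--fetch-pubs` flags here are hypothetical and do not reflect dipper's actual command line; the point is only that the slow eutils step is skipped unless explicitly requested.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical ingest runner")
    parser.add_argument("--sources", nargs="+", required=True)
    # Hypothetical opt-in flag: off by default because eutils queries are slow.
    parser.add_argument(
        "--fetch-pubs",
        action="store_true",
        help="after building each graph, fetch PubMed metadata for its PMIDs",
    )
    return parser


def run(argv: Optional[List[str]] = None) -> List[str]:
    """Return the steps that would run, in order (stand-ins for real work)."""
    args = build_parser().parse_args(argv)
    steps: List[str] = []
    for source in args.sources:
        steps.append(f"parse:{source}")
        if args.fetch_pubs:
            steps.append(f"fetch_pubs:{source}")
    return steps
```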

@cmungall do you like the idea of doing this at dipper time, or at golr time?

cmungall commented 8 years ago

It would be good to avoid re-querying each time. I think we'd want to maintain our own permanent cache. This could be SG, or another solution. If it's SG, would it be easy to merge the SG publication graph into each fresh monarch data graph?
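As an illustration of the "another solution" option, here is a minimal sketch of a permanent PMID cache backed by a local SQLite file. This is an assumption for illustration, not SciGraph and not an existing dipper component; `PubCache` and `get_metadata` are hypothetical names. Only PMIDs missing from the cache are passed to the fetch function, so repeated builds avoid re-querying eutils.

```python
import json
import sqlite3
from typing import Callable, Dict, List


class PubCache:
    """Hypothetical persistent PMID -> metadata cache stored in SQLite."""

    def __init__(self, path: str = "pub_cache.sqlite") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pubs "
            "(pmid TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
        )

    def get_many(self, pmids: List[str]) -> Dict[str, dict]:
        found: Dict[str, dict] = {}
        for pmid in pmids:
            row = self.conn.execute(
                "SELECT metadata FROM pubs WHERE pmid = ?", (pmid,)
            ).fetchone()
            if row is not None:
                found[pmid] = json.loads(row[0])
        return found

    def put_many(self, records: Dict[str, dict]) -> None:
        # "with conn" commits the transaction on success.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO pubs (pmid, metadata) VALUES (?, ?)",
                [(pmid, json.dumps(meta)) for pmid, meta in records.items()],
            )

    def close(self) -> None:
        self.conn.close()


def get_metadata(
    pmids: List[str],
    cache: PubCache,
    fetch: Callable[[List[str]], Dict[str, dict]],
) -> Dict[str, dict]:
    """Serve cached PMIDs locally; fetch and store only the missing ones."""
    records = cache.get_many(pmids)
    missing = [p for p in pmids if p not in records]
    if missing:
        fresh = fetch(missing)
        cache.put_many(fresh)
        records.update(fresh)
    return records
```

Passing the fetcher in as a function keeps the cache independent of how metadata is retrieved, so the same cache could sit in front of an eutils client or be swapped for a different store.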