support in API making the cache an #altmetrics provider

egonw commented 10 years ago

Together with eNanoMapper partners (https://github.com/enanomapper) and @andrawaag, I started writing something up:

http://specs.enanomapper.net/altmetrics/

I think it would be great if Open PHACTS would support the following API calls:

number of data sets for DOI
number of data points for DOI (like number of measurements, compounds, ...)

The first would be more like formal citations (cito:citesAsDataSource), the second more like "page views".

ChEMBL is, of course, a big resource, but has a mix of PubMed IDs and DOIs. Here too, we could use BridgeDb and make a linkset... there are some PubMed<->DOI services...

Christian-B commented 10 years ago

I agree that a PubMed<->DOI linkset is a great idea and the correct use of the IMS/BridgeDb. Both identifiers actually point to the same paper, journal, etc.

As with all IMS/BridgeDb mappings, the fact that there may be many alternative prefixes for PubMed URIs and DOI URIs is not a problem for the OPS branch of IMS/BridgeDb at all.

However, data about what is covered by a paper, like the number of measurements, chemicals described, etc., is not mapping data and should not be included in IMS/BridgeDb.

antonisloizou commented 10 years ago

The SPARQL queries themselves are relatively easy to write, once we specify which API calls we want, and what they should return.

One way to go would be generic "Entities for Document: List", "Entities for Document: Count" and inverse "Documents for Entity" methods, where:

Document is either a Publication or a Patent
Entity is one of: compound, target, pathway, disease, tissue, activity
There is a filter to specify which type of entity to return

This gives 4 generic methods to maintain, similar to the Hierarchy APIs.
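
As a rough illustration of the generic option, the four methods could look something like the sketch below. This is a minimal in-memory stand-in, not the Open PHACTS implementation: the class name `DocumentEntityIndex`, its method names, and the `example.org` URIs in the usage note are all hypothetical, and a real version would run SPARQL against the cache instead of reading Python dictionaries.

```python
from collections import defaultdict

# Entity types named in the proposal above.
ENTITY_TYPES = {"compound", "target", "pathway", "disease", "tissue", "activity"}


class DocumentEntityIndex:
    """Hypothetical in-memory stand-in for the cache (assumption, not the real API)."""

    def __init__(self):
        # document URI -> set of (entity_type, entity URI) pairs
        self._by_document = defaultdict(set)
        # entity URI -> set of document URIs
        self._by_entity = defaultdict(set)

    def add(self, document, entity_type, entity):
        """Record that a document (publication or patent) mentions an entity."""
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self._by_document[document].add((entity_type, entity))
        self._by_entity[entity].add(document)

    def entities_for_document_list(self, document, entity_type=None):
        """Entities for Document: List, optionally filtered by entity type."""
        pairs = self._by_document.get(document, ())
        return sorted(
            {e for etype, e in pairs if entity_type is None or etype == entity_type}
        )

    def entities_for_document_count(self, document, entity_type=None):
        """Entities for Document: Count, using the same filter as the list method."""
        return len(self.entities_for_document_list(document, entity_type))

    def documents_for_entity_list(self, entity):
        """Documents for Entity: List."""
        return sorted(self._by_entity.get(entity, ()))

    def documents_for_entity_count(self, entity):
        """Documents for Entity: Count."""
        return len(self.documents_for_entity_list(entity))
```

For example, after `index.add("http://example.org/doc/1", "compound", "http://example.org/compound/a")`, calling `index.entities_for_document_count("http://example.org/doc/1", "compound")` counts only the compounds for that document, while omitting the filter counts entities of every type.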

At the other extreme we could have 4 (List + Count, both ways) methods per entity type pair, e.g.

"Compounds for Patent: List", "Compounds for Patent: Count", "Patents for Compound: List" and "Patents for Compound: Count" and also
"Compounds for Publication: List", "Compounds for Publication: Count", "Publications for Compound: List" and "Publications for Compound: Count"

Here we end up with over 40(!!!) individual new methods. Obviously my preference is for the generic methods; however, specific ones will end up executing faster by definition.

We can also have a mix of the generic ones + a subset of the specific methods we expect to be used more frequently, to allow those to be quicker.

I'll look into putting some first versions of the generic queries on the dev API, so we can get a feel for performance - of course when patents come in we'll have to re-evaluate.

antonisloizou commented 10 years ago

...sorry, clicked "Closed and comment" rather than just comment ...

AlasdairGray commented 10 years ago

I agree with @Christian-B's division as to what should be in the Cache and what should be in the IMS here. This is as we discussed at the SureChEMBL meeting.

egonw commented 10 years ago

It was not my intent to mix this with the discussions around the patent-$foo links. While that is important, and more important than this request, here I really just wanted to request two #altmetrics calls, with their own #altmetrics use case: the two listed in the request.

BTW, the query for the WPRDF seems to be something like:

prefix wp:  <http://vocabularies.wikipathways.org/wp#>
prefix dcterms: <http://purl.org/dc/terms/>

SELECT distinct ?object WHERE {
  <http://identifiers.org/pubmed/11252892> a wp:PublicationReference ;
    dcterms:isPartOf ?object .
} ORDER BY ?object

Note that it needs a PubMed...
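
For the "number of data sets" style of call, the same pattern can be turned into a count. The sketch below builds such a query from a PubMed ID. The function names `pubmed_uri` and `dataset_count_query` are hypothetical, and it assumes the WPRDF pattern above is the right one and that counting distinct `?object` values is the number wanted. It only builds the query text and does not talk to any endpoint.

```python
# Prefixes copied from the WPRDF query above.
PREFIXES = (
    "prefix wp: <http://vocabularies.wikipathways.org/wp#>\n"
    "prefix dcterms: <http://purl.org/dc/terms/>\n"
)


def pubmed_uri(pmid):
    """Build the identifiers.org URI used in the query above from a PubMed ID."""
    if not (pmid.isascii() and pmid.isdigit()):
        raise ValueError(f"not a PubMed ID: {pmid!r}")
    return f"http://identifiers.org/pubmed/{pmid}"


def dataset_count_query(pmid):
    """Return SPARQL text counting the distinct ?object values for one publication.

    Hypothetical helper: same triple pattern as the list query above,
    with the projection swapped for a SPARQL 1.1 COUNT aggregate.
    """
    return (
        PREFIXES
        + "\n"
        + "SELECT (COUNT(DISTINCT ?object) AS ?count) WHERE {\n"
        + f"  <{pubmed_uri(pmid)}> a wp:PublicationReference ;\n"
        + "    dcterms:isPartOf ?object .\n"
        + "}"
    )
```

Calling `print(dataset_count_query("11252892"))` gives the count version of the query for the same paper. A DOI-based call would still need the PubMed<->DOI mapping discussed earlier in the thread.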

egonw commented 10 years ago

Some follow-up discussions reminded me that we used this before for ChEMBL already, with Andra's CitedIn. See the relevant section in our ChEMBL-RDF paper: http://www.jcheminf.com/content/5/1/23

openphacts / OPS_LinkedDataApi

support in API making the cache an #altmetrics provider #12