thechiselgroup / biomixer

BioMixer
http://bio-mixer.appspot.com/

Definition and Properties from Same Call #470

Closed everbeek closed 9 years ago

everbeek commented 9 years ago

If I call http://data.bioontology.org/ontologies/UBERON/classes/http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0007642?include=properties,definition instead of the version without the include parameter, I get prefLabel, definition, and composition properties all at once. I do not need to change the callbacks; I only need to change the URL used by the composition and base callbacks, and the cache should handle the rest (but let's confirm that).

The prefLabel is then read from the property "http://data.bioontology.org/metadata/def/prefLabel".
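A minimal sketch of how such a URL could be assembled. The function name `class_url` and the `BASE` constant are hypothetical and not taken from the BioMixer code; only the URL layout comes from the example above.

```python
from urllib.parse import quote

# Host taken from the example URL above.
BASE = "http://data.bioontology.org"

def class_url(ontology, concept_iri, include=()):
    """Build a class URL, optionally with a comma-separated include list."""
    # The concept IRI sits in a single path segment, so every reserved
    # character (including ":" and "/") has to be percent-encoded.
    url = f"{BASE}/ontologies/{ontology}/classes/{quote(concept_iri, safe='')}"
    if include:
        # Commas are left unescaped, matching the URL quoted above.
        url += "?include=" + ",".join(include)
    return url

url = class_url(
    "UBERON",
    "http://purl.obolibrary.org/obo/UBERON_0007642",
    include=("properties", "definition"),
)
print(url)
```

Because only the URL changes, callbacks that receive the response would see a superset of the fields they saw before.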

everbeek commented 9 years ago

I am not sure that, when synonyms are present, they will always have corresponding synonym properties. In one example from UBERON, there was a synonym property called "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym". This is good, but the call without properties had no synonyms listed.

Actually, that call has a second synonym property, called "http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym".

So really, I have discovered that, just like composition relations, there are potentially custom synonym property names that I could not, in principle, discover in advance. I could search the property names for the string "synonym" and treat any matches as synonyms. Is this critical? Synonyms are not structurally present in the graph, so I think not. Users can go through to the concept page if they want more content about the concept. So, I will not try to parse all of these.
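For reference, the string-search heuristic considered (and rejected) above might look like the following sketch. It assumes the properties arrive as a mapping from property URI to a list of values; the function name and the sample values are made up for illustration.

```python
def guess_synonyms(properties):
    """Collect values of every property whose name contains 'synonym'."""
    found = []
    for name, values in properties.items():
        # Case-insensitive match catches hasExactSynonym, hasRelatedSynonym,
        # and any custom property name that mentions "synonym".
        if "synonym" in name.lower():
            for value in values:
                if value not in found:  # skip duplicates, keep order
                    found.append(value)
    return found

# Placeholder data: the property URIs are from the thread, the values are not.
sample = {
    "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym": ["example exact synonym"],
    "http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym": ["example related synonym"],
    "http://data.bioontology.org/metadata/def/prefLabel": ["example label"],
}
print(guess_synonyms(sample))
```

A substring match like this would also pick up any unrelated property whose name happens to contain "synonym", which is one more reason to treat it as a heuristic only.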

Since it is desirable to have some synonyms, and BioPortal has the synonym part defined in the non-include version...wait...if I add ",synonym", I also get that info. Checking latency for those calls...I save 170ms if I make an argument-free call, but I am already making include=properties calls. The difference between that call and one with definition and synonym tagged on is only 40 to 70ms (~700ms for properties; 729ms for properties and definition; 755ms for properties, definition, and synonym). There is variability, but this looks like a good trade, because I go from 700+170 to less than 800, with the benefit of fewer REST calls.
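The trade-off can be written out as a quick calculation. This takes the comment's own "700+170" sum at face value (the two calls are simply added, which is an assumption about how they are issued) and uses the single rough measurements quoted above; the constant names are illustrative.

```python
# Rough per-concept timings quoted in the comment, in milliseconds.
ARGUMENT_FREE_MS = 170        # the "+170" term in the comment's sum
PROPERTIES_ONLY_MS = 700      # include=properties
PROPERTIES_DEF_SYN_MS = 755   # include=properties,definition,synonym

# Old approach: two calls per concept, added together as in the comment.
two_calls_ms = PROPERTIES_ONLY_MS + ARGUMENT_FREE_MS
# New approach: one call with the larger include list.
one_call_ms = PROPERTIES_DEF_SYN_MS

saving_ms = two_calls_ms - one_call_ms
print(two_calls_ms, one_call_ms, saving_ms)
```

Given the variability mentioned above, the exact saving should be read as indicative rather than precise.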

Updating the generated URLs to use the larger include list, and making sure the callbacks get any updates they need.

everbeek commented 9 years ago

Oh, I added prefLabel to the include list too, and that brought the top-level prefLabel back, so I don't need to rely on any property claiming to be a preferred label (here the property is named "http://data.bioontology.org/metadata/def/prefLabel", but it could conceivably differ between concepts or ontologies).
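A sketch of what label extraction could look like with prefLabel in the include list. The response shape (a top-level "prefLabel" string and a "properties" mapping from URI to a list of values) is an assumption, and the fallback to the property is an illustrative extra rather than something the comment above asks for.

```python
PREF_LABEL_PROPERTY = "http://data.bioontology.org/metadata/def/prefLabel"

def pref_label(cls):
    """Return the preferred label of a parsed class response, or None."""
    # Preferred source: the top-level field requested via include=...,prefLabel.
    label = cls.get("prefLabel")
    if label:
        return label
    # Hypothetical fallback: the metadata property, if it is present.
    values = cls.get("properties", {}).get(PREF_LABEL_PROPERTY, [])
    return values[0] if values else None

# Placeholder responses, not real BioPortal data.
print(pref_label({"prefLabel": "example label", "properties": {}}))
print(pref_label({"properties": {PREF_LABEL_PROPERTY: ["fallback label"]}}))
```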

everbeek commented 9 years ago

Note: when testing latency for changes like this, it is not OK to compare against Google App Engine deployments, since they compress the code, so the total quantity of data transferred cannot be compared with the development server.

everbeek commented 9 years ago

The number of extra calls for the old URLs is small in my experiments (200 vs 205). This suggests that perhaps my browser caching is not working properly. Searching for duplicate outgoing calls...no, it just turns out that only 5 extra calls were made. Aha! The path to root data already has definition, synonym, etc. in the returned data. This makes sense. OK, so for path to root, I expect the benefits of the new URLs to be lower. Testing with term neighbourhood instead, for greater sensitivity.

Old URLs: 192 requests in ~700ms. New URLs: 171 requests, also in ~700ms.

So there is no single-instance benefit, but if a larger visualization or multiple instances are calling on server resources at the same time, latency will be improved if a smaller number of calls are made. This is assuming that the calls do not differ dramatically in their on-server CPU and memory resource requirements.
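To put the two experiments side by side, the reduction in request count can be computed from the numbers reported above. The helper name is illustrative.

```python
def reduction_percent(old_requests, new_requests):
    """Percentage of requests saved by the new URLs."""
    return (old_requests - new_requests) / old_requests * 100

# Request counts reported in the comments above (old URLs vs new URLs).
path_to_root = reduction_percent(205, 200)
term_neighbourhood = reduction_percent(192, 171)

print(round(path_to_root, 1))        # about 2.4
print(round(term_neighbourhood, 1))  # about 10.9
```

The term neighbourhood run saves roughly one request in nine, which is where any benefit under concurrent load would come from.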