gwu-libraries / obento

Bento Box style search results page
MIT License

T389 360link open url #397

Closed adityadharne closed 7 years ago

adityadharne commented 8 years ago

Adds logic to obtain the e-resource link from the OpenURL response XML.

kerchner commented 8 years ago
# returns an array of OrderedDict objects (if present):
urlnodes = xmldict['ssopenurl:openURLResponse']['ssopenurl:results']['ssopenurl:result']['ssopenurl:linkGroups']['ssopenurl:linkGroup']['ssopenurl:url']
# returns a list:
articleurls = [node['#text'] for node in urlnodes if '@type' in node and node['@type'] == 'article']

then you can, for instance, test whether not articleurls (to see if the list was empty, i.e. no article nodes were found). Otherwise, you should be able to proceed and, for instance, bind articleurl = articleurls[0]
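The empty-list check described above could be sketched as follows. This is only an illustration: the function name `first_article_url` and the sample URLs are hypothetical, and the hand-written list stands in for what `xmltodict.parse()` would return for the `ssopenurl:url` nodes.

```python
def first_article_url(urlnodes):
    """Return the text of the first url node typed 'article', or None."""
    # Same filter as the comprehension above: keep only nodes whose
    # type attribute is 'article'.
    articleurls = [node['#text'] for node in urlnodes
                   if node.get('@type') == 'article']
    if not articleurls:
        # No article nodes were found.
        return None
    return articleurls[0]


# Hypothetical stand-in for the parsed ssopenurl:url nodes.
urlnodes = [
    {'@type': 'source', '#text': 'https://example.org/database'},
    {'@type': 'journal', '#text': 'https://example.org/journal'},
    {'@type': 'article', '#text': 'https://example.org/article'},
]

print(first_article_url(urlnodes))
print(first_article_url(urlnodes[:2]))
```

Returning `None` when nothing matches lets the caller fall back to the findit page without catching an `IndexError`.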

kerchner commented 8 years ago

@adityadharne I've discovered that there is a case we haven't accounted for, which is when the XML result contains multiple <ssopenurl:linkGroup> results inside the <ssopenurl:linkGroups> tag.

When there are multiple <ssopenurl:linkGroup> nodes, xmltodict.parse() returns the ['ssopenurl:linkGroup'] portion of the object as an array of OrderedDict objects: [OrderedDict([(u'ssopenurl:url',... ), OrderedDict(...)]. Example here.

When there is only a single <ssopenurl:linkGroup> node, xmltodict.parse() returns the ['ssopenurl:linkGroup'] portion of the object as a single OrderedDict object: OrderedDict([(u'ssopenurl:url',... ). Example here.

In the former case, when there are multiple holdings (and each may contain an "article" result), how do we determine which to use? You may want to consult with @lwrubel and/or @cummingsm about this.

The query I used in this case was "database", and the two examples above correspond to these articles:

(single linkGroup:) Berrington, James Anaesthesia and Intensive Care Medicine, 2014

and

(multiple linkGroups - this is the case we need to account for:) The IMGT/HLA database Robinson, James, Lopez, Rodrigo, Marsh, St... Nucleic Acids Research, 2013

Please modify the logic to account for both scenarios.
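One possible way to handle both shapes is to normalize every node to a list before iterating. The sketch below is not the code in this branch: the helper names `as_list` and `article_urls` are hypothetical, the sample dicts stand in for parser output, and it sidesteps the open question of which holding to prefer by returning every article URL in document order.

```python
def as_list(node):
    """Wrap a single parsed node in a list; pass lists through unchanged."""
    if node is None:
        return []
    if isinstance(node, list):
        return node
    return [node]


def article_urls(link_groups_node):
    """Collect article URLs from the parsed ssopenurl:linkGroups node."""
    urls = []
    groups = (link_groups_node or {}).get('ssopenurl:linkGroup')
    for group in as_list(groups):
        for url in as_list(group.get('ssopenurl:url')):
            # A url node without attributes parses to a plain string,
            # so only dicts can carry an '@type'.
            if (isinstance(url, dict) and url.get('@type') == 'article'
                    and '#text' in url):
                urls.append(url['#text'])
    return urls


# Hypothetical parsed data: one linkGroup (a dict) vs. several (a list).
single = {'ssopenurl:linkGroup': {
    'ssopenurl:url': [
        {'@type': 'journal', '#text': 'https://example.org/j1'},
        {'@type': 'article', '#text': 'https://example.org/a1'},
    ]}}
multiple = {'ssopenurl:linkGroup': [
    {'ssopenurl:url': {'@type': 'journal', '#text': 'https://example.org/j2'}},
    {'ssopenurl:url': {'@type': 'article', '#text': 'https://example.org/a2'}},
    {'ssopenurl:url': {'@type': 'article', '#text': 'https://example.org/a3'}},
]}

print(article_urls(single))
print(article_urls(multiple))
```

Taking the first element of the returned list would be one simple policy, but that choice is an assumption and should follow whatever is decided about multiple holdings.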

Also, there are still some PEP8 issues to address.

kerchner commented 8 years ago

Update: It appears from the code for xmltodict.parse() that passing a force_list parameter to xmltodict.parse() will force the parser to always create lists for any child node named in that parameter. So you would pass force_list=('ssopenurl:linkGroup',) to always create a list even if there is only one <ssopenurl:linkGroup> child. Note the trailing comma: without it the value is a plain string rather than a tuple.
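A small sketch of the effect of force_list, assuming a version of xmltodict that supports the parameter. The XML is a hand-made, heavily trimmed stand-in for a real 360 Link response, the namespace URI is a placeholder, and `link_groups` is a hypothetical helper.

```python
import xmltodict

# Hand-made sample with a single linkGroup; the namespace URI is a placeholder.
SAMPLE = """
<ssopenurl:openURLResponse xmlns:ssopenurl="http://example.org/ssopenurl">
  <ssopenurl:results>
    <ssopenurl:result>
      <ssopenurl:linkGroups>
        <ssopenurl:linkGroup type="holding">
          <ssopenurl:url type="article">https://example.org/article</ssopenurl:url>
        </ssopenurl:linkGroup>
      </ssopenurl:linkGroups>
    </ssopenurl:result>
  </ssopenurl:results>
</ssopenurl:openURLResponse>
"""

PATH = ['ssopenurl:openURLResponse', 'ssopenurl:results',
        'ssopenurl:result', 'ssopenurl:linkGroups']


def link_groups(xml_text, force):
    """Parse xml_text and return the value stored under ssopenurl:linkGroup."""
    # The trailing comma makes this a one-element tuple, not a string.
    kwargs = {'force_list': ('ssopenurl:linkGroup',)} if force else {}
    node = xmltodict.parse(xml_text, **kwargs)
    for key in PATH:
        node = node[key]
    return node['ssopenurl:linkGroup']


unforced = link_groups(SAMPLE, force=False)  # a single mapping
forced = link_groups(SAMPLE, force=True)     # a list with one mapping
print(type(unforced).__name__, type(forced).__name__)
```

With the parameter set, downstream code can always iterate over the linkGroup value without first checking its type.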

This at least makes the resulting structure consistent, but you will still have the problem of:

kerchner commented 7 years ago

Test query: "Composable scheduler activations for Haskell"
master: findit page
t389 branch: direct link to article

kerchner commented 7 years ago

Another test query: "Palm Carotene Resin". Click on "Preservation of carotenes in the deacidification of crude palm oil".
master: findit page
t389 branch: direct link to article

kerchner commented 7 years ago

@lwrubel Please review (at a minimum for logic/readability)

kerchner commented 7 years ago

@lwrubel I've rewritten the OpenURL logic to accommodate a fuller range of possibilities in the XML, one of which would be like the case above in your comment referencing a search of "manchester by the sea". By accommodating single vs. multiple linkGroup nodes, result nodes, etc. we should now be getting any article url nodes when present. A test of 21 searches resulted in the following:

So we've essentially eliminated (approximately) 3 out of 4 extra clickthroughs: results needing an extra click dropped from 50% to around 12%, a reduction of roughly three quarters. Overall, we've improved from 50% direct links to the article (with just Summon) to around 88% direct links (using Summon if present, otherwise trying OpenURL if possible).

Please do not merge yet; I'll first take out some of the logging that helped gather those statistics.

lwrubel commented 7 years ago

Read through the code, and it works for me on my instance.