T389 360link open url

adityadharne commented 8 years ago

Adds logic to obtain E-resource link from OpenUrl response XML.

kerchner commented 8 years ago

Is urlfetch necessary?
xmltodict needs to be added to requirements.py (and bound the major version, as well)
Some initial testing yields a result where the button has no link. Search on "database configuration" - the first result ("The memory of MICE: The configuration database") provides a button with no link: <button onclick="window.open('')" title="" )="" target="_blank">Online <i class="fa fa-book"></i></button>
Consider adding some comments to explain the logic in the code. It might help to include an example of the (relevant portions of the XML) that the code is expecting.

kerchner commented 8 years ago

The new section(s) of code have a number of PEP8 issues that should be cleaned up. PEP8 checking will also reveal unused imports, the fact that the xmldoc variable is not used, and more. (Stylistically, also please put a space after the #.)
The code can be simplified by using a python list comprehension. It would employ something like this:

# returns an array of OrderedDict objects (if present):
urlnodes = xmldict['ssopenurl:openURLResponse']['ssopenurl:results']['ssopenurl:result']['ssopenurl:linkGroups']['ssopenurl:linkGroup']['ssopenurl:url']
# returns a list:
articleurls = [node['#text'] for node in urlnodes if '@type' in node and node['@type'] == 'article']

Then you can, for instance, test whether not articleurls (to see if the list is empty, i.e. no article nodes were found). Otherwise, you should be able to proceed and, for instance, bind articleurl = articleurls[0].

I'm not sure I understand why any change is necessary in summon.html. In views.py after you compute whether or not there's an article URL from OpenURL, can't you just re-bind match['url'] to the article URL if you were able to get one from OpenURL?
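
A minimal sketch of how the list comprehension and the match['url'] re-binding suggested above might fit together. The xmldict literal is a hand-written stand-in for what xmltodict.parse() returns for one result with one linkGroup, the URLs are placeholders, and pick_article_url is a hypothetical helper name, not something in the current code:

```python
# Hand-written stand-in for xmltodict.parse() output (one result, one
# linkGroup). All URLs are placeholders.
xmldict = {
    'ssopenurl:openURLResponse': {
        'ssopenurl:results': {
            'ssopenurl:result': {
                'ssopenurl:linkGroups': {
                    'ssopenurl:linkGroup': {
                        'ssopenurl:url': [
                            {'@type': 'source',
                             '#text': 'http://example.org/source'},
                            {'@type': 'journal',
                             '#text': 'http://example.org/journal'},
                            {'@type': 'article',
                             '#text': 'http://example.org/article'},
                        ]
                    }
                }
            }
        }
    }
}


def pick_article_url(xmldict):
    """Return the first article URL in the response, or None."""
    urlnodes = (xmldict['ssopenurl:openURLResponse']['ssopenurl:results']
                ['ssopenurl:result']['ssopenurl:linkGroups']
                ['ssopenurl:linkGroup']['ssopenurl:url'])
    articleurls = [node['#text'] for node in urlnodes
                   if '@type' in node and node['@type'] == 'article']
    return articleurls[0] if articleurls else None


# Hypothetical match dict as built in views.py; only 'url' matters here.
match = {'url': 'http://example.org/findit-page'}
articleurl = pick_article_url(xmldict)
if articleurl:
    # Prefer the direct article link; otherwise keep the existing URL.
    match['url'] = articleurl
```

With this shape the template would not need to know where the URL came from, since it only ever reads match['url'].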

kerchner commented 8 years ago

@adityadharne I've discovered that there is a case we haven't accounted for, which is when the XML result contains multiple <ssopenurl:linkGroup> results inside the <ssopenurl:linkGroups> tag.

When there are multiple <ssopenurl:linkGroup> nodes, xmltodict.parse() returns the ['ssopenurl:linkGroup'] portion of the object as a list of OrderedDict objects: [OrderedDict([(u'ssopenurl:url',... ), OrderedDict(...)]. Example here.

When there is only a single <ssopenurl:linkGroup> node, xmltodict.parse() returns the ['ssopenurl:linkGroup'] portion of the object as a single OrderedDict object: OrderedDict([(u'ssopenurl:url',... ). Example here.

In the former case, when there are multiple holdings (and each may contain an "article" result), how do we determine which to use? You may want to consult with @lwrubel and/or @cummingsm about this.

The query I used in this case was "database", and the articles above correspond to:

(single linkGroup:) Berrington, James Anaesthesia and Intensive Care Medicine, 2014

and

(multiple linkGroups - this is the case we need to account for:) The IMGT/HLA database Robinson, James, Lopez, Rodrigo, Marsh, St... Nucleic Acids Research, 2013

Please modify the logic to account for both scenarios.
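
One possible way to handle both shapes, as a sketch only. It assumes the parsed structure follows the key names quoted above; as_list and article_urls are hypothetical helper names, and the sample data is hand-written with placeholder URLs:

```python
def as_list(value):
    """Wrap a lone node in a list; treat a missing node as an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def article_urls(xmldict):
    """Collect article URLs from every linkGroup of every result."""
    response = xmldict['ssopenurl:openURLResponse']
    results = response.get('ssopenurl:results') or {}
    found = []
    for result in as_list(results.get('ssopenurl:result')):
        groups = result.get('ssopenurl:linkGroups') or {}
        for group in as_list(groups.get('ssopenurl:linkGroup')):
            for node in as_list(group.get('ssopenurl:url')):
                # A url element with no attributes parses to a plain string,
                # so only dict nodes can carry an '@type'.
                if isinstance(node, dict) and node.get('@type') == 'article':
                    found.append(node['#text'])
    return found


# Hand-written sample data (placeholder URLs).
article = {'@type': 'article', '#text': 'http://example.org/a'}
journal = {'@type': 'journal', '#text': 'http://example.org/j'}


def wrap(link_group):
    """Build a response dict around the given linkGroup value."""
    return {'ssopenurl:openURLResponse': {'ssopenurl:results': {
        'ssopenurl:result': {'ssopenurl:linkGroups': {
            'ssopenurl:linkGroup': link_group}}}}}


single = wrap({'ssopenurl:url': [journal, article]})
multiple = wrap([{'ssopenurl:url': [journal, article]},
                 {'ssopenurl:url': journal}])
```

Both article_urls(single) and article_urls(multiple) go through the same code path, so the caller never has to check whether it got a dict or a list. This only gathers candidates; it does not answer which one to prefer when several are found.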

Also, there are still some PEP8 issues to address.

kerchner commented 8 years ago

Update: It appears from the code for xmltodict.parse() that passing a force_list parameter will force the parser to always create lists for any child node named in it. So you would pass force_list=('ssopenurl:linkGroup',) to always create a list even if there is only one <ssopenurl:linkGroup> child. Note the trailing comma: without it the value is a plain string rather than a one-element tuple, and the parser's membership test would then match substrings of the name.

This at least makes the resulting structure consistent, but you will still have the problem of:

identifying when you have multiple results that contain an <ssopenurl:url type="article">, and
determining which result to choose.
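
A rough sketch of the first of those two problems, assuming force_list has been applied to both 'ssopenurl:linkGroup' and 'ssopenurl:url' so that each is always a list. article_url_by_group is a hypothetical helper name and the sample data is hand-written with placeholder URLs:

```python
def article_url_by_group(link_groups):
    """Map linkGroup index -> article URL, for groups that have one.

    Assumes link_groups is a list and each group's 'ssopenurl:url' is a
    list, which is what force_list would guarantee for those two names.
    """
    candidates = {}
    for index, group in enumerate(link_groups):
        for node in group.get('ssopenurl:url', []):
            if isinstance(node, dict) and node.get('@type') == 'article':
                candidates[index] = node['#text']
                break  # one article URL per linkGroup is enough here
    return candidates


# Hand-written sample: three holdings, two of which offer an article URL.
link_groups = [
    {'ssopenurl:url': [
        {'@type': 'journal', '#text': 'http://example.org/j1'},
        {'@type': 'article', '#text': 'http://example.org/a1'}]},
    {'ssopenurl:url': [
        {'@type': 'journal', '#text': 'http://example.org/j2'}]},
    {'ssopenurl:url': [
        {'@type': 'article', '#text': 'http://example.org/a3'}]},
]

candidates = article_url_by_group(link_groups)
# More than one candidate means a choice has to be made somewhere.
ambiguous = len(candidates) > 1
```

Keeping the linkGroup index alongside each URL leaves room for whatever selection rule is eventually agreed on; the sketch deliberately does not pick one.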

kerchner commented 7 years ago

Test query: "Composable scheduler activations for Haskell"
master: findit page
t389 branch: direct link to article

kerchner commented 7 years ago

Another test query: "Palm Carotene Resin"
Click on "Preservation of carotenes in the deacidification of crude palm oil"
master: findit page
t389 branch: direct link to article

kerchner commented 7 years ago

@lwrubel Please review (at a minimum for logic/readability)

kerchner commented 7 years ago

@lwrubel I've rewritten the OpenURL logic to accommodate a fuller range of possibilities in the XML, one of which would be like the case above in your comment referencing a search of "manchester by the sea". By accommodating single vs. multiple linkGroup nodes, result nodes, etc. we should now be getting any article url nodes when present. A test of 21 searches resulted in the following:

without any of this logic, 50% would have resolved directly using the URL from summon, but 50% would have taken the user to the findit/360link page.
with the logic, 75% (vs. 0%) of the findit/360link results are instead resolving directly thanks to pulling from the OpenURL!

So we've essentially eliminated (approximately) 3 out of 4 extra clickthroughs. Overall, we've improved from 50% direct links to the article (with just Summon), to around 88% direct links (using Summon if present, otherwise trying OpenURL if possible).
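
For reference, the arithmetic behind the "around 88%" figure, using the rounded percentages quoted above rather than the raw counts (the variable names are only for illustration):

```python
# Share of searches where the Summon URL already resolved directly.
direct_from_summon = 0.50
# Share of the remaining findit/360link results that OpenURL resolved.
openurl_success_rate = 0.75

rescued_by_openurl = openurl_success_rate * (1 - direct_from_summon)
overall = direct_from_summon + rescued_by_openurl  # 0.875, i.e. about 88%
```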

Please do not merge yet; I'll first take out some of the logging that helped gather those statistics.

lwrubel commented 7 years ago

Read through the code, and it works for me on my instance.

gwu-libraries / obento

T389 360link open url #397