gwu-libraries / launchpad

A Django-based system that provides a stable URL for every item in the library's catalog. Various discovery services will link to these URLs. The page for each item will in turn link out to other resources that provide ways to access the content of the item.
MIT License

truncation when db name includes '&' (re m14_001 #340) #373

Closed by cummingsm 11 years ago

cummingsm commented 11 years ago

Follow-up to issue #340: converting the ampersand to the word 'and' worked OK, but now a parameter name is showing up in the value. In this case, 'ti'.

http://gwfindit-test.wrlc.org/item/2265138?genre=article&issn=0010194X&title=Columbia%20Journalism%20Review&volume=52&issue=1&date=20130501&atitle=Streams%20of%20consciousness.&spage=24&pages=24-36&sid=EBSCO:Communication%20&%20Mass%20Media%20Complete&aulast=ADLER,%20BEN

The sid field is getting transformed to EBSCO:Communication%20andti:GWLP
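For anyone following along, here is a minimal sketch of why a raw `&` inside `sid` truncates the value. It uses Python's standard `urllib.parse.parse_qs` on the tail of the URL above; it is not launchpad's actual parsing code, and the variable names are just for illustration.

```python
from urllib.parse import parse_qs

# Tail of the query string from the URL above, with the raw "&" inside sid.
broken = ("spage=24&pages=24-36"
          "&sid=EBSCO:Communication%20&%20Mass%20Media%20Complete"
          "&aulast=ADLER,%20BEN")

# keep_blank_values=True keeps the orphaned fragment visible as a key.
params = parse_qs(broken, keep_blank_values=True)

# The raw "&" ends the sid value early ...
print(params["sid"])                      # ['EBSCO:Communication ']
# ... and the rest of the database name becomes a separate, empty parameter.
print(" Mass Media Complete" in params)   # True

# With the ampersand percent-encoded as %26, the value survives intact.
fixed = parse_qs("sid=EBSCO:Communication%20%26%20Mass%20Media%20Complete")
print(fixed["sid"])                       # ['EBSCO:Communication & Mass Media Complete']
```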

dchud commented 11 years ago

We think this was fixed in #340. Write a test case that verifies the fix.

lwrubel commented 11 years ago

I think I see where the problem ampersand is getting introduced.

1) EBSCO creates this link to 360 Link:

http://findit.library.gwu.edu/go?genre=article&issn=01636804&title=IEEE+Communications+Magazine&volume=51&issue=3&date=20130301&atitle=A+systematic+and+flexible+approach+for+testing+future+mobile+networks+by+exploiting+a+wrap-around+testing+methodology.&spage=160&pages=160-167&sid=EBSCO:Communication+%26+Mass+Media+Complete&aulast=Pinola%2c+Jarno

2) 360 Link has a link to launchpad; it's called "Available in Print". The link does not go straight to launchpad, but goes to Serials Solutions, logging the link, with a parameter which is the link to launchpad that I loaded into its knowledgebase:

http://findit.library.gwu.edu/log?L=UZ4UG4LZ9G&D=TN5&J=IEEECOMMAG&P=Link&U=http%3A%2F%2Ffindit.library.gwu.edu%2Fitem%2F2503524

3) To pass the citation information on to launchpad, the 360 Link Reset javascript appends location.search to that URL, so the final URL in 360 Link looks like this:

http://findit.library.gwu.edu/log?L=UZ4UG4LZ9G&D=TN5&J=IEEECOMMAG&P=Link&U=http%3A%2F%2Ffindit.library.gwu.edu%2Fitem%2F2503524?genre=article&issn=01636804&title=IEEE+Communications+Magazine&volume=51&issue=3&date=20130301&atitle=A+systematic+and+flexible+approach+for+testing+future+mobile+networks+by+exploiting+a+wrap-around+testing+methodology.&spage=160&pages=160-167&sid=EBSCO:Communication+%26+Mass+Media+Complete&aulast=Pinola%2c+Jarno

4) I think Serials Solutions must decode the URL when it goes through their system. That's necessary for the main part of the URL, but causes problems with encoded characters in the location.search part:

http://findit.library.gwu.edu/item/2503524?genre=article&issn=01636804&title=IEEE%20Communications%20Magazine&volume=51&issue=3&date=20130301&atitle=A%20systematic%20and%20flexible%20approach%20for%20testing%20future%20mobile%20networks%20by%20exploiting%20a%20wrap-around%20testing%20methodology.&spage=160&pages=160-167&sid=EBSCO:Communication%20&%20Mass%20Media%20Complete&aulast=Pinola,%20Jarno

I'm not sure where the spaces got re-encoded as %20, though.

I'm wondering whether I should use encodeURIComponent on location.search in step 3, or whether that would cause more problems. The 360 Link Reset code is under my control (and is where I started this whole problem), so that's a place to work. I can't change what Serials Solutions does in their redirect.
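A rough sketch of what that extra encoding would do, written in Python rather than in the 360 Link Reset JavaScript. `encode_uri_component` is a hypothetical stand-in for JavaScript's `encodeURIComponent`, and the sketch assumes the redirect decodes the value exactly once, which is a guess about Serials Solutions' behavior and not something confirmed here. The query string is shortened from step 3.

```python
from urllib.parse import quote, unquote

def encode_uri_component(text: str) -> str:
    """Hypothetical stand-in for JavaScript's encodeURIComponent.

    Leaves the same characters unescaped: letters, digits, and - _ . ! ~ * ' ( )
    """
    return quote(text, safe="-_.!~*'()")

# Shortened location.search from step 3 (already percent-encoded by EBSCO).
search = ("?genre=article"
          "&sid=EBSCO:Communication+%26+Mass+Media+Complete"
          "&aulast=Pinola%2c+Jarno")

# Without extra encoding, a single decode exposes the raw ampersand.
print("%26" in unquote(search))          # False

# With extra encoding, "%26" becomes "%2526" ...
encoded = encode_uri_component(search)
print("%2526" in encoded)                # True

# ... and a single decode (ASSUMED to be what the redirect does) restores
# the original query string with %26 still in place.
print(unquote(encoded) == search)        # True
```

If the redirect decodes more than once, or handles the appended part differently from the `U` value, this would not help. The sketch also does not model the `+` to `%20` change seen in the redirected URL.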

edsu commented 11 years ago

This is great @lwrubel. Can you tell me how to get to Ebsco Communications & Mass Media so I can walk through it as well?

lwrubel commented 11 years ago

You'll need to be either on campus or use VPN:

http://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=ehost&defaultdb=ufh

In the search results, the blue "Find It" button is the link to 360 Link. On the 360 Link screen, the "Available in Print" link may not be visible unless you click "Show 1 more result".

edsu commented 11 years ago

OK, I can see the Available in Print link on this page has the sid encoded fine:

http://findit.library.gwu.edu/log?L=UZ4UG4LZ9G&D=TN5&J=IEEECOMMAG&P=Link&U=http%3A%2F%2Ffindit.library.gwu.edu%2Fitem%2F2503524?genre=article&issn=01636804&title=IEEE+Communications+Magazine&volume=42&issue=7&date=20040701&atitle=XML-Based+Configuration+Management+for+IP+Network+Devices.&spage=84&pages=84-91&sid=EBSCO:Communication+%26+Mass+Media+Complete&aulast=Choi%2c+Mi-Jung

But when you click on that you end up getting redirected to a URL that has the sid encoded incorrectly. I used curl on the command line so you can see the request and the redirect:

% curl -i 'http://findit.library.gwu.edu/log?L=UZ4UG4LZ9G&D=TN5&J=IEEECOMMAG&P=Link&U=http%3A%2F%2Ffindit.library.gwu.edu%2Fitem%2F2503524?genre=article&issn=01636804&title=IEEE+Communications+Magazine&volume=42&issue=7&date=20040701&atitle=XML-Based+Configuration+Management+for+IP+Network+Devices.&spage=84&pages=84-91&sid=EBSCO:Communication+%26+Mass+Media+Complete&aulast=Choi%2c+Mi-Jung'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 17 Oct 2013 01:27:41 GMT
Server: Apache-Coyote/1.1
Location: http://findit.library.gwu.edu/item/2503524?genre=article&issn=01636804&title=IEEE%20Communications%20Magazine&volume=42&issue=7&date=20040701&atitle=XML-Based%20Configuration%20Management%20for%20IP%20Network%20Devices.&spage=84&pages=84-91&sid=EBSCO:Communication%20&%20Mass%20Media%20Complete&aulast=Choi,%20Mi-Jung
Content-Length: 0
Via: 1.1 findit.library.gwu.edu
Cache-Control: max-age=300
Expires: Thu, 17 Oct 2013 01:32:40 GMT
Content-Type: text/plain

See the sid=EBSCO:Communication%20&%20Mass%20Media%20Complete and how the & isn't %26 anymore? I experimented and can see & in other fields has the same problem.
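The same comparison can be scripted, which might be useful as a check once the vendor changes something. This is only a sketch using the standard library: `sid_of` is a made-up helper name, and both URLs are shortened versions of the request and the `Location` header above (the middle citation parameters are left out).

```python
from urllib.parse import parse_qs, urlsplit

def sid_of(url: str) -> str:
    """Return the first sid value in the URL's query string, or '' if absent."""
    query = urlsplit(url).query
    return parse_qs(query).get("sid", [""])[0]

# Shortened request URL: the sid still carries %26.
requested = ("http://findit.library.gwu.edu/log?L=UZ4UG4LZ9G&D=TN5&J=IEEECOMMAG"
             "&P=Link&U=http%3A%2F%2Ffindit.library.gwu.edu%2Fitem%2F2503524"
             "?genre=article&sid=EBSCO:Communication+%26+Mass+Media+Complete"
             "&aulast=Choi%2c+Mi-Jung")

# Shortened Location header from the 302 response: the ampersand is now raw.
location = ("http://findit.library.gwu.edu/item/2503524?genre=article"
            "&sid=EBSCO:Communication%20&%20Mass%20Media%20Complete"
            "&aulast=Choi,%20Mi-Jung")

print(sid_of(requested))   # EBSCO:Communication & Mass Media Complete
print(sid_of(location))    # EBSCO:Communication  (truncated, with a trailing space)

# The redirect is lossy if the two values differ.
print(sid_of(requested) == sid_of(location))   # False
```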

So my question is: what is the application running at http://findit.library.gwu.edu/log, and can we fix it? If we can't fix it, can we contact the vendor and let them know about the bug?

lwrubel commented 11 years ago

http://findit.library.gwu.edu/log is a proxy for http://uz4ug4lz9g.search.serialssolutions.com/log?

We aren't able to fix this ourselves, so I will let them know about the bug.

lwrubel commented 11 years ago

Reported to Serials Solutions, Incident: 131022-000170