hasadna / OpenPress

Open Press Archive Repo
MIT License

Incorrect URL in JSON "image" field #51

Open issue, opened by niryariv 10 years ago

niryariv commented 10 years ago

To reproduce:

  1. Query the API: http://opa.org.il/api/v1/?query=%D7%9E%D7%99%D7%A9%D7%A7%D7%99%20%D7%99%D7%A8%D7%99%D7%91
  2. Each result contains an image URL, e.g.:

      "image": "http://www.jpress.nli.org.il/Olive/APA/NLI_heb/get/GetImage.ashx?kind=block&href=DAV/1980/7/1&id=Ar03604&ext=.png"
  3. Opening that URL returns an error page instead of an image.

      [screenshot of the error page, taken 2014-10-01 at 4:49 PM]

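To see which part of the image URL is involved, the query string can be split into its parameters with Python's standard `urllib.parse`. This is only a small inspection sketch; `IMAGE_URL` is the example URL from step 2 and `image_params` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

# Example URL copied from step 2 of the report.
IMAGE_URL = ("http://www.jpress.nli.org.il/Olive/APA/NLI_heb/get/GetImage.ashx"
             "?kind=block&href=DAV/1980/7/1&id=Ar03604&ext=.png")


def image_params(url: str) -> dict:
    """Return the query parameters of a GetImage.ashx URL as a flat dict."""
    query = urlsplit(url).query
    # parse_qs maps each key to a list of values; each key appears once here.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = image_params(IMAGE_URL)
print(params["href"], params["id"])  # DAV/1980/7/1 Ar03604
```

The `href` parameter points at the issue/page (`DAV/1980/7/1`) and `id` names the block to render; the `id` value is where the problem turns out to be.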
asafvala commented 9 years ago

Verified the bug.

This happens because the XML files are parsed incorrectly:

Some articles are divided into several sections, which means the article image is split into several section blocks as well. For example, for the query http://opa.org.il/api/v1/?query=%D7%9E%D7%99%D7%A9%D7%A7%D7%99%20%D7%99%D7%A8%D7%99%D7%91 the returned image link is http://www.jpress.nli.org.il/Olive/APA/NLI_heb/get/GetImage.ashx?kind=block&href=DAV/1980/7/1&id=Ar03604&ext=.png, but `id=Ar03604` should have been `id=Ar0360401`.
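A minimal sketch of rewriting the `id` parameter, assuming (from the single example above only) that a section block id is the article id followed by a two-digit section number, so `Ar03604` plus section 1 gives `Ar0360401`. Both `section_block_id` and `with_section_id` are hypothetical helpers, not part of the OpenPress code.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

IMAGE_URL = ("http://www.jpress.nli.org.il/Olive/APA/NLI_heb/get/GetImage.ashx"
             "?kind=block&href=DAV/1980/7/1&id=Ar03604&ext=.png")


def section_block_id(article_id: str, section: int) -> str:
    # ASSUMPTION: section blocks are named <article id><2-digit section number>,
    # inferred from Ar03604 -> Ar0360401; not confirmed against the XML schema.
    if not 1 <= section <= 99:
        raise ValueError("section must be between 1 and 99")
    return f"{article_id}{section:02d}"


def with_section_id(url: str, section: int) -> str:
    """Return a copy of an article-level image URL that points at one section block."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    fixed = [(k, section_block_id(v, section) if k == "id" else v) for k, v in params]
    # safe="/" keeps the href path (DAV/1980/7/1) unescaped, as in the original URL.
    return urlunsplit(parts._replace(query=urlencode(fixed, safe="/")))


print(with_section_id(IMAGE_URL, 1))
# http://www.jpress.nli.org.il/Olive/APA/NLI_heb/get/GetImage.ashx?kind=block&href=DAV/1980/7/1&id=Ar0360401&ext=.png
```

This only patches URLs after the fact and would double-append the suffix if applied to an already-fixed URL; the cleaner fix is to read the section ids while parsing the XML.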

_Note: a single article may contain several frames, so the fix must handle more than one image block per article._
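Since an article can have several frames, one option would be to build one image URL per section rather than a single URL. The sketch below assumes the same hypothetical naming rule as before (article id plus a two-digit section number, numbered from 1) and assumes the section count is already known from the article's XML; `section_image_urls` is an illustrative name only.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URLs quoted in this issue.
GET_IMAGE = "http://www.jpress.nli.org.il/Olive/APA/NLI_heb/get/GetImage.ashx"


def section_image_urls(href: str, article_id: str, section_count: int) -> list:
    """Build one GetImage.ashx URL per section/frame of an article.

    ASSUMPTIONS (hypothetical): sections are numbered 1..section_count and
    each block id is the article id plus a two-digit section number.
    section_count would have to be read from the article's XML.
    """
    urls = []
    for section in range(1, section_count + 1):
        query = urlencode(
            {"kind": "block", "href": href,
             "id": f"{article_id}{section:02d}", "ext": ".png"},
            safe="/",  # keep the href path readable, as in the original URLs
        )
        urls.append(f"{GET_IMAGE}?{query}")
    return urls


for url in section_image_urls("DAV/1980/7/1", "Ar03604", 2):
    print(url)
```

How the API should expose several URLs (for example, a list in the `"image"` field) is a separate design decision.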