rdmpage / biorss

Harvest and repurpose RSS feeds

Add ZooBank as a source #4

Closed: rdmpage closed this issue 2 years ago

rdmpage commented 2 years ago

Add ZooBank as a source. There is an RSS feed at http://zoobank.org/rss/rss.xml but this doesn't seem to be regularly updated (see below). Perhaps ask @deepreef if this is the case. At the moment there doesn't seem to be an obvious way to get a time-ordered list of ZooBank records, and the API lacks some key information such as DOIs.

```
curl -I http://zoobank.org/rss/rss.xml
HTTP/1.1 200 OK
Last-Modified: Wed, 12 May 2021 23:00:37 GMT
Content-Type: text/xml
Accept-Ranges: bytes
Content-Length: 94651
Server: Jetty(9.3.5.v20151012)
```

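One way to check whether the feed is stale is to compare its Last-Modified header with the current date, which is what the curl output above lets you do by eye. The following is a minimal Bash sketch, assuming curl and GNU date (for `date -d`) are installed; the function names `last_modified`, `age_in_days`, and `feed_age` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate how stale an RSS feed is from its Last-Modified header.
# Assumes Bash, curl, and GNU date; all function names are hypothetical.

# Read raw HTTP response headers on stdin and print the Last-Modified value.
last_modified() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "last-modified" { print $2 }'
}

# Print whole days between an HTTP date ($1) and a reference epoch ($2, default: now).
age_in_days() {
  local modified now
  modified=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( (now - modified) / 86400 ))
}

# Fetch only the headers of a feed URL ($1) and print its age in days.
feed_age() {
  local lm
  lm=$(curl -sI "$1" | last_modified)
  if [ -z "$lm" ]; then
    echo "no Last-Modified header for $1" >&2
    return 1
  fi
  age_in_days "$lm"
}
```

For example, `feed_age http://zoobank.org/rss/rss.xml` would be expected to print 0 or 1 for a feed that is regenerated daily; a much larger number suggests the scheduled regeneration has stopped.
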
deepreef commented 2 years ago

Many thanks, @rdmpage! As far as I know, you are the only person who has ever used the ZooBank RSS feed, so I only ever learn that it has stopped working when I hear from you! :-)

In any case, the good news is that it was a simple fix: apparently an errant lock on the rss.xml file was preventing the code from overwriting it. I just needed to delete the rss.xml file, and now it's working fine again.

Pro tip: If it ever seems to stop working, hit this page: http://zoobank.org/rssfeed.cfm

This does several things:

1. Generates the up-to-the-moment RSS content for the preceding 25 hours
2. Displays that content in tabular form on the page
3. Generates an updated version of the rss.xml file for the RSS feed
4. Displays any error messages at the bottom of the page, offering clues about why the rss.xml file was not updated

All ZooBank does is schedule a hit to this page once every 24 hours, so you can easily implement any refresh schedule you want just by hitting this link. Feel free to hit it whenever you want to refresh the rss.xml file or to check whether there is a problem updating that file. You can also screen-scrape the RSS data from that page, even if the rss.xml file is not being generated.
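Since the refresh is just an HTTP request to rssfeed.cfm, a client can trigger it on its own schedule. The sketch below is hypothetical: it assumes Bash and curl, the function names `page_reports_error` and `refresh_feed` are made up for illustration, and the error check simply looks for the "Runtime Error" and "Problem writing new file" text seen in the error report later in this thread. That check is an assumption, not documented behaviour of the page.

```shell
#!/usr/bin/env bash
# Sketch: ask ZooBank to regenerate rss.xml, then scan the returned page for error text.
# Assumes Bash and curl. The error patterns are guesses based on one observed
# CFML error report, not a documented contract of rssfeed.cfm.
REFRESH_URL="http://zoobank.org/rssfeed.cfm"

# Exit 0 if the page text on stdin looks like a CFML error report.
page_reports_error() {
  grep -qiE 'runtime error|problem writing new file'
}

# Hit the refresh page (URL in $1, or REFRESH_URL) and report the outcome.
refresh_feed() {
  local body
  if ! body=$(curl -fsS "${1:-$REFRESH_URL}"); then
    echo "request to refresh page failed" >&2
    return 1
  fi
  if printf '%s\n' "$body" | page_reports_error; then
    echo "refresh page reported an error; rss.xml may not have been updated" >&2
    return 2
  fi
  echo "refresh requested"
}
```

Run from cron with a schedule such as `0 * * * *`, `refresh_feed` would request a regeneration every hour instead of ZooBank's default of once every 24 hours.
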

rdmpage commented 2 years ago

@deepreef

> As far as I know, you are the only person who has ever used the ZooBank RSS feed, so I only ever learn that it has stopped working when I hear from you! :-)

I guess this means I'm either ahead of my time... or hopelessly behind the times; I can't decide which. Many thanks for fixing this!

deepreef commented 2 years ago

I'm going with "ahead"!

Also, because you're pretty much our only customer on this, please let me know if I can tweak the service in any way. For example, it would be trivial for me to extend the look-back period as far as you want (weeks? months? years?). Of course, the longer the period, the longer it takes to generate the file (possibly minutes instead of seconds) and the larger the file. Slightly less trivial, but certainly doable, would be to add a date parameter so you can choose how far back the content extends (this wouldn't work for the RSS feed itself, only for the call that generates the XML file). Or I could just turn it into an API that delivers JSON directly.

I can also change how often the XML file is regenerated (e.g., every hour?), but of course you can set whatever schedule you want using the link I provided in my previous post. I can also tweak the content.
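Whatever the refresh schedule, each regeneration covers the preceding 25 hours, so consecutive snapshots overlap: by about an hour with the default daily job, and by most of the window if the file is regenerated hourly. A harvester therefore needs to skip items it has already seen. Below is a minimal Bash sketch that keys items on their `<link>` element and keeps a plain-text file of links already seen. The function name `new_items`, the use of `<link>` as the item identifier, and the grep/sed extraction (which assumes each `<link>...</link>` holds plain text) are all assumptions; it also picks up the channel's own `<link>`, which is reported once and then ignored. A real harvester would be better served by a proper XML parser.

```shell
#!/usr/bin/env bash
# Sketch: print links from an RSS document (stdin) that are not yet recorded in a
# "seen" file ($1), and append them to that file. Assumes Bash (process substitution),
# GNU grep/sed/sort/comm, and that each <link>...</link> holds plain text.
new_items() {
  local seen=$1 fresh
  touch "$seen"
  # Extract link URLs, de-duplicate, and keep only those absent from the seen file.
  fresh=$(grep -o '<link>[^<]*</link>' \
    | sed -e 's|<link>||' -e 's|</link>||' \
    | sort -u \
    | comm -23 - <(sort -u "$seen"))
  # Record and print any new links.
  if [ -n "$fresh" ]; then
    printf '%s\n' "$fresh" | tee -a "$seen"
  fi
}
```

For example, `curl -s http://zoobank.org/rss/rss.xml | new_items seen-links.txt` (the file name is hypothetical) would print only the links not reported by earlier runs.
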

rdmpage commented 2 years ago

@deepreef I realised that the ZooBank RSS feed hadn't changed since 2 December 2021, so I ran the RSS refresh and encountered a CFML Runtime Error (details below).

```
Type          Application
Function(s)   OnRequest (C:/jetty-openbd/webapps/zoobank/Application.cfc, Line=86, Column=3)
Detail        Problem writing new file: C:\jetty-openbd\webapps\zoobank\rss\rss.xml. Check the path to the file is correct
Tag Context   CFFILE (C:/jetty-openbd/webapps/zoobank/rssfeed.cfm, Line=48, Column=1)
              |
              +-- CFINCLUDE (C:/jetty-openbd/webapps/zoobank/Application.cfc, Line=90, Column=3)
                  |
                  +-- CFFUNCTION (C:/jetty-openbd/webapps/zoobank/Application.cfc, Line=86, Column=3)
Source        45:
              46:
              47:
              48:
              49: Completed.
              ^ Snippet from underlying CFML source
```

deepreef commented 2 years ago

OK, thanks @rdmpage! It appears the lock was not as "errant" as I had hoped: the code still fails to overwrite the existing file (it only creates the file if it isn't already there). I'm looking into it now.

As always, many thanks for bringing this to my attention!

deepreef commented 2 years ago

OK, so it seems that sometime around May 2021, something changed on the server such that Java started locking the rss.xml file whenever anyone hit the link, which prevented it from being overwritten by the new version. I spent a fair bit of time Googling this and ultimately failed to find a solution, so I cheated: the RSS file is now served by a different web server (IIS). This means a new URL: http://rss.zoobank.org. The rss.xml file is the default page at that URL, but you can also request it explicitly at http://rss.zoobank.org/rss.xml

Please give this a try, and if it works consistently with fresh updates in the coming days, then I'll retire the old URL (and maybe implement an automatic redirect).

Thanks again for bringing it to my attention.

rdmpage commented 2 years ago

@deepreef It’s working fine now. Many thanks, Rich!

(Reply sent by email, quoting Richard L. Pyle's comment of 7 Jan 2022, 20:52.)