
Zotero Translators
http://www.zotero.org/support/dev/translators

Mimas websites #776

Open aurimasv opened 10 years ago

aurimasv commented 10 years ago

Mimas develops a lot of websites for UK libraries, including Historical Texts, Copac, and JournalArchives, among many others. These websites might share some common architecture that we could take advantage of to develop a single translator (or a couple of them). Unfortunately, I don't have access to many of these sites.

This might be useful: http://mimas.ac.uk/expertise/linked-data/

Inspired by this tweet

dfaligertwood commented 10 years ago

Seeing as I appear to have started this... The websites don't currently share a common architecture (see this tweet), but they're working towards putting JournalArchives and Historical Texts on the same platform (see this tweet). I'm assuming they plan to move JournalArchives onto Historical Texts' platform, as that platform is considerably newer (released a few months ago, IIRC).

Historical Texts has .ris and .pdf exports, which are triggered by some JavaScript magic that I haven't been able to decode yet (I've never done JavaScript in any meaningful way, and the source is minified, so it's quite a steep learning curve...!). I don't really know how the translator architecture works: would it be preferable to use the data in the .ris file, or to scrape it from the website?

aurimasv commented 10 years ago

RIS is certainly an easy route to take, assuming that the metadata quality is good and that you can get to it (figuring out how can be complicated, but it's almost always doable). Another alternative might be embedded metadata in the page's <head> tag, but in that case you should already be able to import articles using the Embedded Metadata translator. Unfortunately, I can't take a look at their website, since I don't have access.
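To make "metadata quality is good" concrete, here is a minimal standalone sketch of what RIS data looks like once parsed, plus a quick check for missing fields. It is not Zotero's own code: in a real translator you would typically hand the RIS string to Zotero's bundled RIS import translator rather than parse it yourself. The sample record and the `parseRIS` / `missingFields` names are hypothetical, and the parser ignores RIS edge cases such as continuation lines.

```javascript
/**
 * Hypothetical minimal RIS parser: returns an array of records, each mapping
 * a two-character tag to an array of values (tags like AU can repeat).
 * Only handles simple "XX  - value" lines; continuation lines are ignored.
 */
function parseRIS(text) {
  const records = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    const m = /^([A-Z][A-Z0-9])  - ?(.*)$/.exec(line);
    if (!m) continue; // skip blank lines and anything not shaped like a tag line
    const tag = m[1];
    const value = m[2].trim();
    if (tag === "TY") {
      current = { TY: [value] }; // TY starts a new record
    } else if (tag === "ER") {
      if (current) records.push(current); // ER closes the current record
      current = null;
    } else if (current) {
      if (!current[tag]) current[tag] = [];
      current[tag].push(value);
    }
  }
  return records;
}

/** Hypothetical quality check: which required tags are absent or empty? */
function missingFields(record, required) {
  return required.filter((tag) => !record[tag] || record[tag].every((v) => v === ""));
}

// Invented sample record, for illustration only.
const sample = [
  "TY  - BOOK",
  "TI  - An Example Title",
  "AU  - Smith, John",
  "AU  - Doe, Jane",
  "PY  - 1750",
  "ER  - ",
].join("\n");

const [rec] = parseRIS(sample);
console.log(rec.AU); // → [ 'Smith, John', 'Doe, Jane' ]
console.log(missingFields(rec, ["TI", "AU", "PY", "PB"])); // → [ 'PB' ]
```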

dfaligertwood commented 10 years ago

The download URL I'm getting is blob:(UUID), which unfortunately means I can't just use a regex as some other translators do. There's a method, "JHBBookHeaderService.prototype.exportMetadata", that looks promising: it appears to call "this.$rootScope.publication.exportRIS()" and then put the result into a blob, which I assume is what gets downloaded. Poking at it from the Firebug console doesn't return anything useful, though. I'll keep working at it.
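The reason a regex can't help with a blob: URL is that the browser generates it on the client from in-memory data. This sketch assumes the page does roughly what the comment above describes (wraps the exportRIS() string in a Blob and creates an object URL for it); the RIS text is a placeholder and `makeDownloadURL` is a hypothetical name. It uses only standard APIs (`Blob`, `URL.createObjectURL`, `URL.revokeObjectURL`) available in modern browsers and Node 18+. The MIME type is the one commonly used for RIS, not necessarily what the site sets.

```javascript
// Placeholder RIS text standing in for publication.exportRIS() output.
const risText = "TY  - BOOK\nTI  - Placeholder\nER  - \n";

// Wrap a string in a Blob and return an object URL pointing at it.
function makeDownloadURL(text) {
  const blob = new Blob([text], { type: "application/x-research-info-systems" });
  return URL.createObjectURL(blob);
}

const first = makeDownloadURL(risText);
const second = makeDownloadURL(risText);

// Each call yields a blob: URL with a fresh random identifier, so identical
// content gets different URLs: nothing about the record is encoded in them.
console.log(first.startsWith("blob:")); // → true
console.log(first === second); // → false

// Object URLs keep their Blob alive until revoked.
URL.revokeObjectURL(first);
URL.revokeObjectURL(second);
```

So the useful data is the string before it goes into the Blob, not the URL afterwards.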

aurimasv commented 10 years ago

Try monitoring network traffic. You might be able to figure out how to construct the HTTP request from data on the page.

dfaligertwood commented 10 years ago

If I inject the following code:

// Get the AngularJS app's $rootScope via the root element's injector
var rootElement = document.querySelector('#ng-app');
var rootScope = angular.element(rootElement).injector().get('$rootScope');

then:

rootScope.publication.exportRIS()

returns a UTF-8 string of the RIS data, and

'https://data.historicaltexts.jisc.ac.uk' + rootScope.publication.pdf

returns the URL of the pdf.
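The console snippets above can be wrapped into two small functions, sketched below under stated assumptions: `getRootScope` and `extractPublication` are hypothetical names, `publication.exportRIS()` is assumed to return the RIS string, and `publication.pdf` is assumed to be a path on https://data.historicaltexts.jisc.ac.uk. `document` and `angular` are passed in so the logic can be exercised with stand-ins outside the browser. Whether a Zotero translator can reach the page's own `angular` object from its sandbox is a separate question this sketch does not settle.

```javascript
// Host for PDF paths, taken from the snippet above.
const PDF_BASE = "https://data.historicaltexts.jisc.ac.uk";

// Hypothetical: find the AngularJS $rootScope the same way as the console snippet.
function getRootScope(doc, angular) {
  const rootElement = doc.querySelector("#ng-app");
  if (!rootElement) return null; // not an AngularJS page with #ng-app
  return angular.element(rootElement).injector().get("$rootScope");
}

// Hypothetical: pull RIS text and a PDF URL out of a publication object.
function extractPublication(publication) {
  if (!publication || typeof publication.exportRIS !== "function") {
    throw new Error("no publication with exportRIS() found");
  }
  const ris = publication.exportRIS();
  // new URL() resolves the path against the host; for paths starting
  // with "/" this gives the same result as plain string concatenation.
  const pdfURL = publication.pdf ? new URL(publication.pdf, PDF_BASE).href : null;
  return { ris, pdfURL };
}

// Stand-ins for the real page objects (invented values).
const fakePublication = { pdf: "/pdfs/example.pdf", exportRIS: () => "TY  - BOOK\nER  - \n" };
const fakeAngular = {
  element: () => ({ injector: () => ({ get: () => ({ publication: fakePublication }) }) }),
};
const fakeDoc = { querySelector: (sel) => (sel === "#ng-app" ? {} : null) };

const { ris, pdfURL } = extractPublication(getRootScope(fakeDoc, fakeAngular).publication);
console.log(pdfURL); // → https://data.historicaltexts.jisc.ac.uk/pdfs/example.pdf
```

In the browser console, the real call would be `extractPublication(getRootScope(document, angular).publication)`.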

adam3smith commented 2 years ago

Historical Texts and Copac are now supported. I don't have access to JournalArchives, but with access it would be easy to do, as it now follows basically the same site structure as Historical Texts.

AbeJellinek commented 2 years ago

Yeah, I'd love to work on this but it looks like it's restricted to UK institutions.