Open adam3smith opened 10 months ago
@adam3smith I've taken a look at this, and the MARC, while easily accessible, is tag-structured in HTML in a way that makes it difficult to write a querySelectorAll call for it. Most catalogs with MARC put it in tables that can be read as a tree structure, but here's an example "row" from TinRead:
` 100 0
<b>$6</b>
137697
<b>$a</b>
Carstea, Gheorghe
<b>$u</b>
Academia de Studii Economice din Bucuresti. Facultatea de Management, Departamentul de Management
<br/> `
Since it just uses line breaks and the data isn't structured in rows, I can't figure out how to select it easily without resorting to writing loops.
On the other hand, the "Etichetat" view (first tab on the catalog page) is nicely structured in a table, but it isn't MARC, so it would require writing more lines of code and wouldn't take advantage of the MARC translator. Which approach do you recommend?
Maybe something like:
let root = doc.querySelector('#marc li');
for (let child of root.childNodes) {
    if (child.nodeType === Node.ELEMENT_NODE && child.tagName === 'B') {
        if (child.textContent.startsWith('$')) {
            let subtag = child.textContent;
            // do something with the subtag
        }
        else {
            let tag = child.textContent;
            // do something with the tag
        }
    }
    else {
        let content = child.textContent;
        // this is the content of the last subtag - do something with it
    }
}
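To make the loop idea above more concrete, here is a minimal sketch that groups the flat nodes into fields. It assumes, based only on the sample row quoted earlier, that each row starts with a text node holding the tag and indicators, that subfield codes sit in `<b>` elements, and that `<br/>` ends a row. The function name `parseMarcRows` and the shape of the returned objects are hypothetical, not part of any Zotero API.

```javascript
// Node type constants as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Group a flat list of nodes into MARC-like fields.
// `nodes` is any iterable of objects exposing nodeType, tagName and
// textContent, such as an element's childNodes in a browser.
function parseMarcRows(nodes) {
    let fields = [];
    let field = null; // field currently being filled, if any
    let code = null; // subfield code waiting for its value
    for (let node of nodes) {
        let text = (node.textContent || '').trim();
        if (node.nodeType === ELEMENT_NODE && node.tagName === 'BR') {
            // Assumption: a line break closes the current row
            field = null;
            code = null;
        }
        else if (node.nodeType === ELEMENT_NODE && node.tagName === 'B' && text.startsWith('$')) {
            // "$a" -> "a"
            code = text.slice(1);
        }
        else if (node.nodeType === TEXT_NODE && text) {
            if (!field) {
                // Assumption: leading text such as "100 0" is tag plus indicators
                let [tag, ...rest] = text.split(/\s+/);
                field = { tag, indicators: rest.join(' '), subfields: [] };
                fields.push(field);
            }
            else if (code !== null) {
                field.subfields.push({ code, value: text });
                code = null;
            }
        }
    }
    return fields;
}
```

In a translator this could be called as `parseMarcRows(doc.querySelector('#marc li').childNodes)`, reusing the selector suggested above. Whether that selector matches every TinRead instance, and how control fields without subfields are rendered, has not been checked here.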
It looks like the MARC can actually be exported in the MARCXML format by clicking the Exportă button (which has id exportBibs). This brings up a form allowing you to select the XML format and download the info as a .xml file. I've tested out one of these XML files with MARCXML.js and it seems to parse just fine.
Maybe I'll try writing a translator to grab the XML from the export button and then defer to the MARCXML translator?
Sure enough, the following works for me:
function doWeb(doc, url) {
    doc.getElementById("DirectLink").click();
    let exportButton = doc.getElementById("exportBibs");
    let marcUrl = exportButton.href;
    ZU.doGet(marcUrl, function (result) {
        var translator = Zotero.loadTranslator("import");
        translator.setTranslator("edd87d07-9194-42f8-b2ad-997c4c7deefd");
        translator.setString(result);
        translator.setHandler("itemDone", function (obj, item) {
            finalize(doc, item);
            item.complete();
        });
        translator.translate();
    });
}
I'll see if I can get detectWeb written as well, and open a pull request.
Have a look at the template functions in Scaffold/Translator Editor. New translators should use async functions and the async requestText command to load the text. This is actually easier to read and write once you've seen the syntax, because you no longer have to keep track of that callback. Beyond that, yes, a PR for this would be great.
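For illustration, here is a rough sketch of what the callback version posted earlier in this thread might look like in the async style. It is not the code from the PR. It assumes the `exportBibs` link already carries a usable `href` when `doWeb` runs, and it leaves out the `DirectLink` click and the `finalize` helper. `requestText` and `Zotero` are globals supplied by the translator sandbox, and the constant name `MARCXML_TRANSLATOR_ID` is made up for this example.

```javascript
// ID of the MARCXML import translator, as used earlier in this thread.
const MARCXML_TRANSLATOR_ID = 'edd87d07-9194-42f8-b2ad-997c4c7deefd';

async function doWeb(doc, url) {
    // Assumption: the export link is already present in the page
    let marcUrl = doc.getElementById('exportBibs').href;

    // requestText resolves with the response body, so no callback is needed
    let marcXml = await requestText(marcUrl);

    let translator = Zotero.loadTranslator('import');
    translator.setTranslator(MARCXML_TRANSLATOR_ID);
    translator.setString(marcXml);
    translator.setHandler('itemDone', (_obj, item) => {
        // Any site-specific cleanup would go here before completing the item
        item.complete();
    });
    await translator.translate();
}
```

Because every step is awaited, anything that has to happen before the link exists (such as waiting for a dialog to open) can be awaited at the top of the function in the same way.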
Got it! Went ahead and opened a PR. Right now all of my test cases pass (though they can be kind of hit-or-miss due to a race condition), and linter tests also pass. Let me know if there's anything else I need to fix here!
The fact that these functions are async now is handy: it actually solved a problem I was having (needing to wait for a dialog box to open before getting the MARCXML link). I still wonder whether there's a better way of getting this link, though.
Requested here: https://forums.zotero.org/discussion/107275/support-for-the-tinread-library#latest
Has easily accessible MARC, so shouldn't be hard. Examples:
https://opac.biblioteca.ase.ro/opac/bibliographic_view/144193?pn=opac%2FSearch&q=gheorghe+carstea#level=all&location=0&ob=asc&q=gheorghe+carstea&sb=relevance&start=0&view=CONTENT
https://tinread.biblioteca.ct.ro/opac/bibliographic_view/238969?pn=opac/Search&q=educatie+fizica#level=all&location=0&ob=asc&q=educatie+fizica&sb=relevance&start=0&view=CONTENT
https://catalog.ucv.ro/opac/bibliographic_view/68938?pn=opac/Search&q=educatie+fizica#level=all&location=0&ob=asc&q=educatie+fizica&sb=relevance&start=0&view=CONTENT
http://tinread.bjbn.ro:8080/opac/bibliographic_view/36090?pn=opac/Search&q=metodica+fotbal#level=all&location=0&ob=asc&q=metodica+fotbal&sb=relevance&start=0&view=CONTENT
https://tinread.upit.ro/opac/bibliographic_view/37902?pn=opac/Search&q=metodica+educatie+fizica#level=all&location=0&ob=asc&q=metodica+educatie+fizica&sb=relevance&start=0&view=CONTENT