zotero / translators

Zotero Translators
http://www.zotero.org/support/dev/translators

Add Translator for TinRead OPACs #3124

Open adam3smith opened 10 months ago

adam3smith commented 10 months ago

Requested here: https://forums.zotero.org/discussion/107275/support-for-the-tinread-library#latest

Has easily accessible MARC, so it shouldn't be hard. Examples:

https://opac.biblioteca.ase.ro/opac/bibliographic_view/144193?pn=opac%2FSearch&q=gheorghe+carstea#level=all&location=0&ob=asc&q=gheorghe+carstea&sb=relevance&start=0&view=CONTENT

https://tinread.biblioteca.ct.ro/opac/bibliographic_view/238969?pn=opac/Search&q=educatie+fizica#level=all&location=0&ob=asc&q=educatie+fizica&sb=relevance&start=0&view=CONTENT

https://catalog.ucv.ro/opac/bibliographic_view/68938?pn=opac/Search&q=educatie+fizica#level=all&location=0&ob=asc&q=educatie+fizica&sb=relevance&start=0&view=CONTENT

http://tinread.bjbn.ro:8080/opac/bibliographic_view/36090?pn=opac/Search&q=metodica+fotbal#level=all&location=0&ob=asc&q=metodica+fotbal&sb=relevance&start=0&view=CONTENT

https://tinread.upit.ro/opac/bibliographic_view/37902?pn=opac/Search&q=metodica+educatie+fizica#level=all&location=0&ob=asc&q=metodica+educatie+fizica&sb=relevance&start=0&view=CONTENT

brendan-oconnell commented 9 months ago

@adam3smith I've taken a look at this, and the MARC, while easily accessible, is tag-structured in HTML in a way that makes it difficult to write a querySelectorAll call. Most catalogs with MARC put it in tables that can be read as a tree structure, but here's an example "row" from TinRead:

` 100 0

      <b>$6</b>&nbsp;
      137697

      <b>$a</b>&nbsp;
      Carstea, Gheorghe

      <b>$u</b>&nbsp;
      Academia de Studii Economice din Bucuresti. Facultatea de Management, Departamentul de Management

      <br/> `

Since it just uses line breaks and the data isn't structured in rows, I can't figure out how to easily select it, without resorting to writing loops.

On the other hand, the "Etichetat" view (first tab on the catalog page) is nicely structured in a table, but isn't MARC, so would require writing more lines of code and not taking advantage of the MARC translator. Which approach do you recommend?

AbeJellinek commented 8 months ago

Maybe something like:


let root = doc.querySelector('#marc li');
// Walk the flat list of child nodes: <b> elements are markers,
// and everything else is the text that follows the last marker.
for (let child of root.childNodes) {
    if (child.nodeType === Node.ELEMENT_NODE && child.tagName === 'B') {
        if (child.textContent.startsWith('$')) {
            let subtag = child.textContent;
            // do something with the subtag
        }
        else {
            let tag = child.textContent;
            // do something with the tag
        }
    }
    else {
        let content = child.textContent;
        // this is the content of the last subtag - do something with it
    }
}
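
An alternative to walking DOM nodes is to treat each row as a flat string: the field header comes first, and every `<b>$x</b>` marker is followed by its value. The following is only a minimal sketch, assuming a row's inner HTML looks like the single TinRead example quoted earlier in this thread. `parseMarcRow` is a hypothetical helper, not part of Zotero's API, and the header layout (tag followed by indicators) is inferred from that one example.

```javascript
// Hypothetical helper: parse one TinRead MARC "row" (the HTML up to a <br>)
// into a tag, raw indicator tokens, and a list of subfields.
function parseMarcRow(rowHtml) {
    // Everything before the first <b> is the field header, e.g. " 100 0".
    let firstMarker = rowHtml.indexOf('<b>');
    let header = (firstMarker === -1 ? rowHtml : rowHtml.slice(0, firstMarker)).trim();
    // Assumption: the first token is the tag and the rest are indicators.
    // Blank indicators cannot be told apart once whitespace is collapsed.
    let [tag, ...indicators] = header.split(/\s+/);

    // Each subfield is "<b>$code</b>&nbsp;value", running up to the
    // next <b>, the closing <br>, or the end of the string.
    let subfields = [];
    let pattern = /<b>\$(\w)<\/b>(?:&nbsp;)?([\s\S]*?)(?=<b>|<br\s*\/?>|$)/g;
    for (let match of rowHtml.matchAll(pattern)) {
        subfields.push({ code: match[1], value: match[2].trim() });
    }
    return { tag, indicators, subfields };
}
```

In a translator, the input could come from splitting the MARC container's `innerHTML` on `<br>`. Turning the parsed subfields into a record for the MARC import translator, or mapping them to item fields directly, is left out here.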

franklindyer commented 5 months ago

It looks like the MARC can actually be exported in the MARCXML format by clicking the Exportă button (which has id exportBibs). This brings up a form allowing you to select the XML format and download the info as a .xml file. I've tested out one of these XML files with MARCXML.js and it seems to parse just fine.

Maybe I'll try writing a translator to grab the XML from the export button and then defer to the MARCXML translator?

franklindyer commented 5 months ago

Sure enough, the following works for me:

function doWeb(doc, url) {
    // Open the dialog that exposes the export link
    doc.getElementById("DirectLink").click();
    let exportButton = doc.getElementById("exportBibs");
    let marcUrl = exportButton.href;

    // Fetch the MARCXML and hand it to the MARCXML import translator
    ZU.doGet(marcUrl, function(result) {
        var translator = Zotero.loadTranslator("import");
        translator.setTranslator("edd87d07-9194-42f8-b2ad-997c4c7deefd");
        translator.setString(result);
        translator.setHandler("itemDone", function (obj, item) {
            // finalize() is not shown in this snippet
            finalize(doc, item);
            item.complete();
        });
        translator.translate();
    });
}

I'll see if I can get detectWeb written as well, and open a pull request.

adam3smith commented 5 months ago

Have a look at the template functions in Scaffold/Translator Editor. New translators should use async functions and the async requestText commands to load the text -- this is actually easier to read and code once you've seen the syntax, because you don't have to keep track of that callback anymore. Beyond that, yes, a PR for this would be great.
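
For illustration, the callback-based `doWeb` posted earlier might look roughly like this in the async style. This is only a sketch, not the code from the eventual PR: it assumes the `#exportBibs` link already carries a usable `href` when `doWeb` runs (the dialog-opening step is left out), and that `requestText` and `Zotero.loadTranslator` are supplied by the translator sandbox, as in the Scaffold templates.

```javascript
// Sketch of an async doWeb; requestText and Zotero are expected to be
// provided by the translator sandbox and are not defined here.
async function doWeb(doc, url) {
    // Assumption: the export link is already present with its href set.
    let marcUrl = doc.getElementById("exportBibs").href;

    // Await the download instead of passing a callback.
    let marcXml = await requestText(marcUrl);

    let translator = Zotero.loadTranslator("import");
    // Same translator ID as in the callback version (MARCXML import).
    translator.setTranslator("edd87d07-9194-42f8-b2ad-997c4c7deefd");
    translator.setString(marcXml);
    translator.setHandler("itemDone", (_obj, item) => {
        // Any site-specific cleanup of the item would go here.
        item.complete();
    });
    await translator.translate();
}
```

The control flow now reads top to bottom, and the only remaining callback is the `itemDone` handler.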

franklindyer commented 5 months ago

Got it! Went ahead and opened a PR. Right now all of my test cases pass (though they can be kind of hit-or-miss due to a race condition), and linter tests also pass. Let me know if there's anything else I need to fix here!

The fact that these functions are async now is handy; it actually solved a problem I was having (needing to wait for a dialog box to open before getting the MARCXML link). Though I still wonder whether there's a better way of getting this link.