zotero / translation-server

A Node.js-based server to run Zotero translators

add import support #89

Closed: retorquere closed this 5 years ago

dstillman commented 5 years ago

Thanks! This is great. I'll look at this more later, but could you add a few tests? See export_test.js for an example.

retorquere commented 5 years ago

Sure thing.

retorquere commented 5 years ago

Does mocha automatically run any file named <something>_test.js?

retorquere commented 5 years ago

Oh wait, I see, it's in package.json.

retorquere commented 5 years ago

So -- I've added tests, but most of them seem to "fail successfully". I may well be missing something, but in the spot checks I've done (e.g. on RIS.js), the expected items don't match the input. When I manually import the same input into Zotero, the result matches what translation-server produces, so it looks like the import is right and the expected items in, e.g., RIS.js are wrong.

retorquere commented 5 years ago

(Also, having debug logging on during import makes it very hard to follow what's going on: imports are very chatty, and their output runs through the mocha logging.)

retorquere commented 5 years ago

I see tags being imported that look like this:

  {
    "tag": {
      "uri": "http://dewey.info/class/320.512092/e22/",
      "value": "http://dewey.info/class/320.512092/e22/"
    }
  },

Should I just ignore these? The test case's expected items don't include them; they come from importing:

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:dc11="http://purl.org/dc/elements/1.1/"
         xmlns:ns0="http://rdaregistry.info/Elements/u/"
         xmlns:ns1="http://iflastandards.info/ns/isbd/elements/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">

  <bibo:Book rdf:about="http://d-nb.info/1054873992">
    <dc:medium rdf:resource="http://rdaregistry.info/termList/RDACarrierType/1044"/>
    <owl:sameAs rdf:resource="http://hub.culturegraph.org/resource/DNB-1054873992"/>
    <dc11:identifier>(DE-101)1054873992</dc11:identifier>
    <dc11:identifier>(OCoLC)888461076</dc11:identifier>
    <bibo:isbn13>9783658060268</bibo:isbn13>
    <ns0:P60521>kart. : ca. EUR 39.99 (DE), ca. EUR 41.11 (AT), ca. sfr 50.00 (freier Pr.)</ns0:P60521>
    <bibo:isbn10>3658060263</bibo:isbn10>
    <bibo:gtin14>9783658060268</bibo:gtin14>
    <dc:language rdf:resource="http://id.loc.gov/vocabulary/iso639-2/ger"/>
    <dc11:title>Das Adam-Smith-Projekt</dc11:title>
    <dc:creator rdf:resource="http://d-nb.info/gnd/136486045"/>
    <dc11:publisher>Springer VS</dc11:publisher>
    <ns0:P60163>Wiesbaden</ns0:P60163>
    <ns0:P60333>Wiesbaden : Springer VS</ns0:P60333>
    <ns1:P1053>447 S.</ns1:P1053>
    <dc:isPartOf>Edition Theorie und Kritik</dc:isPartOf>
    <ns0:P60489>Zugl. leicht überarb. Fassung von: Berlin, Freie Univ., Diss., 2012</ns0:P60489>
    <dc:relation rdf:resource="http://d-nb.info/1064805604"/>
    <dc:subject>Smith, Adam</dc:subject>
    <dc:subject>Liberalismus</dc:subject>
    <dc:subject>Rechtsordnung</dc:subject>
    <dc:subject>Foucault, Michel</dc:subject>
    <dc:subject>Macht</dc:subject>
    <dc:subject>Politische Philosophie</dc:subject>
    <dc:subject rdf:resource="http://dewey.info/class/320.512092/e22/"/>
    <dc:tableOfContents rdf:resource="http://d-nb.info/1054873992/04"/>
    <dc:issued>2015</dc:issued>
    <ns0:P60493>zur Genealogie der liberalen Gouvernementalität</ns0:P60493>
  </bibo:Book>

  <foaf:Person rdf:about="http://d-nb.info/gnd/136486045">
    <foaf:familyName>Ronge</foaf:familyName>
    <foaf:givenName>Bastian</foaf:givenName>
  </foaf:Person>

</rdf:RDF>

That input comes from the RDF.js test cases, but in my tests with desktop Zotero it seems to get picked up by the Bibliontology RDF translator instead, which errors out in the desktop client when I import it from the clipboard.
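For comparing against expected items, object-valued tags like the Dewey one above could be flattened or dropped before the comparison. A minimal sketch, assuming tags arrive either as plain strings, as `{ tag: "..." }`, or in the `{ tag: { uri, value } }` shape shown above; `normalizeTags` and its `dropUriTags` option are hypothetical names, not part of translation-server or the translators:

```javascript
// Hypothetical helper: normalize imported tag entries before comparing them
// with a test case's expected items.
// Assumed input shapes: "text", { tag: "text" }, or { tag: { uri, value } }.
function normalizeTags(tags, { dropUriTags = true } = {}) {
  const result = [];
  for (const entry of tags) {
    const raw = typeof entry === "string" ? entry : entry.tag;
    if (raw && typeof raw === "object") {
      // Object-valued tag: skip it when it is nothing but a URI,
      // otherwise keep only its value as a plain string tag.
      if (dropUriTags && raw.uri && raw.value === raw.uri) continue;
      result.push({ tag: String(raw.value) });
    } else if (typeof raw === "string") {
      result.push({ tag: raw });
    }
  }
  return result;
}

const imported = [
  { tag: "Liberalismus" },
  {
    tag: {
      uri: "http://dewey.info/class/320.512092/e22/",
      value: "http://dewey.info/class/320.512092/e22/",
    },
  },
];
console.log(normalizeTags(imported)); // [ { tag: 'Liberalismus' } ]
```

Whether URI-only tags should be dropped or kept as strings is a policy choice; the option leaves both behaviors available.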

retorquere commented 5 years ago

In light of the discussion above, I'm just keeping these; mocha will report them as test failures.

retorquere commented 5 years ago

Most of the failing tests were failing because I was only saving the results of the last call to saveItems. That's fixed now.
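The fix amounts to appending each batch instead of overwriting the stored result. A minimal sketch, assuming a test harness in which the importer calls a `saveItems(batch)` callback one or more times; `makeCollector` and `fakeImport` are hypothetical names, not translation-server's actual API:

```javascript
// Hypothetical test-harness fragment: collect items from *every* saveItems
// call. The buggy version would do `items = batch`, keeping only the last one.
function makeCollector() {
  const items = [];
  return {
    // Called once per batch; an importer may emit several batches.
    saveItems(batch) {
      items.push(...batch);
    },
    get items() {
      return items;
    },
  };
}

// Simulated importer that saves its results in two separate batches.
function fakeImport(sink) {
  sink.saveItems([{ itemType: "book", title: "A" }]);
  sink.saveItems([{ itemType: "journalArticle", title: "B" }]);
}

const collector = makeCollector();
fakeImport(collector);
console.log(collector.items.length); // 2
```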

dstillman commented 5 years ago

OK, this is looking pretty good. It seems like we're mostly just importing more fields now than we were when the tests were generated, right?

The RDF tag thing isn't great, but we'll look at that in https://github.com/zotero/translators/issues/1904.

retorquere commented 5 years ago

Huzzah!

Looks like it, yes, and also like whitespace in field values was normalized at some point. Two tests also have JSON typos, so those fail just because their contents can't be parsed.
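If the remaining differences are mostly extra fields and whitespace, a comparison could tolerate both. A minimal sketch, assuming expected and actual items are plain objects; `findMismatches` is a hypothetical helper, and nested values such as creators are compared via `JSON.stringify`, so key order matters for them:

```javascript
// Collapse runs of whitespace in string values; leave other values as-is.
function normalizeValue(value) {
  return typeof value === "string" ? value.replace(/\s+/g, " ").trim() : value;
}

// Hypothetical comparison for import tests: every expected field must be
// present in the actual item with an equal (normalized) value. Extra fields
// in the actual item are tolerated, since importers may now fill in more.
function findMismatches(expected, actual) {
  const mismatches = [];
  for (const [field, expValue] of Object.entries(expected)) {
    if (!(field in actual)) {
      mismatches.push(`missing field: ${field}`);
    } else if (
      JSON.stringify(normalizeValue(expValue)) !==
      JSON.stringify(normalizeValue(actual[field]))
    ) {
      mismatches.push(`different value for: ${field}`);
    }
  }
  return mismatches;
}

const expected = { itemType: "book", title: "Das  Adam-Smith-Projekt" };
const actual = { itemType: "book", title: "Das Adam-Smith-Projekt", place: "Wiesbaden" };
console.log(findMismatches(expected, actual)); // []
```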

retorquere commented 5 years ago

(I have disabled Debug.init(1) locally -- with debug logging on it's nigh on impossible to make sense of the test case statuses)
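Rather than disabling Debug.init(1) locally, debug logging could be made opt-in for test runs. A minimal sketch, assuming an environment variable named `TEST_DEBUG` (a hypothetical name) is read during test setup; `debugLevelFromEnv` is likewise hypothetical, and how the resulting level maps onto `Debug.init` is an assumption:

```javascript
// Hypothetical helper: pick a debug level from the environment so verbose
// logging stays off by default but can be enabled without editing test code.
function debugLevelFromEnv(env) {
  const raw = env.TEST_DEBUG;
  if (raw === undefined || raw === "") return 0;
  const level = Number.parseInt(raw, 10);
  // Any non-numeric value (e.g. "yes") just turns logging on at level 1.
  return Number.isNaN(level) ? 1 : level;
}

const level = debugLevelFromEnv(process.env);
if (level > 0) {
  // In the real test setup, this is where the existing Debug.init(1) call would go.
  console.log(`debug logging enabled at level ${level}`);
}
```

Usage: `TEST_DEBUG=1 npm test` would turn logging on; a plain `npm test` would keep the mocha output readable.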

dstillman commented 5 years ago

Thanks!

dstillman commented 5 years ago

I've temporarily disabled import testing until we fix the import tests.

retorquere commented 5 years ago

I'd be happy to work on resolving some of those.

dstillman commented 5 years ago

That'd be great — thanks!

retorquere commented 5 years ago

I keep thinking about the issue with the valid fields and base fields we discussed above. Without proposing a new change right now, here's the idea behind the list I had earlier.

The way I see it, the translators take one source of referenceable items (be it Zotero or the "world") and translate it to the other. You verify the translation by checking whether the results are what you intended.

For export formats, a user would look at the exported text and check whether it matches expectations; the exported text is the end product. For an import format, it seems to me this isn't the case: the end product is an item in Zotero. So if an importer runs without throwing errors but produces only fields that Zotero will discard on save, it seems to me the import translator has failed, and I would expect the corresponding test to fail, too. The Web importer (and therefore, I suppose, the tests) already does something like this: its _itemDone handler performs ISBN normalization, among a few other things.
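The idea of failing an import test when fields would be discarded on save could look roughly like this. A minimal sketch: `VALID_FIELDS` is a tiny, hand-picked illustrative subset, not Zotero's real schema, base-field mappings are not modeled, and `discardedFields` is a hypothetical helper:

```javascript
// Illustrative subset of valid fields per item type (NOT the real schema).
const VALID_FIELDS = {
  book: new Set(["title", "place", "publisher", "date", "ISBN", "numPages", "language", "series"]),
  journalArticle: new Set(["title", "publicationTitle", "volume", "issue", "pages", "date", "DOI", "ISSN", "language"]),
};

// Item properties that are not plain fields and are therefore not checked.
const NON_FIELD_KEYS = new Set(["itemType", "creators", "tags", "notes", "attachments", "seeAlso"]);

// Return the names of fields that are not valid for the item's type,
// i.e. fields that would be dropped when the item is saved.
function discardedFields(item) {
  const valid = VALID_FIELDS[item.itemType];
  if (!valid) throw new Error(`unknown item type: ${item.itemType}`);
  return Object.keys(item).filter((key) => !NON_FIELD_KEYS.has(key) && !valid.has(key));
}

const importedItem = {
  itemType: "book",
  title: "Das Adam-Smith-Projekt",
  place: "Wiesbaden",
  publicationTitle: "Edition Theorie und Kritik",
};
console.log(discardedFields(importedItem)); // [ 'publicationTitle' ]
```

In a test, a non-empty result could be reported as a failure alongside the usual comparison with the expected items.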