Enable ODF-scan format in CAYW

adam3smith commented 9 years ago

I've started testing the CAYW feature and it's working very nicely for me together with AutoKey on Linux. I'll blog on that when I get a chance.

It would be wonderful if you could add ODF-scan as a third format option in addition to mmd and pandoc, to allow users to use this with our tool, e.g. on Google Docs or with Scrivener, if they want live Zotero citations.

The format would be: {prefix|human readable citation|locator_label locator|suffix|unique identifier} where human readable citation is a string that will get overwritten by the citation style, but should allow the user to identify the citation. We just use author date but if need be citekey would do.

The locator is preceded by a locator_label like p. and a space -- full list under "Locators" here.

The unique identifier is zu:libraryID:itemID for items from personal libraries and zg:groupID:itemID for items from groups. Not having looked at your code, I'd hope that you could just recycle our (translator) code to generate the unique ID from an item's URI: https://github.com/Zotero-ODF-Scan/zotero-odf-scan/blob/master/plugin/resource/translators/Scannable%20Cite.js#L58
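To make the requested format concrete, here is a minimal sketch that assembles one marker from its five fields. The function name and the property names on `cite` are hypothetical, chosen only for illustration; they are not taken from ODF-scan or Better BibTeX.

```javascript
// Sketch only: `scannableCite` and the property names on `cite` are
// hypothetical and not part of ODF-scan or Better BibTeX.
function scannableCite(cite) {
  // The locator field is "<locator_label> <locator>", or empty when absent.
  const locator = cite.locator ? `${cite.locatorLabel} ${cite.locator}` : "";
  const fields = [
    cite.prefix || "",
    cite.label, // human readable citation, e.g. "Smith 1776"
    locator,
    cite.suffix || "",
    cite.id, // zu:libraryID:itemID or zg:groupID:itemID
  ];
  return `{${fields.join("|")}}`;
}

console.log(scannableCite({
  prefix: "see",
  label: "Smith 1776",
  locatorLabel: "p.",
  locator: "12",
  suffix: "and others",
  id: "zu:123:12314",
}));
// prints {see|Smith 1776|p. 12|and others|zu:123:12314}
```

Empty fields keep their separators, so the position of each field stays fixed.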

What do you think? (cc @fbennett )

retorquere commented 9 years ago

Should be fairly easy, if you can clarify a few things for me.

1) One thing I don't understand from the translator is the libraryID part; the way I understand it, a reference is either in the personal library (with library ID "falsish"; 0, null or undefined), or it is in a group library with that ID; group IDs are actually just library IDs. Looking at the code you linked to, you expect there to be non-group libraries with a non-zero ID, but I can't map that onto the Zotero database schema as I know it. I could easily call your translator, but I'd have no (clean) way to pass the picker data. I have access to all the reference data, so I can just pick the required data from the DB directly.
2) Why do you use the libraryID? The itemID is unique within any given Zotero install (not across installs, so not sync-safe).
3) I can do pretty much anything you want for the human readable part, so Author Date is not a problem. Is this your preference, then?

For Google Docs it should in principle be possible to call the picker URL directly and do real integration, including paste, but my GScript-fu is weak, so I don't know for sure.

adam3smith commented 9 years ago

Cool, let me answer that in order 2, 1, 3: 2) We use the library ID because it's part of item.uri, which is what the Zotero LO (and Word) add-on uses to link citations to Zotero items. On scan, we then generate a Zotero Reference Mark with the item.uri, and when the user then loads the doc in LO and picks a citation style, Zotero calls up that item. Does that make sense?

1) After the first sync, your library gets assigned a global ID, which becomes part of the item's URI (beforehand that's just "local" in the URI). It's the same library ID you'd use for API requests. I'd think you should be able to just extract it from item.uri, which you should have access to from any Zotero item object as well as from whatever data the picker returns to you (since the picker would include it when used in LO).

3) What we do is to substitute the short title/title if an item has no creator and print "no date" where an item has no date. That'd be nice to do and will produce something that makes sense in all cases I can think of. @fbennett also has some additional logic for legal citations in there, which I'm sure he'd be delighted to have included -- I figure the easiest is for you to just read/copy that from the ScannableCite translator code. But as I said, that part of the cite is just for human-readability before the scan, so if you don't want to bother with the details, that'd be fine, too.

retorquere commented 9 years ago

I have a test version up at http://tempsend.com/FFB0203852, code is at https://github.com/ZotPlus/zotero-better-bibtex/tree/scannable-cite, format is scannable-cite. I'll have a look at the human readable label, and I'll seriously have to look into that library ID thing -- I currently have no idea where it lives in the Zotero DB, which is where I have to take it from.

retorquere commented 9 years ago

The actual code for new formats is in fact fairly simple -- just some 10 lines at https://github.com/ZotPlus/zotero-better-bibtex/blob/scannable-cite/chrome/content/zotero-better-bibtex/cayw.coffee#L133

adam3smith commented 9 years ago

And item.uri doesn't return anything for you?

retorquere commented 9 years ago

That is only set inside the translators.

adam3smith commented 9 years ago

Had a first look. Remaining issues:

Forgot about this: checking "suppress author" should put a minus sign (technically a hyphen, i.e. -) at the beginning of the human readable citation: {|-Smith 1776|||zu:123:12314}
The itemID is the wrong one, too: we need the 8-character alphanumeric one, the same one used e.g. for reports. Right now this produces the sequential number of the item (like 345).

When I look at Reference Marks in LibreOffice, I get something like this: "citationItems":[{"id":4480,"uris":["http://zotero.org/users/2433/items/2E869MQK"],"uri":["http://zotero.org/users/2433/items/2E869MQK"],"itemData":{"id":4480,"type":"article-newspaper","title":"עושים לנו טובה גדולה","container-title":"The Marker","language":"he","author":[{"family":"מרב מיכאלי","given":""}],"issued":{"date-parts":[["2011",5,16]]}}}],"schema":"https://github.com/citation-style-language/schema/raw/master/csl-citation.json"} RNDO1GFPvHVVK

Are you not getting those uri(s) returned from the picker? Those have the data we'd need. In this case the 2433 (my personal libraryID) and 2E869MQK (the itemID).
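As a rough illustration of pulling those two pieces out of a URI, here is a minimal sketch. The function name is hypothetical, and the group URI shape (`/groups/<groupID>/items/<key>`) is an assumption modelled on the user URI shown above; this is not the Scannable Cite translator's actual code.

```javascript
// Sketch only: `uriToScannableId` is a hypothetical helper.
// Assumption: group items use http://zotero.org/groups/<groupID>/items/<key>,
// mirroring the /users/ form shown in the Reference Mark above.
function uriToScannableId(uri) {
  const m = /^https?:\/\/zotero\.org\/(users|groups)\/(\d+)\/items\/([A-Z0-9]+)$/.exec(uri);
  if (!m) return null; // e.g. an unsynced library with "local" in the URI
  const prefix = m[1] === "users" ? "zu" : "zg";
  return `${prefix}:${m[2]}:${m[3]}`;
}

console.log(uriToScannableId("http://zotero.org/users/2433/items/2E869MQK"));
// prints zu:2433:2E869MQK
```

Returning null for anything that does not match leaves the decision about unsynced libraries to the caller.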

retorquere commented 9 years ago

I think I've found something; could you try http://tempsend.com/7B18CFCAD0 ? I think I now also understand why you use the library ID -- you want these citations to be stable across users whenever possible. That makes me also think you're using the item key rather than the itemID. That assumption has been baked into the new version on tempsend.

retorquere commented 9 years ago

We keep crossing messages :) That confirms the key vs itemID.

retorquere commented 9 years ago

The translators get neutered pseudo-objects, very different from code that lives outside the translators. "Suppress author" is now in http://tempsend.com/89B826663B

retorquere commented 9 years ago

The picker doesn't return very much. What it returns looks like

{"citationItems":[{"id":"5","label": "line", "locator":"1","prefix":"see","suffix":"and others","suppress-author":true}],"properties":{}}

All the rest I pick from the DB using this data.
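For illustration, the picker data above could be mapped onto the marker fields roughly like this. It is a sketch, not the code in cayw.coffee: `pickToFields` and `LOCATOR_LABELS` are hypothetical names, the abbreviations in the map are placeholders (the authoritative list is the ODF-scan "Locators" list), and the readable label is passed in because it comes from the DB rather than from the picker.

```javascript
// Hypothetical abbreviations; the real list lives in the ODF-scan docs.
const LOCATOR_LABELS = { page: "p.", line: "l." };

// Sketch only: maps one picker item onto the first four marker fields.
// `readable` (e.g. "Smith 1776") is looked up elsewhere, not by the picker.
function pickToFields(pick, readable) {
  const abbrev = LOCATOR_LABELS[pick.label] || pick.label;
  return {
    prefix: pick.prefix || "",
    // "suppress author" is signalled by a leading hyphen on the label.
    label: (pick["suppress-author"] ? "-" : "") + readable,
    locator: pick.locator ? `${abbrev} ${pick.locator}` : "",
    suffix: pick.suffix || "",
  };
}

const picked = JSON.parse(
  '{"citationItems":[{"id":"5","label":"line","locator":"1","prefix":"see",' +
  '"suffix":"and others","suppress-author":true}],"properties":{}}'
);
const fields = picked.citationItems.map((pick) => pickToFields(pick, "Smith 1776"));
console.log(fields[0]);
```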

retorquere commented 9 years ago

More questions:

What is a "short title"? I can do title easily.
What do you want if the date is filled in but non-parsable (like "forthcoming") or doesn't return a year (I've seen references with only a month -- no idea why, but they're out there)?

adam3smith commented 9 years ago

OK, I've given it a test spin, working flawlessly all the way through the process.

For your questions:

Short Title is a field in Zotero, towards the bottom. Users can fill it manually and Zotero automatically populates it with the title up to the first : ? or ! (I believe). It's convenient because it's, well, shorter, so the citation marker doesn't take up as much space. The field name should just be shortTitle
Since we wrote this as a translator and item.date in translators returns parsed dates, our scannable cites have those as "no date" and we've never had a complaint. That seems the easiest.

retorquere commented 9 years ago

(turns out I do have access to the uri, but it just includes the userID if I read the code right, and that's the number I'm after. But I'd rather have verification than trust my code-reading skills)

retorquere commented 9 years ago

Aha! OK, so if I understand correctly, the human readable part should be label year, where label = shorttitle | title | author, first found, and year is either the parsed year or "no date". Right?

retorquere commented 9 years ago

And just in case I got it right: http://tempsend.com/09D7B31421

adam3smith commented 9 years ago

Almost: the order for label should be author | shortTitle | title; otherwise exactly what you say.

retorquere commented 9 years ago

I don't think the items in the translator return parsed dates. They return whatever the "date" field holds; Zotero offers a function to parse them. Looking at the scannable cite translator I now see it does year | date | no date, first found.

New version that has both these changes at http://tempsend.com/07F03B858A
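The two fallback chains discussed here (author | shortTitle | title, and year | date | no date) can be sketched as follows. The item shape, the use of only the first creator, and the four-digit regex standing in for Zotero's date parser are all assumptions made for this example; it is not the code in the linked build.

```javascript
// Sketch only: `readableLabel` and the item shape are hypothetical.
function readableLabel(item) {
  // Assumption: only the first creator's last name is used.
  const creator = (item.creators || [])[0];
  const name = (creator && creator.lastName) || item.shortTitle || item.title || "";
  // Stand-in for Zotero's date parsing: take the first 4-digit run as the year.
  const yearMatch = /\b\d{4}\b/.exec(item.date || "");
  const date = (yearMatch && yearMatch[0]) || item.date || "no date";
  return `${name} ${date}`;
}

console.log(readableLabel({ creators: [{ lastName: "Smith" }], date: "1776-03-09" }));
// prints Smith 1776
console.log(readableLabel({ shortTitle: "Long", title: "Long: Title", date: "forthcoming" }));
// prints Long forthcoming
console.log(readableLabel({ title: "Untitled" }));
// prints Untitled no date
```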

retorquere commented 9 years ago

I've looked at the scannable cite translator for the legal stuff, but could someone do me a broad walkthrough?

retorquere commented 9 years ago

Ah, I think I understand most of it -- I'll just have a stab at it in the morning.

adam3smith commented 9 years ago

FWIW, for legal item types (the ones listed at the top; some of them only exist in Frank's juris-m (MLZ) fork, same for some of the fields below), use authority, volume reporter pages and precede the date with court. Also, don't do any of the replacing with anon/no title/no date stuff. I'm not entirely clear on the rationale for these; it's somehow related to the bizarre world of legal citations and how Frank solves some of the odder rules there.

retorquere commented 9 years ago

All right. The latest version at https://github.com/ZotPlus/zotero-better-bibtex/blob/scannable-cite/chrome/content/zotero-better-bibtex/cayw.coffee#L148 makes it possible for others to plug in CAYW formatters -- if one defines Zotero.BetterBibTeX.CAYW.Formatter.scannableCite as a function that accepts the picks, it can return whatever string it wants. I'll be happy to add scannable cite, but if you'd rather fiddle with the code yourself, you could do that in any other plugin should you so desire (and while CoffeeScript happens to be my preference, JavaScript will do just fine). Just make sure you create the function in init, so BBT has loaded.
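A plug-in formatter along those lines might look like the sketch below. Only the `Zotero.BetterBibTeX.CAYW.Formatter` path comes from the description above; the `plainIds` formatter, the shape of `picks` (assumed to be the picker's citationItems array), the lookup by format name, and the stub `Zotero` object that lets the snippet run outside Zotero are assumptions for illustration.

```javascript
// Stub so the sketch runs standalone; inside Zotero the real object exists
// once BBT has loaded, which is why the function should be created in init.
const Zotero = { BetterBibTeX: { CAYW: { Formatter: {} } } };

// Hypothetical formatter: receives the picks and returns any string it likes.
// Assumption: `picks` is the array of citationItems returned by the picker.
Zotero.BetterBibTeX.CAYW.Formatter.plainIds = function (picks) {
  return picks.map((pick) => pick.id).join(",");
};

// Assumption: a formatter is looked up by the requested format name.
const format = "plainIds";
const output = Zotero.BetterBibTeX.CAYW.Formatter[format]([{ id: "5" }, { id: "7" }]);
console.log(output);
// prints 5,7
```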

adam3smith commented 9 years ago

This is excellent. I'd be very happy if you could just add it -- as I've said, the human readable part is actually the least important bit and everything else looks perfect the way it does.

Thanks for making this happen so quickly. Let me know when you release the version with this included.

retorquere commented 9 years ago

I'll add it when I get home, and then cut a release. I'm happy to have the scannable-cite picker included; I just wanted to leave the option open in case @fbennett wants to tweak the code to his liking. The hooks stay, so it can be done at any time. Pretty happy to see more use of this.

retorquere commented 9 years ago

1.2.25 will have the scannable cite translator.

retorquere commented 9 years ago

BTW I could just use citeproc to assemble the label like I do for the atom picker: https://github.com/ZotPlus/zotero-better-bibtex/blob/master/chrome/content/zotero-better-bibtex/cayw.coffee#L257

retorquere / zotero-better-bibtex

Enable ODF-scan format in CAYW #312