retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Can BBT use ODF scan references as references? #1983

Closed paultroop closed 3 years ago

paultroop commented 3 years ago

Zotero version: 5.0.96.3
BBT version: 5.6.8
Your question/suggestion:

I have a very large number of references in the ODF hyperlink format like this:

{ | Smith, 2011 | | |zotero://select/items/0_ABCDEFGH}

Where ABCDEFGH is Zotero's unique identifier which comes up in the BBT field as follows:

"uri": "http://zotero.org/users/123456/items/ABCDEFGH",

I am wondering whether there is a feasible way to recognise them using BBT?

retorquere commented 3 years ago

which comes up in the BBT field as follows:

which BBT field?

I am wondering whether there is a feasible way to recognise them using BBT?

I'm not sure what you mean; recognize where? What would the workflow be, and how does BBT fit into it?

paultroop commented 3 years ago

When I export a bibliography from Zotero using BBT, each item has a number of fields. It seems that the field labelled 'uri' contains the Zotero unique identifier which is the same as the ODF scan link reference that I have been using for years. I was therefore hoping that this could somehow be used to cite using Bibtex rather than just ODF scan.

My current workflow is: I have lots of research notes collected over many years, with each source identified using the ODF scan marker as set out above. When I write, if I want to reference a point from my notes, I copy the ODF scan marker from my research notes to the point in the document that I am drafting and I want a reference to appear. When I have finished writing, ODF scan will convert my plain text document to an ODF file with 'live' Zotero references.

However, this is a bit of a pain, as I end up using 3 or more programs to word process: Scrivener, LibreOffice, and Word. I was hoping that I could just do it all with Latex.

My desired workflow would be to somehow convert the existing ODF scannable cites to a form that bibtex/latex will recognise. The workflow would be something like: use my notes with the existing ODF scannable cites to put in the references when I write my latex / plain text document. Converting these cites to a form that would be recognised by bibtex / latex (for example by stripping off or replacing the redundant fields / text), then processing the document to produce the desired output (pdf, docx, etc).

retorquere commented 3 years ago

When I export a bibliography from Zotero using BBT, each item has a number of fields. It seems that the field labelled 'uri' contains the Zotero unique identifier which is the same as the ODF scan link reference that I have been using for years.

You can add this to the item with a postscript if you want.

I was therefore hoping that this could somehow be used to cite using Bibtex rather than just ODF scan.

That's not something BBT is involved in. You'd have to ask the authors of biber etc. whether they can use alternatives to the regular citation keys; BBT produces bib(la)tex, which is consumed at a later stage by the latex compilation pipeline. By the time the actual bibliography is being produced, BBT is no longer involved.

My desired workflow would be to somehow convert the existing ODF scannable cites to a form that bibtex/latex will recognise. The workflow would be something like: use my notes with the existing ODF scannable cites to put in the references when I write my latex / plain text document. Converting these cites to a form that would be recognised by bibtex / latex (for example by stripping off or replacing the redundant fields / text), then processing the document to produce the desired output (pdf, docx, etc).

It is possible in a roundabout way; you use odf-scan as usual, and then use a special csl style to produce bibtex keys rather than in-text references. You could then use something like pandoc to convert that document to latex. It's a bit clunky, but people have been able to use this.

paultroop commented 3 years ago

Thanks very much for the helpful information.

Can I ask if it would be possible if the BBT settings could be changed so the format of the citekeys in the bib file it creates is not:

"citekey": "authorTitleYYYY",

but:

"citekey": "ABCDEFGH",

Where ABCDEFGH is the Zotero identifier that appears in the BBT export field of:

"uri": "http://zotero.org/users/123456/items/ABCDEFGH", ?

Then I could skip a step with the ODF scan and LibreOffice by changing the existing ODF references in my Latex file from:

{ | Smith, 2011 | | |zotero://select/items/0_ABCDEFGH}

to:

\cite{ABCDEFGH}

Before compiling.
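A minimal sketch of the replacement described above, assuming every marker has exactly five pipe-separated fields and that the citekeys in the .bib file are the bare Zotero item keys. The function name `odf_to_cite` and the regular expression are illustrative only; neither is part of BBT or ODF scan.

```python
import re

# Five '|'-separated fields inside braces; the last one is the zotero:// URI.
MARKER = re.compile(r"\{([^|{}]*)\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\}")

def odf_to_cite(text):
    """Replace each ODF scan marker in text with \\cite{ITEMKEY}."""
    def repl(match):
        uri = match.group(5).strip()
        # The URI ends in '<library prefix>_<item key>', e.g. '0_ABCDEFGH'.
        item_key = uri.rsplit("/", 1)[-1].split("_", 1)[-1]
        return "\\cite{" + item_key + "}"
    return MARKER.sub(repl, text)

print(odf_to_cite("See { | Smith, 2011 | | |zotero://select/items/0_ABCDEFGH}."))
```

The author/year field is discarded entirely, so inconsistencies in that field do not affect the result.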

paultroop commented 3 years ago

The Zotero identifier seems to refer to the attachment, and also appears in the BBT export field of:

"select": "zotero://select/library/items/ABCDEFGH",

retorquere commented 3 years ago

The key is not (IIRC) uniquely identifying; they are uniquely identifying within a library (your personal library, and groups you are a member of) but not across libraries; you can have a key (eg 2YPFQMML) that is both in your personal library and in a group you are a member of; that is the 0_ in front of the key, that means it is in your personal library. I have put out a request for clarification to the zotero developers.

That said, I have a build running that adds [item-key]. It should drop here shortly.

The Zotero identifier seems to refer to the attachment, and also appears in the BBT export field of:

Can you right-click an item and send a support log? Using placeholder terms like ABCDEFGH makes it very easy to misunderstand each other. Let's take one concrete item and talk about that and its key(s). If you get me a support log ID I get that item sent to me, if you then export it as BBT JSON, we know we are looking at the same item.
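To make the library-prefix point concrete, here is a small sketch that splits the last segment of an ODF scan URI into its numeric prefix and item key. It takes the reading given above (prefix 0 means the personal library) at face value, which is stated from memory in this thread rather than confirmed. The `<prefix>:<KEY>` form for other libraries follows the convention used by the script posted later in this thread; the function names and the prefix 2 in the usage note are hypothetical.

```python
def split_item_uri(uri):
    """Return (library_prefix, item_key) from a URI ending in '<n>_<KEY>'."""
    last = uri.rstrip("/").rsplit("/", 1)[-1]
    prefix, key = last.split("_", 1)
    return int(prefix), key

def qualified_key(uri):
    """Bare key for prefix 0, otherwise '<prefix>:<KEY>'."""
    prefix, key = split_item_uri(uri)
    return key if prefix == 0 else f"{prefix}:{key}"

print(split_item_uri("zotero://select/items/0_ABCDEFGH"))
```

Called as `qualified_key("zotero://select/items/2_2YPFQMML")`, this would keep the prefix, so the same key in two libraries stays distinguishable.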

github-actions[bot] commented 3 years ago

This is your friendly neighborhood build bot announcing test build 5.6.8.1807 ("item-key function")

Install in Zotero by downloading test build 5.6.8.1807, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

paultroop commented 3 years ago

The key is not (IIRC) uniquely identifying; they are uniquely identifying within a library (your personal library, and groups you are a member of) but not across libraries; you can have a key (eg 2YPFQMML) that is both in your personal library and in a group you are a member of; that is the 0_ in front of the key, that means it is in your personal library. I have put out a request for clarification to the zotero developers.

Yes, I believe you are right. But this is a separate issue, given that the existing research that I have already uses these personal Zotero keys.

That said, I have a build running that adds [item-key]. It should drop here shortly.

The Zotero identifier seems to refer to the attachment, and also appears in the BBT export field of:

Can you right-click an item and send a support log? Using placeholder terms like ABCDEFGH makes it very easy to misunderstand each other. Let's take one concrete item and talk about that and its key(s). If you get me a support log ID I get that item sent to me, if you then export it as BBT JSON, we know we are looking at the same item.

I've done that. In case it helps, this is the format that ODF scan produces:

{ | Aarnio, 2011 | | |zotero://select/items/1_UUMUPPTS}

This is the export of the same item that BBT produces:

{
  "ISBN": "978-94-007-1654-4 978-94-007-1655-1",
  "abstractNote": "According to the traditional view, in this treatise the doctrinal study of law is understood as a discipline, which has to (1) produce information about the law and (2) systematise the legal norms (Aarnio 1989a, 3). In doing this, DSL is one category of the legal sciences. There are, however, many other fields of legal research in which the notion of legal science is normally used. Historical study, the sociology of law, law and economics, and the comparative studies of law all belong to this category. They are legal sciences in the wide sense of the term.",
  "accessDate": "2016-11-17T19:35:41Z",
  "attachments": [
    {
      "dateAdded": "2016-11-17T19:35:43Z",
      "dateModified": "2016-11-17T19:35:43Z",
      "itemType": "attachment",
      "path": "/Users/---/Documents/Zotero Database/storage/5UPRSNZB/10.html",
      "relations": [],
      "tags": [],
      "title": "Snapshot",
      "uri": "http://zotero.org/users/409871/items/5UPRSNZB"
    }
  ],
  "citationKey": "aarnioWhatDoctrinalStudy2011",
  "citekey": "aarnioWhatDoctrinalStudy2011",
  "creators": [
    {
      "creatorType": "author",
      "firstName": "Prof Aulis",
      "lastName": "Aarnio"
    }
  ],
  "date": "2011",
  "dateAdded": "2016-11-17T19:35:41Z",
  "dateModified": "2016-11-17T19:35:41Z",
  "extra": "DOI: 10.1007/978-94-007-1655-1_3",
  "itemID": 3392,
  "itemKey": "UUMUPPTS",
  "itemType": "bookSection",
  "language": "en",
  "libraryCatalog": "link.springer.com",
  "libraryID": 1,
  "notes": [
    {
      "dateAdded": "2016-11-17T19:42:21Z",
      "dateModified": "2016-11-17T19:42:34Z",
      "itemType": "note",
      "key": "N5UMZQ9D",
      "note": "<p>---.</p>",
      "parentItem": "UUMUPPTS",
      "relations": {},
      "tags": [],
      "uri": "http://zotero.org/users/409871/items/N5UMZQ9D",
      "version": 1948
    }
  ],
  "pages": "19-24",
  "publicationTitle": "Essays on the Doctrinal Study of Law",
  "publisher": "Springer Netherlands",
  "relations": [],
  "rights": "©2011 Springer Science+Business Media B.V.",
  "select": "zotero://select/library/items/UUMUPPTS",
  "series": "Law and Philosophy Library",
  "seriesNumber": "96",
  "tags": [
    {
      "tag": "Political Science",
      "type": 1
    },
    {
      "tag": "Philosophy of Law",
      "type": 1
    },
    {
      "tag": "Theories of Law, Philosophy of Law, Legal History",
      "type": 1
    }
  ],
  "title": "What Is the Doctrinal Study of Law?",
  "uri": "http://zotero.org/users/409871/items/UUMUPPTS",
  "url": "http://link.springer.com/chapter/10.1007/978-94-007-1655-1_3",
  "version": 1947
},

retorquere commented 3 years ago

Build 1807 will return UUMUPPTS for [item-key], but I'm not yet certain I'm rolling it out to a release that way. I need to know how the item key relates to the library ID, and I could see ending up with an implementation where [item-key] returns the library name or ID in some way if it's not the personal library.

retorquere commented 3 years ago

This is the export of the same item that BBT produces:

This is not something BBT produces BTW, it's just a simplified version of the internal format that Zotero uses for all translators. BBT just returns that largely unchanged, which Zotero then writes to disk for you when you export.
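Since the export quoted earlier carries both `itemKey` and `citekey` for each item, a lookup table from one to the other can be built offline. The sketch below assumes you already have the exported items as a list of objects; how they are nested in a full export file is not shown in this thread, and the helper name `key_map` is hypothetical.

```python
import json

def key_map(items):
    """Map Zotero item keys to citation keys for a list of exported item dicts."""
    return {
        item["itemKey"]: item["citekey"]
        for item in items
        if "itemKey" in item and "citekey" in item
    }

# Trimmed copy of the item quoted in this thread.
sample = json.loads("""
[{"citekey": "aarnioWhatDoctrinalStudy2011",
  "itemKey": "UUMUPPTS",
  "select": "zotero://select/library/items/UUMUPPTS"}]
""")
print(key_map(sample))
```

Attachments and notes carry their own keys in the export (for example 5UPRSNZB above), so only the top-level `itemKey` is used here.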

paultroop commented 3 years ago

Build 1807 will return UUMUPPTS for [item-key], but I'm not yet certain I'm rolling it out to a release that way. I need to know how the item key relates to the library ID, and I could see ending up with an implementation where [item-key] returns the library name or ID in some way if it's not the personal library.

This is via the 'BBT Quick Copy: Zotero select link' default quick copy format?

retorquere commented 3 years ago

No, you said you wanted a way to set the citekeys to the item key. The quick copy output generator is something else entirely.

paultroop commented 3 years ago

Sorry, I'm struggling to see where the citekeys have changed after installation of Build 1807.

paultroop commented 3 years ago

Sorry, got it now - I change the citation pattern to [item-key]

retorquere commented 3 years ago

Wait, do you mean you want select links that have the citekey? Those already work. You can use @aarnioWhatDoctrinalStudy2011 or bbt:{1}aarnioWhatDoctrinalStudy2011 wherever you find that 1_UUMUPPTS.

retorquere commented 3 years ago

Sorry, I'm struggling to see where the citekeys have changed after installation of Build 1807.

They don't change, you have to both change the pattern to include [item-key] and refresh the key or change the item to trigger a key refresh.

paultroop commented 3 years ago

Yes, got it now! Sorry for being slow.

retorquere commented 3 years ago

No worries. So this is what you want then? Then I'll wait for feedback from the Zotero devs on key uniqueness.

paultroop commented 3 years ago

Something like this would be perfect. Yes, thank you so much for your help! Look forward to what the Zotero developers say regarding uniqueness.

retorquere commented 3 years ago

This does go against how latex users would typically use citekeys, so I'm always surprised by these requests. It does no harm so I can add it no issue, but for latex users, citekeys are meaningful text themselves, so a typical sentence would read

The private language argument argues that a language understandable by only a single individual is incoherent, and was introduced by @WittgensteinPI1953 in his later work...

and not

The private language argument argues that a language understandable by only a single individual is incoherent, and was introduced by @UUMUPPTS in his later work...

paultroop commented 3 years ago

Yes, I agree! I probably would not start from this point if I was beginning from scratch, but there you go...

I'm not sufficiently familiar with Bibtex to know the answer to this, but perhaps there is a way to create citekeys that have redundant text, ie @UUMUPPTS~Aarnio2011 or something. The ODF scan markers have something similar, in a slightly different order:

{ | Aarnio, 2011 | | |zotero://select/items/1_UUMUPPTS}

retorquere commented 3 years ago

That would just be [item-key]~[auth][year]

paultroop commented 3 years ago

Yes, but is there a character / string that tells Bibtex to ignore the text after the ~ (that would go in place of the ~)?

Because the ODF cites that I will be converting will have the item key, but the author and year text formatting is likely to be unreliable, so best ignored.

retorquere commented 3 years ago

I'm not sure how citekeys could be unreliable, unless authorship / dates in your library itself are unreliable, in which case you have bigger problems than what your citekeys look like.

Given that latex is a Turing-complete programming language, there is undoubtedly a way to have it ignore parts of the citekey, but at that point you're fighting the latex toolstack. I wouldn't recommend it.

retorquere commented 3 years ago

I'm also not clear on what your workflow would look like if it were possible. Would BBT generate these, so that you could use them in the text you write? In that case, you can just use the pattern I showed. If BBT did not generate them, how would the ignored part get into your latex source? Manually typing them? Then you could just type a latex comment and achieve the same.

paultroop commented 3 years ago

The idea is accommodating the existing ODF references in a more user-friendly way, as you note above. The author / date field was never crucial for ODF scan, so the data in that field is not completely consistent. Sometimes there was a comma, sometimes I edited the authors to reduce the length of the field etc.

I was therefore wondering if there was a system whereby the BBT citationkey could accommodate this - ie something like the % sign whereby code appearing after it would be ignored on compile (but was still more user friendly when writing so that you could have an idea what the otherwise arbitrary series of letters and numbers referred to). That way the workflow could accommodate old ODF references and new BBT citation keys. And the output could be either a compiled Latex document or even an ODF format document using ODF scan.

I am not familiar with the Bibtex code, so don't know the answer to this. (And I probably should be directing the question elsewhere as I appreciate it is not your main focus and you have been extremely helpful in solving the main challenge!)

retorquere commented 3 years ago

The idea is accommodating the existing ODF references in a more user-friendly way, as you note above. The author / date field was never crucial for ODF scan, so the data in that field is not completely consistent. Sometimes there was a comma, sometimes I edited the authors to reduce the length of the field etc.

But you'd type this yourself? Or use drag and drop? Because if you type the bibtex yourself, you could type a latex comment there. If it's for drag and drop, then you're looking for a new quickcopy format, not key generation.

I was therefore wondering if there was a system whereby the BBT citationkey could accommodate this - ie something like the % sign whereby code appearing after it would be ignored on compile

But what good would that do you? It would sit in the bib file right above the actual identifying data. And you'd still not have it in your document source.

but was still more user friendly when writing so that you could have an idea what the otherwise arbitrary series of letters and numbers referred to.

But how would that get into your document? Generating keys that replicate the stuff that is always right under them serves no purpose I can identify.

That way the workflow could accommodate old ODF references and new BBT citation keys. And the output could be either a compiled Latex document or even an ODF format document using ODF scan.

I have no idea what this means. We're talking about an ODF document, right? Because otherwise ODF scan doesn't come into play. How is latex going to compile this?

Can you create a sample of such a document? I think we're talking way past each other here, but it sounds to me like you've chosen a solution before understanding the problem properly. Or at least not explained all of it. Let's take this step by step; create a sample document, attach it here and tell me what you want to do with it. There's again too many hypotheticals in the conversation.

paultroop commented 3 years ago

OK, let me try and break down how I would see the workflow given the existing setup.

I have an existing research note that looks like this:

Dennett- predicting people from the physical stance is hopeless

What then do we see when we look at this bustling public world? Among the most complicated and interesting of the phenomena are the doings of our fellow human beings. If we try to predict and describe them using the same methods and concepts we have developed to describe landslides, germination, and magnetism, we can make a few important inroads, but the bulk of their observable macroactivity—their ”behavior”—is hopelessly unpredictable from these perspectives. People are even less predictable than the weather, if we rely on the scientific techniques of meteorologists and even biologists.

{ | Dennett, (1993) |7| |zotero://select/items/0_NI9PUPSH}

My ODF Scan workflow would be to drag and drop the ODF reference into my writing like this:

Dennett says that an attempt to predict the day-to-day behaviour of people using the 'physical stance' is hopeless { | Dennett, (1993) |7| |zotero://select/items/0_NI9PUPSH}.

When I've finished writing, I save the .txt document as .odf, run the Zotero ODF scan, and get an output that is a proper Zotero 'live' citation where I can change the format from APA to whatever. It might look something like this:

Dennett says that an attempt to predict the day-to-day behaviour of people using the 'physical stance' is hopeless (Dennett, 1993, 7).

I then save as a .docx file as I find LibreOffice difficult for word processing (thereby losing the live reference) and set the document out in word. As you can imagine, this is a bit tiresome.

With your [item-key] option, this opens up a much easier workflow using Latex and Bibtex.

I can replace the text in my references in my draft so that they now look something like this:

Dennett says that an attempt to predict the day-to-day behaviour of people using the 'physical stance' is hopeless @NI9PUPSH.

Then I can use the .bib library I created from Zotero using BBT to compile using Bibtex. This is 95% of the problem solved.

The only outstanding (and minor) issue is that the citekey is now completely anonymous. I was therefore wondering if there was a way to generate more meaningful citekeys in a way that was not reliant on the exact text. Generating a citekey in the .bib library that would still accommodate slight differences in the citekeys created from the existing ODF scan references would be necessary I think. For example, the ODF scan references for the same item could be:

{ | Dennett, 1993 |7| |zotero://select/items/0_NI9PUPSH}
{ | Dennett (1993) |7| |zotero://select/items/0_NI9PUPSH}
{ | Dennett 1993 |7| |zotero://select/items/0_NI9PUPSH}
{ | Dennett and Smith, 1993 |7| |zotero://select/items/0_NI9PUPSH}

Imagine, for arguments sake, that the % sign means that text after it would be ignored on compile. This would mean that the BBT library could specify the citekey as:

@NI9PUPSH%Dennett1993

But I could still convert any preexisting ODF scan references in a way that would be recognised. Ie, they could be:

@NI9PUPSH%Dennett1993
@NI9PUPSH%Dennett(1993)
@NI9PUPSH%Dennett,1993
@NI9PUPSH%DennettandSmith1993
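In LaTeX itself a % starts a comment that runs to the end of the line, so it cannot serve as an "ignore the rest of the key" separator inside a citation. One possible route, sketched here as an assumption rather than anything BBT provides, is to strip the free-text hint in a preprocessing step before the document reaches pandoc or LaTeX. The sketch assumes item keys are eight uppercase letters or digits, as in the examples in this thread; `strip_hints` is a hypothetical name.

```python
import re

# '@' + 8-character item key, then '%' and a free-text hint that ends at
# whitespace, a closing bracket, or the end of the text. Trailing sentence
# punctuation such as a final '.' is left in place.
HINT = re.compile(r"(@[A-Z0-9]{8})%[^\s\]]*?(?=[.,;:]?(?:\s|\]|$))")

def strip_hints(text):
    """Reduce '@KEY%anything' to '@KEY', leaving other '%' signs alone."""
    return HINT.sub(r"\1", text)

print(strip_hints("hopeless @NI9PUPSH%Dennett,1993 and @NI9PUPSH%Dennett(1993)."))
```

Because only the item key survives, every spelling variant of the hint resolves to the same entry in the .bib file.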

retorquere commented 3 years ago

When I've finished writing, I save the .txt document as .odf, run the Zotero ODF scan, and get an output that is a proper Zotero 'live' citation where I can change the format from APA to whatever. It might look something like this:

I didn't know ODF scan could read text files; I can't get it to open a plain-text txt file even if I rename it to .odf.

I then save as a .docx file as I find LibreOffice difficult for word processing (thereby losing the live reference) and set the document out in word. As you can imagine, this is a bit tiresome.

You don't have to lose the live references: https://www.zotero.org/support/kb/moving_documents_between_word_processors. With that you could use this to convert those live citations to citekeys. Or without it in fact, you could just do the same using the ODF document. I'm really not clear on why you're putting Word in the mix if you want to move your source document to latex.

With your [item-key] option, this opens up a much easier workflow using Latex and Bibtex.

I can replace the text in my references in my draft so that they now look something like this:

Dennett says that an attempt to predict the day-to-day behaviour of people using the 'physical stance' is hopeless @NI9PUPSH. Then I can use the .bib library I created from Zotero using BBT to compile using Bibtex. This is 95% of the problem solved.

If the source document is ODF, I really don't know how you're going to pass it through latex. The only way I can make sense of this is if your source is markdown, and you're using pandoc to convert it. If that is the case, you can use the trick above to get the citekeys into the document before conversion.

The only outstanding (and minor) issue is that the citekey is now completely anonymous.

That may be true if the path you're sketching is the best way to get your desired result, but I still don't really know:

  1. What is the source document format? ODF? Plain-text? Markdown?
  2. After it is run through ODF scan, you by necessity have an ODF document, now with live references. Why do you put this through Word if you want LaTeX?
  3. Given that you now have either a Word document without live references, or an ODF document with live references, what do you do to this document to feed it to latex?

I was therefore wondering if there was a way to generate more meaningful citekeys in a way that was not reliant on the exact text.

Man I'm not doing this to be difficult but I have no idea what you are asking here. Meaningful keys are the standard.

Generating a citekey in the .bib library that would still accommodate slight differences in the citekeys created from the existing ODF scan references would be necessary I think.

Still do not understand what "accommodate slight differences" means here. In what sense would the bib file accommodate differences? Bib files can accommodate pretty much anything that is plain text. You could put the tractatus in the citekey if you wanted that.

For example, the ODF scan references for the same item could be:

{ | Dennett, 1993 |7| |zotero://select/items/0_NI9PUPSH}
{ | Dennett (1993) |7| |zotero://select/items/0_NI9PUPSH}
{ | Dennett 1993 |7| |zotero://select/items/0_NI9PUPSH}
{ | Dennett and Smith, 1993 |7| |zotero://select/items/0_NI9PUPSH}

And the equivalent in latex could be either

\cite{NI9PUPSH}\vphantom{Dennett 1993}

or

\cite{@Dennett1993}\vphantom{NI9PUPSH}

Nothing fancier than an eat-parameters command is needed for that.

Imagine, for arguments sake, that the % sign means that text after it would be ignored on compile. This would mean that the BBT library could specify the citekey as:

@NI9PUPSH%Dennett1993

Where. Where would this @NI9PUPSH%Dennett1993 appear? In the bib file? In the source document? Both?

You can get that key (sort of), but I still don't see how this helps you. These would be the keys that are in your bib file, and the name "Dennett" would always be right under that key. How would you use this key (or parts of this key? is that what you want? a partial key is not going to work) in your document is the question.

Can you please just prepare an actual bibfile as you would want it and an actual document as you would write it and attach them here rather than describing their characteristics. I keep having to guess what you mean. An actual document would take away any ambiguity. Just manually create them how you would want them to be, and I can tell you whether it's possible, but currently I still have no idea what you want in concrete terms.

But I could still convert any preexisting ODF scan references in a way that would be recognised. Ie, they could be:

@NI9PUPSH%Dennett1993
@NI9PUPSH%Dennett(1993)
@NI9PUPSH%Dennett,1993
@NI9PUPSH%DennettandSmith1993

I'm sure they could be but I still don't understand what your workflow is. BBT can generate these keys just fine. BBT is most likely out of the loop at the point that you use them, so I don't know if I can help you, but I really still have only a very foggy notion of what you're concretely trying to do.

paultroop commented 3 years ago

I'll prepare a short document showing what I would like, but it appears I haven't communicated what I would like to do. I'll have another go.

My current workflow is:

plain text (with existing ODF scan references) > ODF format > word format > pdf format

I would like to get away from this. I do not want to use word, I do not want to use ODF, and I particularly do not want to waste time formatting text in word.

My desired workflow would be:

plain text, markdown, or Latex (adapting existing ODF scan references) > pdf

retorquere commented 3 years ago

Then why not just convert the text files? Replacing the odf scan markers to citekeys directly is trivial compared to what we were discussing so far.

paultroop commented 3 years ago

Yes, that's what I want to do, and your suggested addition of [item-key] solved the main problem perfectly.

The only (and relatively minor) query I had was whether there was a way of converting the existing odf scan markers to citekeys in a way that kept some meaning, but without being dependent on that meaning (given that the author / year field has never really been consistent as it never needed to be).

retorquere commented 3 years ago

If that helps you that is fine, but what I'm saying is that it would be simple to replace the odf scan markers directly with pandoc-compatible citation markers by having eg a python script pick them out, match each item key in Zotero by way of a live lookup, get the BBT citation key, and replace the marker with that. I could put that together easily, and there would be no manual work on your part.

paultroop commented 3 years ago

So in the way of an example, I start with a plain text file that looks like this:

Psychological science has advanced considerably since the early days of the 20th century when the psychological behaviourism was in the ascendancy { | Posner, (1995) |393| |zotero://select/items/0_55ZAN9GM}{ | Leiter, (1997) |311-12| |zotero://select/items/0_Z226U5KH}{ | Leiter, (2001) |282-83| |zotero://select/items/0_W82SWD2M}. Nonetheless, there are still only a handful of psychological theories of judicial adjudication { | Baum, 1997 | | |zotero://select/items/0_86DH33NA}{ | Simon, 1998 |4-6| |zotero://select/items/0_2K83TCAG}{ | Hirsch, 2002 |602 fn16| |zotero://select/items/0_A268Z7H2}{ | Richard A. Posner, 2008 |19| |zotero://select/items/0_C267KJ5V}. Those that do exist do not purport to account for every aspect of adjudication. This review necessarily interprets psychological theories of adjudication broadly. One such theory is the story model which purports to be a theory of jury decision making, but has elements that also speak to judicial adjudication. Another is positive law and economics which merits inclusion because of its reliance on rational choice theory { | Korobkin, & Ulen, (2000) |1055| |zotero://select/items/0_VH4DHKTB}. Because existing theories do not always speak to every stage of the adjudicatory process, the primary organising structure of this review will be the approximate chronology of a trial.

I could convert it to citekeys as follows and use BBT to create a .bib file from Zotero that could be used to compile it:

Psychological science has advanced considerably since the early days of the 20th century when the psychological behaviourism was in the ascendancy [@55ZAN9GM, p.393][@Z226U5KH, p.311-12][@W82SWD2M, p.282-83]. Nonetheless, there are still only a handful of psychological theories of judicial adjudication [@86DH33NA][@2K83TCAG, p.4-6][@A268Z7H2, p.602 fn16][@C267KJ5V, p.19]. Those that do exist do not purport to account for every aspect of adjudication. This review necessarily interprets psychological theories of adjudication broadly. One such theory is the story model which purports to be a theory of jury decision making, but has elements that also speak to judicial adjudication. Another is positive law and economics which merits inclusion because of its reliance on rational choice theory [@VH4DHKTB, p.1055]. Because existing theories do not always speak to every stage of the adjudicatory process, the primary organising structure of this review will be the approximate chronology of a trial.

The only issue is that none of the citekeys are meaningful, so I was wondering if I could leave some of the author year information in the citekeys in a way that was not dependent on total accuracy of the format, ie:

Psychological science has advanced considerably since the early days of the 20th century when the psychological behaviourism was in the ascendancy [@55ZAN9GM, p.393 %Posner,(1995)][@Z226U5KH, p.311-12 %Leiter,(1997)][@W82SWD2M, p.282-83 %Leiter,(2001)]. Nonetheless, there are still only a handful of psychological theories of judicial adjudication [@86DH33NA %Baum,1997][@2K83TCAG, p.4-6 %Simon,1998][@A268Z7H2, p.602 fn16 %Hirsh,2002][@C267KJ5V, p.19 %Posner2008]. Those that do exist do not purport to account for every aspect of adjudication. This review necessarily interprets psychological theories of adjudication broadly. One such theory is the story model which purports to be a theory of jury decision making, but has elements that also speak to judicial adjudication. Another is positive law and economics which merits inclusion because of its reliance on rational choice theory [@VH4DHKTB, p.1055 %Korobkin,(2000)]. Because existing theories do not always speak to every stage of the adjudicatory process, the primary organising structure of this review will be the approximate chronology of a trial.
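The marker-to-citation conversion in this example can be sketched as a single regular-expression pass. The sketch assumes five pipe-separated fields per marker (named prefix, label, locator, suffix and uri here) and that every locator is a page reference. The optional `citekeys` mapping from item key to citation key, and the key `posner1995` in the usage note, are hypothetical; without a mapping the bare item key is used, as in the example above.

```python
import re

MARKER = re.compile(r"\{([^|{}]*)\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\}")

def to_pandoc(text, citekeys=None):
    """Turn ODF scan markers into pandoc-style [@key, p.locator] citations."""
    citekeys = citekeys or {}

    def repl(match):
        prefix, label, locator, suffix, uri = (g.strip() for g in match.groups())
        item_key = uri.rsplit("/", 1)[-1].split("_", 1)[-1]
        key = citekeys.get(item_key, item_key)  # fall back to the item key
        return f"[@{key}, p.{locator}]" if locator else f"[@{key}]"

    return MARKER.sub(repl, text)

print(to_pandoc("{ | Posner, (1995) |393| |zotero://select/items/0_55ZAN9GM}"))
```

Passing `citekeys={"55ZAN9GM": "posner1995"}` would emit a readable key instead. Adjacent markers come out as separate bracketed citations; pandoc's own syntax for grouping several sources in one citation is `[@a; @b]`, which this sketch does not attempt.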

paultroop commented 3 years ago

If that helps you that is fine, but what I'm saying is that it would be simple to replace the odf scan markers directly with pandoc-compatible citation markers by having eg a python script pick them out, match each item key in Zotero by way of a live lookup, get the BBT citation key, and replace the marker with that. I could put that together easily, and there would be no manual work on your part.

Ah, so something like a Python script could take a plain text file containing existing ODF scannable cites and convert these to perfect citation markers (e.g. of the form [@Dennett2001]) without needing the clunky Zotero item identifier. This had not occurred to me because my technical skills are very basic. I was planning on writing a simple Python script just to automatically rearrange the existing ODF scannable cites into citation markers containing the Zotero item identifier.

paultroop commented 3 years ago

That would be an infinitely more elegant solution.

retorquere commented 3 years ago

Ah, so something like a Python script could take a plain text file containing existing ODF scannable cites and convert these to perfect citation markers (e.g. of the form [@Dennett2001]) without needing the clunky Zotero item identifier. This had not occurred to me because my technical skills are very basic. I was planning on writing a simple Python script just to automatically rearrange the existing ODF scannable cites into citation markers containing the Zotero item identifier.

That is exactly what I meant. I can't finish this tonight, but I should be able to do this Sunday, maybe tomorrow. It is a simple lookup, and the ODF markers are easy to pick out cleanly.

paultroop commented 3 years ago

That would be amazing.

retorquere commented 3 years ago

#!/usr/bin/env python3
# Requires Python 3.8+ (uses the := assignment expression).

import sys
import os
import re
import urllib.request
import json

# One scannable cite: { prefix | label | locator | suffix | uri }
marker = '[{]' + ('([^|{}]*)[|]' * 4) + '([^|{}]*)[}]'

def lookup(match):
  # match covers a run of adjacent markers; they are merged into one [...; ...] citation
  citations = []
  for citation in re.findall(marker, match.group(0)):
    prefix, label, locator, suffix, uri = [m.strip() for m in citation]
    # the last URI segment looks like <libraryID>_<itemKey>, e.g. 0_ABCDEFGH
    libraryID, key = uri.split('/')[-1].split('_')
    libraryID = int(libraryID)
    if libraryID != 0:
      key = f'{libraryID}:{key}'
    # ask the running Zotero/BBT instance for the citation key of this item
    payload = {'jsonrpc': '2.0', 'method': 'item.citationkey', 'params': [[ key ]] }
    req = urllib.request.Request('http://localhost:23119/better-bibtex/json-rpc', data=json.dumps(payload).encode('utf-8'), headers={'Content-Type': 'application/json'})
    resp = json.loads(urllib.request.urlopen(req).read().decode('utf-8'))
    if ('result' in resp) and (citekey := resp['result'].get(key)):
      citation = ''
      if len(prefix) > 0: citation += f'{prefix} '
      citation += f'@{citekey}'
      if len(locator) > 0: citation += f', {locator}'
      if len(suffix) > 0: citation += f' {suffix}'
      citations.append(citation)
    else:
      # stop at the first marker that cannot be resolved
      print('could not find', uri)
      sys.exit(1)
  citations = '; '.join(citations)
  return f'[{citations}]'

for source in sys.argv[1:]:
  name, ext = os.path.splitext(os.path.basename(source))
  if ext != '.txt':
    print('source must be .txt, not', name + ext)
    sys.exit(1)
  # write name.md next to name.txt, never overwriting an existing file
  target = os.path.join(os.path.dirname(source), name + '.md')
  if os.path.exists(target):
    print(target, 'exists, not overwriting')
    sys.exit(1)
  with open(source) as f:
    # replace each run of one or more adjacent markers in a single pass
    text = re.sub(f'({marker})+', lookup, f.read())
  print('writing', target)
  with open(target, 'w') as f:
    f.write(text)

retorquere commented 3 years ago

You can pass .txt files on the command line, and it will create corresponding .md files. If you pass a non-.txt file, it will complain and stop. If a corresponding .md file exists, it will complain and stop. If one of the markers in the file cannot be matched in the Zotero library, guess what, it will complain and stop.

Zotero needs to be running while this runs, and BBT needs to be installed obviously.
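
To see what the marker pattern picks out, here is a minimal offline sketch that applies the same regular expression to the example marker from the top of this thread. It does not contact Zotero, and `parse_marker` is a hypothetical helper name used only for illustration; it is not part of the script above or of BBT.

```python
import re

# Same pattern as in the script: five |-separated fields inside braces.
MARKER = '[{]' + ('([^|{}]*)[|]' * 4) + '([^|{}]*)[}]'

def parse_marker(text):
    """Return the fields of the first scannable cite in text, or None."""
    m = re.search(MARKER, text)
    if m is None:
        return None
    prefix, label, locator, suffix, uri = [g.strip() for g in m.groups()]
    # The last URI segment is assumed to be <libraryID>_<itemKey>.
    library_id, key = uri.split('/')[-1].split('_', 1)
    return {
        'prefix': prefix,
        'label': label,
        'locator': locator,
        'suffix': suffix,
        'library_id': int(library_id),
        'key': key,
    }

fields = parse_marker('{ | Smith, 2011 | | |zotero://select/items/0_ABCDEFGH}')
print(fields)
```

Only the `key` (and, for non-zero values, the `library_id`) is sent to BBT; the label is ignored by the lookup, which is why it does not need to be accurate.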

retorquere commented 3 years ago

The only issue is that none of the citekeys are meaningful, so I was wondering if I could leave some of the author-year information in the citekeys in a way that was not dependent on total accuracy of the format, i.e.:

Psychological science has advanced considerably since the early days of the 20th century when the psychological behaviourism was in the ascendancy [@55ZAN9GM, p.393 %Posner,(1995)][@Z226U5KH, p.311-12 %Leiter,(1997)]

This was one of the things that was tripping me up. It wouldn't be LaTeX that would have to ignore the %... part, it would be pandoc, and I didn't understand what situation you would have expected to have at this point; for this to work, the bib file would have to have something like

@article{55ZAN9GM,
  author = {Posner},
...
}
@article{Z226U5KH,
  author = {Leiter},
...
}

and either you or a script would have to add the %Posner to the markdown source. I still don't see how BBT could have been involved in any way in getting that %Posner into your markdown source given the workflow you describe.

paultroop commented 3 years ago

You can pass .txt files on the command line, and it will create corresponding .md files. If you pass a non-.txt file, it will complain and stop. If a corresponding .md file exists, it will complain and stop. If one of the markers in the file cannot be matched in the Zotero library, guess what, it will complain and stop.

Zotero needs to be running while this runs, and BBT needs to be installed obviously.

Fantastic, that is really amazing! Thank you very much. I look forward to giving this a go.

paultroop commented 3 years ago

and either you or a script would have to add the %Posner to the markdown source. I still don't see how BBT could have been involved in any way in getting that %Posner into your markdown source given the workflow you describe.

No, my thinking was just that BBT would provide the .bib file from Zotero that pandoc or a LaTeX file would use. I was envisaging changing the .txt file with the ODF cites in them using a simple (and somewhat dirty) script to rearrange them. But you have provided a very smart solution that avoids all the issues and challenges with that plan!

paultroop commented 3 years ago

It does not seem to like the first colon in:

if ('result' in resp) and (citekey := resp['result'][key]):
paultroop commented 3 years ago

Sorry, forgot python3
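
For anyone hitting the same error: the `:=` in that line is an assignment expression, which needs Python 3.8 or later, so older interpreters report a syntax error at the colon. A minimal sketch of the same check written without it follows; the `resp` dictionary is made up for illustration and stands in for the real JSON-RPC response, and `find_citekey` is a hypothetical helper name.

```python
# Made-up response for illustration; the real one comes from the BBT JSON-RPC call.
resp = {'result': {'ABCDEFGH': 'smith2011'}}
key = 'ABCDEFGH'

def find_citekey(resp, key):
    """Return the citation key for key, or None if it is missing or empty."""
    citekey = resp.get('result', {}).get(key)
    return citekey or None

citekey = find_citekey(resp, key)
if citekey is not None:
    print('found', citekey)
else:
    print('could not find', key)
```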

paultroop commented 3 years ago

Brilliant, it's working perfectly! Thank you ever so much.

retorquere commented 3 years ago

This part is just curiosity. To be clear, I don't think there's anything here that's going to help in any way with your pragmatic problem, which it looks like we can consider solved.

No, my thinking was just that BBT would provide the .bib file from Zotero that pandoc or a LaTeX file would use. I was envisaging changing the .txt file with the ODF cites in them using a simple (and somewhat dirty) script to rearrange them. But you have provided a very smart solution that avoids all the issues and challenges with that plan!

BBT can provide this bib file no problem. Let's assume it could be made to output

@article{55ZAN9GM\vphantom{Posner},
  author = {Posner}
  ...
}

or

@article{55ZAN9GM<!-- Posner -->,
  author = {Posner}
  ...
}

and that the bibtex/pandoc toolstack wouldn't choke on this. How does this help in the original discussion? Any script that could modify your markdown file with scannable cites could also inject any comments you like while it's doing that. I don't see how having these comments in the bib file would do you any good.

paultroop commented 3 years ago

I don't see how having these comments in the bib file would do you any good.

I agree, there would not have been much point having these comments in the bib file. I was more thinking of having them in some form in the citekeys in a way that would not throw pandoc/LaTeX. I suppose it was more a question of the correct syntax to avoid the latter problem than getting them also to appear in the .bib file.

retorquere commented 3 years ago

I agree, there would not have been much point having these comments in the bib file. I was more thinking of having them in some form in the citekeys in a way that would not throw pandoc/LaTeX. I suppose it was more a question of the correct syntax to avoid the latter problem than getting them also to appear in the .bib file.

In pandoc you can't have them inside the citation, but outside that, <!-- ... --> is the way to add comments to the document that don't show up in the output.
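
As a rough sketch of how a conversion script could inject such comments itself, the snippet below turns each scannable cite into a pandoc citation and puts the original author-year label in an HTML comment directly after the closing bracket. It is an illustration under assumptions, not the script posted above: `CITEKEYS` is a made-up dictionary standing in for the live BBT lookup, `smith2011` is an invented citation key, and `cite_with_comment` and `convert` are hypothetical names. Labels containing `--` would need escaping, which this sketch does not handle.

```python
import re

# Same marker pattern as the conversion script: { prefix | label | locator | suffix | uri }
MARKER = '[{]' + ('([^|{}]*)[|]' * 4) + '([^|{}]*)[}]'

# Hypothetical stand-in for the live BBT lookup: Zotero item key -> citation key.
CITEKEYS = {'ABCDEFGH': 'smith2011'}

def cite_with_comment(match):
    prefix, label, locator, suffix, uri = [g.strip() for g in match.groups()]
    key = uri.split('/')[-1].split('_', 1)[1]
    citation = f'{prefix} ' if prefix else ''
    citation += f'@{CITEKEYS[key]}'
    if locator:
        citation += f', {locator}'
    if suffix:
        citation += f' {suffix}'
    # The comment goes outside the brackets, so the citation itself stays clean.
    comment = f' <!-- {label} -->' if label else ''
    return f'[{citation}]{comment}'

def convert(text):
    return re.sub(MARKER, cite_with_comment, text)

print(convert('See { | Smith, 2011 | p. 4 | |zotero://select/items/0_ABCDEFGH}.'))
# prints: See [@smith2011, p. 4] <!-- Smith, 2011 -->.
```

Unlike the posted script, this sketch converts each marker on its own rather than merging adjacent markers into one citation group, which keeps each comment next to the citation it describes.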

paultroop commented 3 years ago

Thanks!