retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License
5.27k stars 284 forks

[Feature]: Retain necessary punctuation when abbreviating proceeding titles #2481

Closed aywi closed 1 year ago

aywi commented 1 year ago

Debug log ID

27XLUKGT-refs-apse

What happened?

It looks like auto-abbreviation of proceedings titles was introduced in #2245, and it works very well IMO. There have been earlier discussions on the Zotero forums (here, here, and also here), but all of them suggest this won't be solved in Zotero in the short term. So I guess the current workaround is to treat the proceedings title the same as a journal title, right? But some proceedings titles contain punctuation that I think should be preserved, like the following:

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

The current BBT output is:

2022 IEEECVF Conf. Comput. Vis. Pattern Recognit. CVPR

What I expected is (after preserving the special punctuation):

2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)

This is also confirmed by page 7 of the IEEE Reference Guide. I know there may be ways to hack this manually, but I would prefer not to.

retorquere commented 1 year ago

You'll have to ask Zotero -- I use their Zotero.Cite.getAbbreviation to generate journal abbreviations, and I don't really know how it works. I know Juris-M used to allow specifying the style to be used for abbreviations, but I don't think Zotero has ever supported this. There is a list of overrides that the function consults (abbreviations.json), but it seems to be bundled inside Zotero, so the list is not easy to modify.

Do postscripts count as manually hacking?

aywi commented 1 year ago

> Do postscripts count as manually hacking?

No, I'm referring to manually hardcoding something into Zotero fields... Postscripts will be great, too!

retorquere commented 1 year ago

You can change the output in pretty much any way in a postscript, but I don't know a good generalized way to generate journal abbreviations from full journal titles.

aywi commented 1 year ago

Because I still want to use the current pipeline provided by Zotero, I tried this hack first:

if (Translator.BetterBibTeX && zotero.itemType === 'conferencePaper' && tex.has.booktitle) {
    // Hard-coded fix: restore the punctuation stripped from this one title
    tex.add({ name: 'booktitle', value: tex.has.booktitle.value
        .replace('IEEECVF', 'IEEE/CVF')
        .replace(' CVPR', ' (CVPR)')
    });
}

And then I ended up with this generalized version with the help of ChatGPT:

if (Translator.BetterBibTeX && zotero.itemType === 'conferencePaper') {
    const btFull = zotero.publicationTitle;
    const btAbbr = tex.has.booktitle.value;
    function recover(full, abbr, regex) {
        const matches = full.match(regex);
        if (matches) {
            matches.forEach(match => {
                const word = match.replace(/[\/()]/g, '');
                abbr = abbr.replace(word, match);
            });
        }
        return abbr;
    }
    const sRegex = /\b(\w+\/\w+)\b/g;
    const pRegex = /\((\w+)\)/g;
    tex.add({ name: 'booktitle', value: recover(btFull, recover(btFull, btAbbr, sRegex), pRegex) });
}

PS: I have tried many times to find this zotero.publicationTitle (why not zotero.proceedingsTitle?).

Questions:

  1. How can I access the current option of abbreviation in postscripts? I want to run scripts only when this option is enabled.
  2. How to keep the original order of fields in BibTeX after calling tex.add()? This is a minor issue, but I am curious anyway.

retorquere commented 1 year ago

> [generated code]

I'm not entirely sure what this code intends to do, but if it works for you, that's good obviously.

> PS: I have tried many times to find this zotero.publicationTitle (why not zotero.proceedingsTitle?).

I can explain the reasoning behind it if you want but if you just want to know what fields are available in the postscript, export the entry as BetterBibTeX JSON with the Normalize option turned on.

> How can I access the current option of abbreviation in postscripts? I want to run scripts only when this option is enabled.

Translator.options.useJournalAbbreviation will be true when the option is on.
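
As a rough sketch of how that flag could gate a postscript, the check below combines it with the item-type test used earlier in this thread. The helper name `shouldFixBooktitle` and the mock objects are hypothetical; they only mirror the `Translator` and `zotero` names so the logic can run outside Zotero.

```javascript
// Decide whether the punctuation fix should run for this entry.
// `Translator` and `zotero` mirror the names used in the postscripts in this
// thread; here they are plain objects so the check can run on its own.
function shouldFixBooktitle(Translator, zotero) {
  return Boolean(
    Translator.BetterBibTeX &&
    Translator.options && Translator.options.useJournalAbbreviation &&
    zotero.itemType === 'conferencePaper'
  );
}

// Mock inputs (hypothetical values, not real BBT objects)
const on = { BetterBibTeX: true, options: { useJournalAbbreviation: true } };
const off = { BetterBibTeX: true, options: { useJournalAbbreviation: false } };

console.log(shouldFixBooktitle(on, { itemType: 'conferencePaper' }));  // true
console.log(shouldFixBooktitle(off, { itemType: 'conferencePaper' })); // false
console.log(shouldFixBooktitle(on, { itemType: 'journalArticle' }));   // false
```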

> How to keep the original order of fields in BibTeX after calling tex.add()? This is a minor issue, but I am curious anyway.

this should do it:

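// Remember the field order before changing anything
const order = Object.keys(tex.has)

// do some stuff here

// Delete and re-add each field so the original order is restored
for (const name of order) {
  const field = tex.has[name]
  if (field) {
    delete tex.has[name]
    tex.has[name] = field
  }
}

aywi commented 1 year ago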

Yeah, I deleted all the comments and forgot to bring them here. The following is one of the prompts used to create the code (other prompts refined it, but this is the core idea):

1. There are two strings `stringA` and `stringB`;
2. Find all the words in `stringA` that have `/` in them, and in `stringB` there may be the same word(s) but without `/`: try to recover the original `/` in `stringB`;
3. Find all words in `stringA` which are surrounded by `(` and `)`, and in `stringB` there may be the same word(s) but without `(` and `)`: try to recover the original `(` and `)` in `stringB`;
4. Combine the above functions and make a simple and concise version.
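
The steps in this prompt can be sketched as a standalone script that runs outside Zotero, using the CVPR title from the top of this issue. The names `slashed` and `parenthesized` are illustrative. The `includes` check is an extra assumption that is not part of the prompt: it skips tokens whose punctuation is already present, so a title is never wrapped twice.

```javascript
// Re-insert punctuation that was lost during abbreviation.
// `full` is the unabbreviated title, `abbr` the abbreviated one, and
// `regex` (with the g flag) selects the punctuated tokens to look for in `full`.
function recover(full, abbr, regex) {
  for (const match of full.match(regex) || []) {
    // The token as it appears once '/', '(' and ')' have been stripped
    const word = match.replace(/[\/()]/g, '');
    // Skip tokens that already carry their punctuation (added assumption)
    if (!abbr.includes(match)) abbr = abbr.replace(word, match);
  }
  return abbr;
}

const slashed = /\b(\w+\/\w+)\b/g;   // words joined by "/", e.g. IEEE/CVF
const parenthesized = /\((\w+)\)/g;  // a single word in parentheses, e.g. (CVPR)

const full = '2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)';
const abbr = '2022 IEEECVF Conf. Comput. Vis. Pattern Recognit. CVPR';

const restored = recover(full, recover(full, abbr, slashed), parenthesized);
console.log(restored);
// prints: 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)
```

Note that `String.prototype.replace` with a string pattern only replaces the first occurrence, so a stripped token that also appears earlier in the abbreviated title would be restored in the wrong place.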

Thanks a lot for this! Here is the final version of the postscript, which closes this issue without requiring changes on Zotero's side:

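if (Translator.BetterBibTeX) {
    // Remember the field order so it can be restored after tex.add()
    const order = Object.keys(tex.has);
    if (zotero.itemType === 'conferencePaper' && Translator.options.useJournalAbbreviation
            && zotero.publicationTitle && tex.has.booktitle) {
        // Put back the punctuation of every token in `full` that matches `regex`
        function recover(full, abbr, regex) {
            const matches = full.match(regex);
            if (matches) {
                matches.forEach(match => {
                    // The token as it appears once '/', '(' and ')' are stripped
                    const word = match.replace(/[\/()]/g, '');
                    abbr = abbr.replace(word, match);
                });
            }
            return abbr;
        }
        const btFull = zotero.publicationTitle;  // unabbreviated proceedings title
        const btAbbr = tex.has.booktitle.value;  // abbreviated title
        const sRegex = /\b(\w+\/\w+)\b/g;        // slash-joined words, e.g. IEEE/CVF
        const pRegex = /\((\w+)\)/g;             // parenthesized word, e.g. (CVPR)
        tex.add({ name: 'booktitle', value: recover(btFull, recover(btFull, btAbbr, sRegex), pRegex) });
    }

    // do some other stuff

    // Delete and re-add each field so the original order is restored
    for (const name of order) {
        const field = tex.has[name];
        if (field) {
            delete tex.has[name];
            tex.has[name] = field;
        }
    }
}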
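
The reordering loop at the end works because JavaScript objects list their string keys in insertion order, so deleting a key and assigning it again moves it to the end. The sketch below shows this on a plain object. The `has` object is a hypothetical stand-in for `tex.has`, `restoreOrder` is an illustrative name, and the simulated replacement assumes that tex.add() moves the field to the end, as the question in this thread implies.

```javascript
// Move the listed keys to the end of `obj`, in the given order, by deleting
// and re-assigning each one. Keys missing from `obj` are skipped.
function restoreOrder(obj, order) {
  for (const name of order) {
    if (Object.prototype.hasOwnProperty.call(obj, name)) {
      const value = obj[name];
      delete obj[name];
      obj[name] = value;
    }
  }
  return obj;
}

// Hypothetical stand-in for tex.has: field name -> field object
const has = { author: { value: 'A' }, booktitle: { value: 'B' }, year: { value: '2022' } };
const order = Object.keys(has);

// Simulate replacing a field in a way that moves it to the end
delete has.booktitle;
has.booktitle = { value: 'B (fixed)' };
console.log(Object.keys(has)); // [ 'author', 'year', 'booktitle' ]

restoreOrder(has, order);
console.log(Object.keys(has)); // [ 'author', 'booktitle', 'year' ]
```

This only holds for ordinary string keys such as BibTeX field names; keys that look like non-negative integers are always listed first in ascending numeric order, regardless of when they were inserted.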