retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License
4.99k stars, 277 forks

export langid as language #2909

Open ryofurue opened 1 week ago

ryofurue commented 1 week ago

Debug log ID

SIFINYEJ-apse/6.7.206-6

What happened?

My Zotero database has a lot of instances of language="EN". The Zotero web browser plugin inserts them.

Then, I need to translate Zotero's language to bibtex's language so that biblatex prints the name of the language if it's not English; for example, I need biblatex to print "in Chinese" in the reference list. So, I select the BBT's switch to translate language to language.

But, currently, BBT translates language="EN" to language={EN}, which doesn't work. It needs to be language={english}.

So, does it make sense to translate "EN" to "english" if the target is the language field?
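
To make the request concrete, the kind of normalization being asked for could be sketched as below in plain JavaScript. The function name `toBabelLanguage` and the lookup table are hypothetical and cover only a handful of codes; the language names in the table are assumptions that should be checked against your own babel/biblatex setup. This is not how BBT derives its langid, only an illustration of mapping a Zotero language value such as "EN" to a name like "english".

```javascript
// Hypothetical lookup table: two-letter language codes -> language names.
// Illustrative entries only; this is not BBT's own table.
const LANGUAGE_NAMES = new Map([
  ['en', 'english'],
  ['fr', 'french'],
  ['de', 'german'],
  ['ru', 'russian'],
  ['ja', 'japanese'],
  ['zh', 'chinese'],
])

// Hypothetical helper: normalize a Zotero language value ("EN", "en-US", " fr ")
// to a language name, or return undefined when the code is not in the table.
function toBabelLanguage(zoteroLanguage) {
  if (typeof zoteroLanguage !== 'string') return undefined
  // Keep only the primary subtag, lowercased: "en-US" -> "en"
  const primary = zoteroLanguage.trim().toLowerCase().split(/[-_]/)[0]
  return LANGUAGE_NAMES.get(primary)
}

console.log(toBabelLanguage('EN')) // english
```

Returning undefined for unknown codes leaves it to the caller to decide whether to keep the original value or drop the field.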

retorquere commented 1 week ago

I need a debug log generated by right-clicking the item of interest and selecting "Better bibtex" - "send debug log". The log ID will have -refs- in it.

ryofurue commented 1 week ago

PJRN83N8-refs-apse/6.7.206-6

retorquere commented 1 week ago

I think https://github.com/retorquere/zotero-better-bibtex/issues/1926#issuecomment-921579113 and the comments below apply here.

ryofurue commented 1 week ago

> I think https://github.com/retorquere/zotero-better-bibtex/issues/1926#issuecomment-921579113 and the comments below apply here.

But the metadata information "language" which publishers provide does mean the language of the content. French, Russian, German, Chinese, Japanese, (perhaps Korean) publishers do this.

Then what solution would you provide?

Currently, the tool chain of Zotero browser plugin -> Zotero -> BBT is broken because it cannot handle the publishers' metadata which is useful to end users like me.

Without providing a solution, you argue that translating Zotero's language to biblatex's language is "wrong". But, then what does that "wrong" mean?

Zotero is a tool to help users. I say, provide a workaround, even if it's "wrong" in your mind.

github-actions[bot] commented 1 week ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.206.2909.6452 ("add tex.langid()")

This update may name other issues, but the build just dropped here is for you; it just means problems already fixed in other issues have been folded into the work we are doing here. Install in Zotero by downloading test build 6.7.206.2909.6452, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

retorquere commented 1 week ago

> But the metadata information "language" which publishers provide does mean the language of the content. French, Russian, German, Chinese, Japanese, (perhaps Korean) publishers do this.

If by this you mean that Zotero picks the language field up from the sites of French, Russian, German, Chinese, Japanese, (perhaps Korean) publishers -- what can I say? Zotero puts it in the language field, and the language field has the aforementioned meaning in Zotero. Zotero's automated data scraping always requires inspection; the results are invariably imperfect.

> Then what solution would you provide?

A postscript, as mentioned in the linked issue.

The build that just dropped makes it easier to do what you want in a postscript. In the current release, this will work:

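if (Translator.BetterTeX) {
  // only act when the entry has a langid field
  if (tex.has.langid) {
    // drop the existing language field, then export langid under the name 'language'
    delete tex.has.language
    tex.has.langid.name = 'language'
  }
}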

but with build 6452 you can do

if (Translator.BetterTeX) {
  // tex.langid() is new in build 6452
  tex.add({ name: 'language', value: tex.langid() })
}

If that works to your satisfaction, I'll roll it into a new release.
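
To see what the first postscript does to an entry, here is a small stand-alone simulation. The `Translator` and `tex` objects below are hand-made stand-ins, with `tex` reduced to a `has` map of `{ name, value }` fields; that shape and the sample values are assumptions for illustration, since inside BBT the real objects are supplied by the exporter. The helper `exportedFields` is likewise hypothetical.

```javascript
// Mock objects for illustration only: assumptions, not BBT's real objects.
const Translator = { BetterTeX: true }
const tex = {
  has: {
    language: { name: 'language', value: 'EN' },
    langid: { name: 'langid', value: 'english' },
  },
}

// The first postscript from this thread, unchanged.
if (Translator.BetterTeX) {
  if (tex.has.langid) {
    delete tex.has.language
    tex.has.langid.name = 'language'
  }
}

// Hypothetical helper: render each remaining field as "name = {value}".
function exportedFields(entry) {
  return Object.values(entry.has).map(field => `${field.name} = {${field.value}}`)
}

console.log(exportedFields(tex).join('\n')) // language = {english}
```

In this mock the original language = {EN} is gone, and the value that was held under langid is now written out under the name language.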

ryofurue commented 1 week ago

First of all, I thank you for the help. I'll try the solution you provide above.

So, the following discussion is just for clarification.

> If by this you mean that Zotero picks the language field up from French, Russian, German, Chinese, Japanese, (perhaps Korean) publishers sites -- what can I say? Zotero puts it in the language field, and the language field has the aforementioned meaning in Zotero. Zotero's automated data scraping always requires inspection, the results are invariably imperfect.

I think you are speaking as the BBT developer. I'm not particularly addressing you as the BBT developer. The designers of the tool chain web-plugin -> Zotero -> BBT as a whole have missed the possibility of publishers providing language-content information.

That is, somebody decided that Zotero's language does not mean the content language but they did not provide a field indicating content language.

In the future, I hope that

  1. Zotero includes an additional field to indicate content language.

  2. the web-plugin automatically detects whether the publisher's "language" is content-language or not. If there is ambiguity, a user switch or intervention would be needed at the web-plugin.

What I'm saying is that, until these changes are implemented, Zotero's language field needs to be abused to indicate content language by the user's choice. That is a workaround.

> > Without providing a solution, you argue that translating Zotero's language to biblatex's language is "wrong". But, then what does that "wrong" mean?

> You are introducing that word here, not me.

I'm sorry for the misunderstanding. I shouldn't have put the word "wrong" in quotation marks. What I mean is this: you quoted the past discussion. It is clear from that discussion that the participants decided that it is wrong to translate Zotero's language to biblatex's language.

Then, you quoted the discussion. So, I thought you agreed with its conclusion: Zotero's language should not be translated to biblatex's language.

Then, as you can see, my question is why not use (or abuse) the translation as a workaround? Whether it's wrong or not doesn't matter as long as it works for the users who choose to use the workaround.

retorquere commented 1 week ago

> I think you are speaking as the BBT developer. I'm not particularly addressing you as the BBT developer. The designers of the tool chain web-plugin -> Zotero -> BBT as a whole have missed the possibility of publishers providing language-content information.

For starters, I have expressed myself poorly here from the start. I will sometimes argue a general case in a way that sounds as if I mean it to apply by necessity to a particular case, because I am thinking of how it potentially affects all users. I think that has happened here. Not my intention. That said, I do speak as a BBT developer, because choices I make affect everyone, and the possibility is not missing, it's just called "postscript" in this case.

> That is, somebody decided that Zotero's language does not mean the content language but they did not provide a field indicating content language.

Yes. Zotero did. It is a design desideratum for BBT to export bib(la)tex by default that would render as closely as possible to what a Zotero-generated bibliography would look like, so I take what the fields "mean" to Zotero/CSL as their intended meaning. I have to assign some meaning to the fields, or BBT would not be able to generate meaningful output. For some cases where people want to deviate from this (and whether these reasons are "wrong" or not is not really a major concern to me), there are sometimes preferences, and other times postscripts.

When BBT started I would pretty much add a new preference for any user request, and that quickly grew out of hand. So now I add a preference only when the desired effect is hard to attain otherwise and the preference is not likely to be a foot-gun. I had forgotten that the result of the discussion is summarized here; given that, and that there is a very simple postscript (I've just added them to that page), I don't really see the need for a new preference.

> 1. Zotero includes an additional field to indicate content language.

And when Zotero does, I'll export it to the language field. Juris-M might actually already offer it; it is much more sensitive to multi-language publishing than Zotero. I have seen mention that development has picked up again after a hiatus to bring it up to par with Zotero 7. Juris-M 6 has unfortunately fallen so far behind Zotero that I can't currently support it.

> 2. the web-plugin automatically detects whether the publisher's "language" is content-language or not. If there is ambiguity, a user switch or intervention would be needed at the web-plugin.

That would still require a place to store this knowledge in Zotero, so this presupposes point 1. Zotero does not retain information on the choices a scraper made when it imported an item. If that knowledge isn't/can't be stored on the item, it is lost forever.

> What I'm saying is that, until these changes are implemented, Zotero's language field needs to be abused to indicate content language by the user's choice. That is a workaround.

But to be specific here, you need it to be abused, not everyone. For user-specific field-abuse workarounds, there are postscripts, and I have provided the postscript.

> I'm sorry for the misunderstanding. I shouldn't have put the word "wrong" in quotation marks. What I mean is this: you quoted the past discussion. It is clear from that discussion that the participants decided that it is wrong to translate Zotero's language to biblatex's language.

What I take from that conversation is that it is not universally beneficial.

> Then, you quoted the discussion. So, I thought you agreed with its conclusion: Zotero's language should not be translated to biblatex's language.

I agree that it should not by default, that is correct.

> Then, as you can see, my question is why not use (or abuse) the translation as a workaround? Whether it's wrong or not doesn't matter as long as it works for the users who choose to use the workaround.

And I've offered you a supported way to get that workaround. That's what postscripts are for. I don't need any BBT user to be a JavaScript developer either -- I will gladly create and tweak postscripts on demand.

tex.langid() has rolled out in a new release, so the first postscript above is now redundant (although it will of course still work).

retorquere commented 1 week ago

Will this work for your use-case?