[Request]: Recommendation for preprints

plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

[Request]: Recommendation for preprints #1267

Open ryofurue opened 1 year ago

ryofurue commented 1 year ago

As the OP of another thread in this forum says, preprints are becoming more and more common. There are well-supported public preprint servers in addition to arXiv. Then, what "type" should we use for preprints?

So, it would be nice if the official Biblatex manual included a recommendation.

If you look at TeX Stackexchange, you'll see that people tend to use @article but some people use @online or @unpublished.

Here, there are a lot of ambiguities. For example, where should the name of the archive go in the @online type? In organization?

If you use @article, where should the version number go?

Should we include the word "preprint" in the note field?

If there is a recommendation in the official Biblatex manual, other tools will follow it. For example, what "type" of Biblatex should Zotero's "Preprint" type correspond to when generating the .bib file?

plk commented 1 year ago

My general opinion here is that @article should be used for everything like this. The APA 7th edition is going this way, because @online is becoming less and less useful now that everything is online and almost every @article has a URL. I also prefer "unpublished" as a state rather than as a separate type.
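
A minimal sketch of what "unpublished as a state" could look like with fields that already exist: biblatex has a pubstate field, and the entry below sets it on an @article. The entry key, author, title, journal name, and date are placeholders, and treating prepublished as the right state for a preprint is an assumption, not something settled in this thread.

```latex
\documentclass{article}
\usepackage[backend=biber, style=authoryear]{biblatex}

% Placeholder entry: key, author, title, journal and date are invented.
% "prepublished" is one of the pubstate keys that biblatex localises;
% using it to mark preprints is an assumption made for this sketch.
\begin{filecontents}{\jobname.bib}
@article{doe,
  author       = {Jane Doe},
  title        = {A Placeholder Title},
  journaltitle = {Journal of Placeholders},
  date         = {2023},
  pubstate     = {prepublished},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\begin{document}
\autocite{doe}
\printbibliography
\end{document}
```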

ryofurue commented 1 year ago

"unpublished" as a state rather than as separate type

Hmm . . . perhaps this is just the semantics of the word "published" but I feel that "unpublished" would contradict the fact that a preprint is a "form of publication". It hasn't gotten through the peer-review process but that's pretty much the only important difference from a regular peer-reviewed article.

A preprint is published to all intents and purposes: It's gotten a DOI and the public can (usually freely) read it. Reputable preprint services promise to keep those articles available to the public for the foreseeable future. Some preprint-hosting services even have minimal quality control by editors.

On the other hand, "unpublished" implies that the article isn't publicly available: you would have to locate the author and ask her/him for a copy to read it.

So, a distinction between "unpublished" and "preprint" would be useful to make in the reference list.

I don't have particular preferences between @article and @online (and anything else) as long as the generated reference list clearly indicates the fact that it is a preprint and as long as there is a clearly written "standard" or "convention" or "recommendation" as to how to present various pieces of metadata.

For that matter, a preprint entry would need a standardized way to indicate the version. Unlike books, "the same" preprint can have multiple versions, all of which are publicly available under the same DOI.

plk commented 1 year ago

I like the APA route of using "HOWPUBLISHED" to indicate things like preprints etc and everything is then an @article (unless it obviously isn't ...) with a URL.
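
As a rough illustration of the howpublished idea: as far as I can tell, the standard styles do not print howpublished for @article, so the sketch below uses a Biber source map to copy it into addendum, which the standard @article driver does print. The entry itself (key, author, title, date) is a placeholder, and putting "arXiv" into journaltitle is one of the options debated in this thread, not an established convention.

```latex
\documentclass{article}
\usepackage[backend=biber, style=authoryear]{biblatex}

% Copy howpublished into addendum for @article entries, so that a style
% whose @article driver ignores howpublished still prints the label.
% An existing addendum is left untouched (no overwrite option is set).
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      \pertype{article}
      \step[fieldsource=howpublished, final]
      \step[fieldset=addendum, origfieldval]
    }
  }
}

% Placeholder entry; key, author, title and date are invented.
\begin{filecontents}{\jobname.bib}
@article{doe,
  author       = {Jane Doe},
  title        = {A Placeholder Title},
  journaltitle = {arXiv},
  date         = {2023},
  howpublished = {Preprint},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\begin{document}
\autocite{doe}
\printbibliography
\end{document}
```

A style that wants to treat the label differently would print howpublished in its own driver instead; the source map is only a way to try the idea without touching any driver.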

moewew commented 1 year ago

I agree that @online has become a bit of a useless type, what with basically everything being online nowadays.

But I disagree that @article is a good type for things that aren't published in a journal. For one, journaltitle is a required field for @article, meaning that styles can expect it to be present. Very early preprints will usually not have a journal yet (though I once got into a discussion whether or not arXiv should count as a journal). This can lead to odd output with styles that expect a journal (e.g. a lonely "in:"). Secondly, biblatex-examples.bib uses @online for preprints without a journal.

https://github.com/plk/biblatex/blob/4c31c6b9bac61a3edfeb763dc8cb4012c56edf5d/bibtex/bib/biblatex/biblatex-examples.bib#L1386-L1399

https://github.com/plk/biblatex/blob/4c31c6b9bac61a3edfeb763dc8cb4012c56edf5d/bibtex/bib/biblatex/biblatex-examples.bib#L1420-L1439

https://github.com/plk/biblatex/blob/4c31c6b9bac61a3edfeb763dc8cb4012c56edf5d/bibtex/bib/biblatex/biblatex-examples.bib#L1457-L1479

ryofurue commented 1 year ago

For one, journaltitle is a required field for @article, meaning that styles can expect it to be present. Very early preprints will usually not have a journal yet

I don't think that the name of the peer-reviewed journal in which the manuscript would eventually be published is relevant to the present discussion. If you post your preprint to a preprinting service, it will remain there forever, whether the manuscript is eventually published in a peer-reviewed journal or not. We are discussing the way to refer to the preprint, not the peer-reviewed journal in which the preprint is ultimately published.

(though I once got into a discussion whether or not arXiv should count as a journal).

I'm afraid I don't see what the practical problem is, if you put the name of the preprinting service in the journaltitle field.

Whether arXiv should count as a journal or not, is not the question we are asking here. The question is how to record and present the name of the preprinting service. Printing the name "arXiv" in the place of journal is a practically fine way. If the fact that it is a preprint is somehow indicated (for example with howpublished="preprint"), there is no room for misunderstanding.

On the other hand, I don't see any practical problems with @online , either, if the meaning of eprinttype is clearly defined. The problem I found in the biblatex-examples.bib examples is that they appear to have only arXiv in mind. There are other preprinting services and they are becoming gradually popular. Do their names count as eprinttypes? If the "type" is the "name", eprinttype should be "arXiv" with a capital "X".

Or, instead of redefining eprinttype, we should perhaps introduce a new field to indicate the name of the preprinting service.

plk commented 1 year ago

I'm keen on the idea of using n2t's meta-resolving service which seems to be able to resolve most "eprint" services (see #1183 ) and I would agree that @article is a generally good type. I wonder if, for example, we change journaltitle to publicationtitle and add a default mapping for backwards compat. We already do this for journal->journaltitle.

moewew commented 1 year ago

I don't think that the name of the peer-reviewed journal that the manuscript would eventually be published is relevant to the present discussion. [...] We are discussing the way to refer to the preprint, not the peer-reviewed journal which the preprint is ultimately published. [...] I'm afraid I don't see what the practical problem is, if you put the name of the preprinting service in the journaltitle field. [...] Whether arXiv should count as a journal or not, is not the question we are asking here. The question is how to record and present the name of the preprinting service. Printing the name "arXiv" in the place of journal is a practically fine way. If the fact that it is a preprint is somehow indicated (for example with howpublished="preprint"), there is no room for misunderstanding.

Alright. I'm probably missing something. Anyway, here is an attempt to clarify my train of thought, which I think is at least tangentially relevant to the issue at hand.

One question that arose here was which entry type would be appropriate for preprint papers. @article and @online are usually among the suggested options.

I don't think @article is a good choice for papers that are solely available as preprints, i.e. papers that haven't been published in a journal (yet). That is because @article assumes that there is a journaltitle, which you can't usefully give if the work hasn't been published in a journal. The only way I see to work around this missing journaltitle is to put the eprint service in the journaltitle field, but I believe that this is not semantically sound (cf. how biblatex-examples.bib does not use arXiv and friends as journaltitle).

On the other hand, I don't see any practical problems with @online , either, if the meaning of eprinttype is clearly defined. The problem I found in the biblatex-examples.bib examples is that they appear to have only arXiv in mind. There are other preprinting services and they are becoming gradually popular. Do their names count as eprinttypes? If the "type" is the "name", eprinttype should be "arXiv" with a capital "X".

Well, the biblatex documentation defines eprinttype as follows

The type of eprint identifier, e.g., the name of the archive, repository, service, or system the eprint field refers to.

So to answer your question: Yes, the name of other preprint services also counts as eprinttype.

biblatex-examples.bib indeed only has examples with arxiv and googlebooks, but a number of other eprinttypes are pre-defined (so that you don't have to worry about capitalisation for those and that the numbers are nicely hyperlinked). The predefined generic field format will suffice for those who worry about the capitalisation manually and don't need the linkage.

https://github.com/plk/biblatex/blob/4c31c6b9bac61a3edfeb763dc8cb4012c56edf5d/tex/latex/biblatex/biblatex.def#L509-L555

Example for bioRxiv and JSTOR

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[backend=biber, style=authoryear]{biblatex}
\usepackage{hyperref}

\begin{filecontents}{\jobname.bib}
@book{elk,
  author     = {Anne Elk},
  title      = {A Theory on Brontosauruses},
  year       = {1972},
  publisher  = {Monthy \& Co.},
  location   = {London},
  eprinttype = {jstor},
  eprint     = {12345678},
}
@online{puffin,
  author     = {Oliver Kersten and Bastiaan Star and Deborah M. Leigh
                and Tycho Anker-Nilssen and  Hallvard Strøm
                and Jóhannis Danielsen and Sébastien Descamps
                and Kjell E. Erikstad and Michelle G. Fitzsimmons
                and Jérôme Fort and Erpur S. Hansen and Mike P. Harris
                and Martin Irestedt and Oddmund Kleven and Mark L. Mallory
                and Kjetill S. Jakobsen and Sanne Boessenkool},
  title      = {Complex Population Structure
                of the {Atlantic} Puffin Revealed by Whole Genome Analyses},
  date       = {2020-11-07},
  eprinttype = {bioRxiv},
  eprint     = {2020.11.05.351874},
  doi        = {10.1101/2020.11.05.351874},
}
\end{filecontents}
\addbibresource{\jobname.bib}
\addbibresource{biblatex-examples.bib}

\begin{document}
Lorem \autocite{sigfridsson,elk,puffin}

\printbibliography
\end{document}

However, I feel that the idea of eprinttype+eprint is to give a way of obtaining the work in question, not to explicitly classify it as a preprint.

ryofurue commented 1 year ago

The only way I see to work around this missing journaltitle is to put the eprint service in the journaltitle field, but I believe that this is not semantically sound

You've just said it! It's just semantics. What I'm saying is, forget about semantics. :-) Suppose we declare that "the name of the preprinting service shall be put in journaltitle." Then, this solution will just work! Do you see any practical problems with this solution? That's my point.

On the other hand, your proposal includes a semantic problem, if semantics is important to you. You propose that the name of the preprinting service should be put in eprinttype, which isn't semantically sound. There can be multiple preprint services which are of the "arxiv" eprinttype. This field represents a "type" and so there can be many instances belonging to the same type.

But, as I said, forget about semantics :-)

Now, I have a question about your proposal:

a number of other eprinttypes are pre-defined (so that you don't have to worry about capitalisation for those and that the numbers are nicely hyperlinked)

Does that mean that each time you find a new preprinting service, you have to send the information to the authors of biblatex to register it as an eprinttype?

As an example, let's say you want to cite a preprint on https://essopenarchive.org . Its eprinttype may be "essopenarchive" and its name is "ESS Open Archive" . . . Or is arXiv a special case in your scheme?

However, I feel that the idea of eprinttype+eprint is to give a way of obtaining the work in question, not to explicitly classify it as a preprint.

Yes, we still need a method or convention to indicate that the particular bib entry is a preprint because this information needs to be printed in the reference list.

moewew commented 1 year ago

Thing is: Semantics are not just idle word play here. If we want to develop official guidance, we need to take into account the meaning of fields (as documented) because users and style developers depend on them.

It's absolutely fine to ignore semantics on a case-by-case basis if you like the result, but for documented best practice recommendations we need to make sure that styles will (still have the chance to) produce sensible output. What if you are OK with getting essentially analogous output for eprints and articles published in a journal (with the eprint service standing in for the journal), but there is a style out there that distinguishes these cases? After all, we don't just put the fully formatted bibliography entry into the note field of an entry.

You propose that the name of the preprinting service should be put in eprinttype, which isn't semantically sound. There can be multiple preprint services which are of the "arxiv" eprinttype. This field represents a "type" and so there can be many instances belonging to the same type.

I wouldn't get too hung up on the word type here (which can mean almost anything to almost nothing, depending on the "level" we operate). Maybe type isn't the most intuitive name for the job (mind you: I do think it works and I am painfully aware of the difficulty of choosing good names for things, especially if they are user-facing).

If you will, think of eprint as eprintid and of eprinttype as eprintservice: eprint denotes the specific ID/identifier of the preprint on the eprinttype (pr)eprint service. I think the "e.g." of the field documentation that I quoted above captures the meaning more clearly

The type of eprint identifier, e.g., the name of the archive, repository, service, or system the eprint field refers to.

I don't quite understand what you mean by preprint services of "arxiv" eprinttype, but I firmly believe that arXiv and say bioRxiv and PsyArXiv are three different eprinttypes in the sense of the biblatex manual, because they refer to three distinct services. (Even if the three services are similar to some degree and thus conceivably belong to the same type at a certain level of magnification.)

Does that mean that each time you find a new preprinting service, you have to send the information to the authors of biblatex to register it as an eprinttype?

Yes and no. First of all, this is really what I would like to avoid. There are countless possible eprint services out there and I don't want us to have to maintain a list of them and in a way be a gatekeeper as to which service is "good enough" to be included.

biblatex has a generic eprint field format

https://github.com/plk/biblatex/blob/4c31c6b9bac61a3edfeb763dc8cb4012c56edf5d/tex/latex/biblatex/biblatex.def#L509-L519

this is used if no eprinttype-specific definition is given. This field format essentially produces the output

<eprinttype>: <eprint>

see e.g. the puffin entry from bioRxiv I showed above.

However, this will only print the eprint service name and the ID. If you want there to be a link or you don't want to worry about capitalisation of the eprinttype (as with arXiv vs arxiv), additional definitions are necessary. biblatex.def has definitions for HDL, arXiv, JSTOR, PubMed, Google Books

https://github.com/plk/biblatex/blob/4c31c6b9bac61a3edfeb763dc8cb4012c56edf5d/tex/latex/biblatex/biblatex.def#L520-L556
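
A sketch of such an additional definition for a service that is not predefined, here bioRxiv. The format name eprint:biorxiv, the alias, and the URL pattern (the eprint ID appended to the DOI prefix 10.1101, generalised from the puffin entry above) are assumptions made for illustration, not definitions that ship with biblatex.

```latex
\documentclass{article}
\usepackage[backend=biber, style=authoryear]{biblatex}
\usepackage{hyperref}

% Hypothetical format for eprinttype = {biorxiv}: prints "bioRxiv: <ID>"
% and links the ID when hyperref is loaded. The URL pattern is assumed.
\DeclareFieldFormat{eprint:biorxiv}{%
  bioRxiv\addcolon\space
  \ifhyperref
    {\href{https://doi.org/10.1101/#1}{\nolinkurl{#1}}}
    {\nolinkurl{#1}}}
% Accept the capitalised spelling of the eprinttype as well.
\DeclareFieldAlias{eprint:bioRxiv}{eprint:biorxiv}

\begin{filecontents}{\jobname.bib}
@online{puffin,
  author     = {Oliver Kersten and others},
  title      = {Complex Population Structure
                of the {Atlantic} Puffin Revealed by Whole Genome Analyses},
  date       = {2020-11-07},
  eprinttype = {bioRxiv},
  eprint     = {2020.11.05.351874},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\begin{document}
\autocite{puffin}
\printbibliography
\end{document}
```

Because a definition like this lives in the user's preamble, a new service can be supported locally without biblatex having to maintain a list of services.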

Do note that at least in the standard styles, you cannot have an eprintclass without an eprint field. So in this scheme there is no way to give an eprint service without a suitable eprint ID. (Even if no useful/sensible ID is available, because DOIs are assigned.)

ryofurue commented 1 year ago

there is no way to give an eprint service without a suitable eprint ID.

So, after all, what is your proposal to modify biblatex for preprints? What bib entry should we write for a preprint and how should the fields be handled by biblatex? The requirements include:

The fact that it's a preprint should be printed in the reference list.
The name of the preprint service should be printed.

One proposal would be to extend the meaning of the journaltitle of @article to include preprints (see below), put howpublished="preprint", and modify biblatex so that the howpublished field will be acted on under @article.

You propose to use @online and . . . and what modification would you require on the part of biblatex?

we need to take into account the meaning of fields (as documented)

I think I see what you mean. Then, what about changing the meaning of the journaltitle field? What about changing the documentation of the journaltitle field in such a way that it explicitly includes the names of eprint and preprint services? After all, according to you, the meaning is defined by the documentation.

Then, (if I understand your argument correctly), you will agree that @article is also a good candidate for preprints.

I wouldn't get too hung up on the word type here

Initially I thought you were getting hung up on the word journal and that this was the reason for your objection to using the journaltitle field for the name of the preprint service. I said forget about semantics because I thought you were referring to the meaning of the word "journal" as defined (recorded) in English dictionaries. But, if your meaning is what the documentation defines, then changing the documentation will resolve your objection.

If you will, think . . . of eprinttype as eprintservice

If you will, think of journaltitle as journallikearchive :-)

plk commented 1 year ago

Since we have default driver-level maps distributed with biblatex, we should use them. I agree with @moewew that the semantics derived from typical interpretation of field names matter to the general user and so we can just internally move to publicationtitle or something for all default styles and then force a duplication into journaltitle for backwards compat. Since no external styles will use publicationtitle currently, it will be ignored and people can move to the new field whenever they want. We need some mechanism to compatibly move to better field names over time.
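
A user-level sketch of the kind of mapping involved, assuming a hypothetical publicationtitle field in the .bib file. It only moves publicationtitle into journaltitle so that current styles print it; the proposal above would instead live in the driver-level default maps shipped with biblatex and make publicationtitle the primary field. The entry data are placeholders.

```latex
\documentclass{article}
\usepackage[backend=biber, style=authoryear]{biblatex}

% publicationtitle is a hypothetical field name taken from the proposal.
% The map renames it to journaltitle before Biber processes the entry,
% so existing styles that only know journaltitle keep working.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      \step[fieldsource=publicationtitle, fieldtarget=journaltitle]
    }
  }
}

% Placeholder entry; key, author, title and date are invented.
\begin{filecontents}{\jobname.bib}
@article{doe,
  author           = {Jane Doe},
  title            = {A Placeholder Title},
  publicationtitle = {arXiv},
  date             = {2023},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\begin{document}
\autocite{doe}
\printbibliography
\end{document}
```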

moewew commented 1 year ago

So, after all, what is your proposal to modify biblatex for preprints?

At the moment, I don't know. I just wanted to clear up some issues with the semantics before they got lost in the enthusiasm for change. I also wanted to explain the status quo. If it turns out the status quo is not good enough, we need to think about a new solution. But that has to be done carefully - we need to take into account what changes would mean for users and style developers.

For now I'm a bit sceptical about just repurposing journaltitle. For one I feel that styles may want to be able to distinguish between journals and preprint services. Using the same field makes that harder. Additionally, I know that even small changes to drivers or bibmacros can break people's code (because they patch macros or assume certain things that are no longer true). I have elsewhere promised to be careful about gratuitous changes and intend to be fairly conservative until convinced that change is indeed necessary and useful.

I think I see what you mean. Then, what about changing the meaning of the journaltitle field? What about changing the documentation of the journaltitle field in such a way that it explicitly includes the names of eprint and preprint services? After all, according to you, the meaning is defined by the documentation.

Let me just comment in broad strokes, since I haven't completely made up my mind about the specifics just yet.

Yes, I believe that the main (and possibly final) arbiter for field meanings is the documentation. But I also believe that we need to take into account the "realities on the ground", that is to say how people actually use fields and types (generally, people should be able to expect that fields mean roughly what their names suggest). Of course the latter is much more nebulous.

As far as changes to the documentation go:

Changing the meaning of fields in the documentation in a way that is backwards incompatible is a big no. Especially with commonly used fields and types.
In principle, backwards compatible change is acceptable. But I would argue that not every change that is strictly backwards compatible on a technical level is a good idea. The change should make sense in context of other field/type meanings as well. It should work within not just the bare wording of the documentation, but also the "spirit" of the current data model. (Again, this is a bit vague.)

ryofurue commented 1 year ago

@moewew Thank you for your detailed explanation! I think I finally understand your position. I think I agree with all that you said in your previous message.

So,

people should be able to expect that fields mean roughly what their names suggest

so, you will agree with my objection to the use of eprinttype to put the name of the preprint service. I expect that the "eprinttype" means the type of the eprint service. Many people will agree with me about that. Because it's a type, I expect multiple eprint services belong to a single type. Say, archives AAA, CCC, and ZZZ are all of type "arxiv", so

@online{ . . .
archivename="AAA",
eprinttype="arxiv",
. . . 
}
@online{ . . . 
archivename="CCC",
eprinttype="arxiv",
. . .
}

would be what I expect from the name eprinttype.

I agree with you that "people should be able to expect that fields mean roughly what their names suggest". Therefore you and I agree that eprinttype is not a good field to put the name of the archive into.

pauloney commented 1 year ago

Classifying a work deposited in the arXiv as an @article is the WORST possible solution to any of the problems that have been brought up on this thread.

You can put lipstick on a pig, but it will always BE a pig. The correct way to classify a pre-print that has not been published yet is

 @unpublished

with a field that can point to the arXiv number and link. The reasons are many:

A published work is something that has gone through peer review and been approved as such -- the arXiv is not that -- it is simply a repository.

A published work is acquired by libraries and thousands of different copies of it are made available -- disaster will most certainly not strike all of them at once. Pre-print repositories exist during a period of time, and most that have existed in the past are defunct now.

Another huge difference is that serious publishers do serious editing on works and even if the material is available at the arXiv, it can be extremely different from the published version. The difference between the two types of work should be clear and not muddled.

There are many types of works being deposited at the arXiv, articles, books,... to name just a few.

There are many people that sort their references by "type" and they will end up with arXiv material in the same list as published articles.

We should name it for what it is, Paulo Ney

ryofurue commented 1 year ago

We should name it for what it is,

To you, the name is important. I agree that the name influences how people treat the entry. For example,

There are many people that sort their references by "type"

I can imagine that. Then, let me follow that line of argument.

The name "unpublished" implies (to me) that you have to locate the author and ask for a copy of the article because it's not publicly available.

A preprint is "published" for all intents and purposes. True, it is not peer-reviewed, and its quality control isn't as rigorous as at journal publishers. But that does not make it "unpublished".

What about technical reports? Research institutions around the world "publish" technical reports. In some cases, they are not peer-reviewed and their quality varies.

On the other hand, at some preprint services, editors enforce minimum quality control. Because there is an online discussion forum on each article, the authors wouldn't post their preprint if they aren't confident of their work and writing.

So, I argue that "posting a preprint to a reputable preprint service" is a relatively new form of "publication".

So, if the name is important, if "we should name it for what it is", we should create a new type @preprint, shouldn't we?

Actually, that's what Zotero the bibliography manager does.

I started this thread because Zotero developers aren't able to find a good solution when they map Zotero's preprint type to a biblatex type.
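
For illustration, one way an export with a dedicated @preprint type could be handled today without changing the biblatex data model: a Biber source map that turns the hypothetical @preprint type into @online and marks it with entrysubtype, so the entries can still be singled out in the bibliography. The type name, the choice of @online as the target, and the entry data are assumptions for this sketch, not a recommendation from the biblatex manual.

```latex
\documentclass{article}
\usepackage[backend=biber, style=authoryear]{biblatex}

% Map the hypothetical @preprint type to @online and tag the entry with
% entrysubtype = preprint, so it can be filtered on later.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      \step[typesource=preprint, typetarget=online, final]
      \step[fieldset=entrysubtype, fieldvalue=preprint]
    }
  }
}

% Placeholder entry; key, author, title, date and eprint ID are invented.
\begin{filecontents}{\jobname.bib}
@preprint{doe,
  author     = {Jane Doe},
  title      = {A Placeholder Title},
  date       = {2023},
  eprinttype = {arxiv},
  eprint     = {0000.00000},
}
\end{filecontents}
\addbibresource{\jobname.bib}

\begin{document}
\autocite{doe}
\printbibliography[subtype=preprint, title={Preprints}]
\end{document}
```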

pauloney commented 1 year ago

It is a pity that the "pub" in "publishing" and "publicly available" are the same, but in science the meaning is very clear:

The act of "publishing" encompasses three well-defined steps:

Go through peer-review
Go through editorial for uniform presentation
Be collected by libraries for permanent availability

The arXiv, or any other preprint server, does not comply with any of the three -- so it is NOT a publication.

Now, material deposited at preprint repositories has always been public. 25 years ago you would make your request to a secretary or an author, these days you make it to a web-server. There are no differences in the processes, all material there has been publicly available.

So .... being publicly available does NOT mean it has been published.

I agree with you that @unpublished is a misnomer, because it can be easily taken for "not public" which is not the case.

I also agree with you that @preprint could be a good solution, especially in view of Zotero, and because unfortunately there are:

preprint depositories that are not online and never will be
unpublished material that exists but never will be online
unpublished material that does not even exist and has to be quoted

and for those the dichotomy of @@.*** is not always enough.

My initial post was meant to make clear that I don't think calling an arXiv deposit a publication would be a reasonable solution for the problem.

moewew commented 1 year ago

@ryofurue I appreciate the maieutics, but please let me reiterate that "generally, people should be able to expect that fields mean roughly what their names suggest" is quite a bit less strict than "all fields must have the meaning of the dictionary definition of their name" or "as soon as a couple of people think a name does not make sense, we need to change it" and that the documented field semantics from the manual are also relevant. I'm quite happy to give the documentation more leeway for slightly unnatural constructs like eprinttype as opposed to obvious stuff like journaltitle.

Anyway, I guess this is largely irrelevant. If we need to find a new way to deal with this with new fields, we can try and find more intuitive names than eprinttype (though I would not be surprised if for every name we can come up with we can find at least one person who finds it counter-intuitive). If we decide that the current eprinttype+eprint system is enough, we will probably stick with the old name for backwards compatibility reasons (however unfortunate it may be).

perstar commented 1 year ago

As an example, let's say you want to cite a preprint on https://essopenarchive.org . Its eprinttype may be "essopenarchive" and its name is "ESS Open Archive" . . .

I'm off on a tangent here, but it might be a good idea to have a define-preprint-service command in Biblatex where you state the label, the formatted name, and how to convert its string to a URL. Then you could use that command yourself for a new service before it is added officially.

perstar commented 1 year ago

A published work is something that has gone peer-review and approved as such -- the arXiv is not that -- is simply a repository.

That was given as part of the reason why calling a pre-print article @article would be "the WORST possible solution". Is this because of a misunderstanding of what @article means? It doesn't have to be peer-reviewed academic writing. A feature in Scientific American or in a newspaper is also an @article. I have used @article for blog posts which has seemed natural to me, since one of them is a "self-contained unit with its own title" in a periodical (the blog). Anyway, there is certainly no prestige in being an @article, implying it is peer-reviewed.

If you want to list peer-reviewed articles together just using the type is not enough anyway.

Reading this thread, I mostly see thoughts about what it's like for the reader who reads the text when it's new. Like:

On the other hand, "unpublished" implies that the article isn't publicly available: you would have to locate the author and ask her/him for a copy to read it.

If you read it some years afterwards you rather want to examine if this thing has been published by now, and you might search for publications from that author made after the publication of the citation you saw. And it's similar with a pre-print. If you are interested you would probably primarily want to find the final version instead if there was one, but if there wasn't the pre-print link might still work.

pauloney commented 1 year ago

On Wed, Mar 15, 2023 at 3:58 AM perstar wrote:

A published work is something that has gone peer-review and approved as such -- the arXiv is not that -- is simply a repository. That was given as part of the reason why calling a pre-print article @article would be "the WORST possible solution". Is this because of a misunderstanding of what @article means? It doesn't have to be peer-reviewed academic writing. A feature in Scientific American or in a newspaper is also an @article.

Scientific American and most worthy newspaper articles go through intense peer-review.

I have used @article for blog posts which has seemed natural to me, since one of them is a "self-contained unit with its own title" in a periodical (the blog).

That is not a suitable classification. You should use @online instead.

ryofurue commented 1 year ago

If you read it some years afterwards you rather want to examine if this thing has been published by now, and you might search for publications from that author made after the publication of the citation you saw. And it's similar with a pre-print. If you are interested you would probably primarily want to find the final version instead if there was one, but if there wasn't the pre-print link might still work.

I'm not sure if I understand your point correctly (and if I don't, forgive me), but the preprint link will work.

That is the whole point of having the entry of the preprint version as a separate entry from the peer-reviewed version in your bib database.

The preprint will stay online with its own DOI, forever. So, after some years, the reader of your paper will click on the link in your reference list, go to the preprint website, and find the link to the peer-reviewed version.

plk commented 1 year ago

Just to add to the semantic argument. The goal of latex markup is to be semantic and so I tend to prefer semantic classification in entrytypes too. I really don't like @online at all as an entrytype - it's not semantically an entrytype, it's just a location where an entrytype might be. I think @article covers a lot of things with the other fields determining what sort of article it is.

ryofurue commented 1 year ago

Thank you all for the nice discussion.

Here is a kind of summary from my perspective. But before that, regarding backward compatibility . . .

Yes, biblatex has already been using @online for preprints, but we can change that without breaking backward compatibility. We could simply retain the code for @online and start to use @article, @unpublished, or @preprint for preprints. Old bib entries would continue to produce the same results, whereas new entries would produce better results. We could then simply recommend the new way while promising that the old way won't break, at least for many years to come.

With that in mind, there have been four proposals:

1. Enhance @online.
2. Extend the "meaning" of @article to include preprints and introduce a new field (perhaps howpublished) to distinguish various "articles".
3. Introduce necessary fields to @unpublished to accommodate preprints.
4. Introduce a new type @preprint.

As for me, I don't have any strong preferences. My weak preference would be #4 because it would be a bit more convenient than the rest (see my point (4) below), simplest for the end user, and conceptually cleanest for everybody.

Now, the following points are inconsequential, meaning that I write them just for fun!

1) We have been dealing with two types of "meaning": (A) the meaning defined by the documentation and (B) the meaning generally accepted by the community of the users of biblatex.

To me, as long as (A) and (B) are not way too far from each other, I don't mind changing (A) by changing the documentation. It's fine to me to use @article, @online, or @unpublished for preprints. It's a matter of modifying the documentation if necessary.

So, I was surprised by you guys' strong feeling toward (B).

Of course, I agree that the closer (A) and (B) are, the better, but that consideration is, to me, secondary or tertiary (as long as they are not way too far from each other, that is). In other words, my tolerance about the distance between (A) and (B) is much larger than yours, because if you change (A), eventually (B) will catch up! For example, if we decided to use @article for preprints, people would eventually stop minding it. The distinction can still be made by the newly-introduced howpublished field.

2) Because I don't care much about semantics, this point is weak, but

as soon as a couple of people think a name does not make sense, we need to change it

is your wishful thinking, I think. You imply it's only me who is confused by the name eprinttype, but I predict that the following conversation will take place from time to time:

Person X: So, which field should we use for the name of the preprint service?

Person Y: Let's look at this example . . . , is this it, eprinttype="arxiv"?

X: But it's the type of the archive. What type does our preprint service belong to? Is it "arxiv"? Does it matter? Where is the field for the name anyway?

Y: Don't know. Let's look at the manual . . . It says we should use eprinttype for the name!

X: Oh. Okay.

:-) I think you probably underestimate the badness of eprinttype and overestimate the badness of journaltitle for the preprint-service name.

But see below.

3) The whole argument about semantics is becoming less and less relevant to end users today, because the typical workflow looks like this:

a) Open the webpage of the preprint you want to cite.

b) Click on the Zotero button on the browser.

c) The bibliographic information is imported into the Zotero application.

d) The bib file is automatically updated.

e) In your LaTeX text, you insert the reference by using the automatically generated cite-key (whose generation scheme is customized by you on Zotero).

See? The bibliographic types and the names of the fields don't enter the workflow of the Zotero user. Instead, what is important is the fact that the reference is a preprint, the name and URL of the preprint service, etc.

The bib types and field names matter only to the developers of biblatex and Zotero.

I'm just asking the biblatex developers to do something about preprints so that the Zotero developers can correctly implement the translation from Zotero's preprint type to the corresponding biblatex bib type.

At this point, the names of the types and those of the fields are more similar to the names of variables in computer programs. They are still important to the programmers, but generally, not to the community (users) any longer.

4) Finally, what names are still important to the end user (of a tool like Zotero)?

I look at an entry in Zotero. The only names I need to look at are

"preprint", which is the name of the type of the entry.
DOI, in order to click on it.
URL, ditto.

I don't look at the other field names, because if you see "Smith, John L.", it's clear it's the name of the author, if you see "On the semantics of field names", it's clear it's the title of the article, etc. Yes, editor names are sometimes involved, but editor names always come much later than the author names. Yes, the field name journaltitle may be very occasionally important for a journal whose name isn't familiar to you and doesn't look like the name of a journal.

In this sense, and in this sense only, the name preprint is superior to the others: You don't have to scan the entry to find out what it is.

All in all, however, the names are important to the end user only when s/he hand-edits the bib database.

plk commented 1 year ago

I would like to deprecate anything that is really a state of publication or a location of publication rather than an entrytype. So, @unpublished would be deprecated and @article would be promoted, with appropriate fields. I don't really care how easy it is to scan the data for a human since everything is moving to IDEs and GUIs anyway. @online is more difficult as it covers all sorts of disparate things but @article covers a lot of this (web articles). We could alias @online, which is a terrible name for a type, to @onlinemedia and have subtypes for this, for example.

Basically we are talking about a new and backwards compatible data model for biblatex which sounds like a worthwhile project to me. It can easily be described directly in the datamodel macros we already use in blx-dm.def, along with compat sourcemaps in biblatex.def since we have a facility for driver-level maps, designed for just this sort of situation. We could do a limited time compat mode for styles which use deprecated data model components.
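To make the compatibility idea concrete, here is a rough sketch of how a legacy @unpublished entry could be rewritten on the fly as an @article carrying a publication state. It uses a user-level \DeclareSourcemap rather than the driver-level maps mentioned above, and the pubstate value unpublished is an assumption of this sketch, so the string is defined locally. The entry itself is a made-up placeholder.

```latex
\documentclass{article}
\usepackage[backend=biber]{biblatex}

% Sketch only: turn legacy @unpublished entries into @article plus a pubstate.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      % 'final' ends this map for every entry that is not @unpublished.
      \step[typesource=unpublished, typetarget=article, final]
      % Assumed value; 'unpublished' is not taken from the biblatex manual.
      \step[fieldset=pubstate, fieldvalue={unpublished}]
    }
  }
}

% Define the assumed key locally so that it can be localised like other strings.
\NewBibliographyString{unpublished}
\DefineBibliographyStrings{english}{%
  unpublished = {unpublished},
}

% Placeholder legacy entry; author and title are invented.
\begin{filecontents}{compat-sketch.bib}
@unpublished{legacy2020,
  author = {Author, Ann},
  title  = {A Placeholder Manuscript},
  date   = {2020},
}
\end{filecontents}
\addbibresource{compat-sketch.bib}

\begin{document}
\cite{legacy2020}
\printbibliography
\end{document}
```

The bib file stays untouched, so old databases keep working while styles only ever see the new shape. The standard @article driver still expects a journaltitle, so the printed result would probably need style adjustments as well.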

pauloney commented 1 year ago

I fully applaud your proposal, naming things for what they are is a very good start...but would you elaborate further on the proposal? For example, what fields would identify an unpublished article if we only have @article to go by.

Paulo Ney

plk commented 1 year ago

I would recommend using pubstate since it's a supported field for many types, including @article. This already supports a variety of localised strings for the state and it would be trivial to add unpublished to these strings. I already use this in the APA style for "in press" and "unpublished" is just another pubstate to my mind.
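As a rough illustration of this suggestion, a preprint could be recorded as an @article whose state is carried by pubstate. The sketch below uses prepublished, one of the pubstate keys biblatex already localises, because unpublished would first have to be added as described above. The author, title and arXiv identifier are invented placeholders, and the entry deliberately has no journaltitle, which the current data model would flag if validation is enabled.

```latex
\documentclass{article}
\usepackage[backend=biber]{biblatex}

% Placeholder entry; every value below is invented for illustration.
\begin{filecontents}{pubstate-sketch.bib}
@article{placeholder2023,
  author      = {Author, Ann},
  title       = {A Placeholder Preprint Title},
  date        = {2023},
  eprinttype  = {arxiv},
  eprint      = {0000.00000},
  eprintclass = {math.XX},
  pubstate    = {prepublished},
}
\end{filecontents}
\addbibresource{pubstate-sketch.bib}

\begin{document}
\cite{placeholder2023}
\printbibliography
\end{document}
```

Once the paper appears in a journal, the same entry would gain a journaltitle and lose the pubstate, while the entry type stays the same.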

tobiasdiez commented 1 year ago

While arxiv is semantically a preprint server, it is also used for various other kinds of outputs. One often finds lecture notes, books or conference articles there as well. Coming from this point of view, it is natural to consider lecture notes with an eprint field and without a publisher as "preprint lecture notes" (that eventually might be properly published). By analogy, an article with an eprint field but without a journaltitle is a "preprint article".

So maybe the only thing one needs to change is to allow an article to not have a journaltitle, and advise styles to then treat it as a preprint.
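A style could act on that rule along the following lines. This is only a sketch of the idea, not how any existing style behaves: it redefines the in: bibmacro so that an @article without a journaltitle gets the literal label "Preprint" where the standard styles would otherwise print "In:". The label text and the bib entry are placeholders, and the result has not been checked against every style.

```latex
\documentclass{article}
\usepackage[backend=biber]{biblatex}

% Sketch: treat an @article that lacks a journaltitle as a preprint.
\renewbibmacro*{in:}{%
  \ifboolexpr{
    test {\ifentrytype{article}}
    and
    test {\iffieldundef{journaltitle}}
  }
    {\printtext{Preprint}}% hypothetical label for journal-less articles
    {\printtext{\bibstring{in}\intitlepunct}}}% otherwise the usual "In:"

% Placeholder entry without journaltitle; all values are invented.
\begin{filecontents}{nojournal-sketch.bib}
@article{placeholder2023,
  author     = {Author, Ann},
  title      = {A Placeholder Preprint Title},
  date       = {2023},
  eprinttype = {arxiv},
  eprint     = {0000.00000},
}
\end{filecontents}
\addbibresource{nojournal-sketch.bib}

\begin{document}
\cite{placeholder2023}
\printbibliography
\end{document}
```

Because the test lives in the style, the bib data needs no extra field, which is the attraction of the proposal. It also means the classification is inferred from a missing field instead of being stated in the entry.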

pauloney commented 1 year ago

I can see the logic of the proposal, but I think the last thing we need is an entry type being defined by the logic of something that happens (or does not happen) inside.

There are way too many authors who try to force a certain appearance of a record by:

improperly classifying it as something else
changing the information on a record to suit some specific purpose

and we ought to help by making the semantics of it easy and straightforward.

The proposal by PLK achieves that: one classifies a resource for what "it is", and the presence on the arXiv or not is marked by the fields inside.

Paulo Ney

plk commented 1 year ago

I quite like the idea of relaxing the requirement of journaltitle for article - that would solve a lot of problems.

moewew commented 1 year ago

As I said above, I'm not keen on broadening the meaning of journaltitle. I feel that people have a tendency to try and use @article for basically anything that is a research paper, but at least the standard data model has a more nuanced view (there is a technical difference between a paper in conference proceedings, in a paper collection and in a journal, even though practically there is very little difference). Broadening field and entry type meanings weakens the semantics, which can make things harder for style authors (who implement styles that do differentiate these entry types for whatever reasons).

plk commented 1 year ago

I also wouldn't want to broaden journaltitle to include the title of other things but I suppose it could be optional, thus broadening article which seems a lot less problematic?

moewew commented 1 year ago

Hmmm, I feel that allowing @articles not to have a journaltitle has similar repercussions to allowing journaltitle to hold other titles. It still weakens the semantics of what @article is at the moment in a way that I feel could make it tricky for style authors.

plk commented 1 year ago

It does indeed weaken the semantics of article but I think that's probably a good thing as having article be reserved for journals seems too restrictive. It's a different matter broadening journaltitle as the name of that pretty much precludes any broadening. Broadening article shouldn't hurt style authors that much as they are free to use it in a more restricted way - it would be a problem if we were restricting it rather than broadening it I suppose. I used article quite broadly in the APA style and it makes things a lot easier in general ...

ryofurue commented 1 year ago

So, what's the resolution? Any reasonable resolution is better than no resolution. I've just looked at the other thread about preprints in this forum and found that it was initiated in 2020. It's already 2023 now.

As time goes on, more and more bib entries are created using various ad-hoc schemes for preprints. When you biblatex developers publish an official scheme, some of the existing bib entries will become incompatible with the official scheme, which is unavoidable. But, it's better to do it now.

[I've edited this sentence. Before it was printed in big letters. I didn't realize that "---" would result in a big font. I didn't mean to "shout" at all!] Or, do you think most preprint servers will soon go out of favor and only arXiv will remain relevant? If so, to do nothing will be an option.

By the way, skimming this thread, I still don't see what the objection is to creating a new type @preprint. It would seem to me that that would solve all problems you have raised so far. We already have @techreport, which you could argue is not necessary because technical reports can be accommodated by @article by broadening its meaning.

moewew commented 1 year ago

Sorry, I haven't had a lot of time over the last few months to deal with anything but the low-hanging fruits (which this issue is most definitely not): I can't offer a good solution at the moment (and I definitely don't expect to be able to do so for the next four weeks or so). (Or rather: I haven't thought about the proposed solutions - or other possible solutions - in more detail than what I have already said in the discussion so far.)

I agree that we should try and find a good solution here, but I don't think it needs to be immediately. If we move now and release a solution that is not properly thought through and that we have to go back on we might create more damage than we do good. Users should generally be able to expect backwards compatibility. So once we have documented and released something it is tricky to remove it again. From what I can see this issue has popped up once or twice before (see for example https://github.com/plk/biblatex/issues/835, not sure if that is the discussion you refer to?), but we are not exactly bombarded with feature requests in this area.

This discussion has been comparatively lively for this bugtracker with six people joining in, but of course that is just a small fraction of the intended user base.

ryofurue commented 1 year ago

If we move now and release a solution that is not properly thought through and that we have to go back on we might create more damage than we do good.

But damage is being done daily. You download a bib entry from a preprint server. It doesn't work with biblatex. You have to hand-edit it. But you don't know what the proper way is. You search the Internet. You find multiple conflicting solutions, none of which works perfectly. You send your bib file to your publisher. They create an incorrect entry in your reference list. You have to correct it in your proof.

The damage is this wasted effort. Until a standard emerges, such effort is being wasted daily somewhere in the world. Because publishers are very slow in catching up with the norms of the LaTeX community, the sooner the better.