plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Add fields for @software based on CFF #1169

Open tobiasdiez opened 3 years ago

tobiasdiez commented 3 years ago

Recently, and in particular with the GitHub integration, the Citation File Format (CFF) has become popular for specifying metadata for software and/or code. According to Twitter responses, GitHub is also considering providing similar functionality that uses a bibtex file to provide the same kind of metadata, and/or exporting to bibtex if the metadata is provided in CFF. For these reasons, and to facilitate citing software in one's papers, it would be good in my opinion if biblatex's software type and CFF defined compatible standards. This would make automatic translation between the different formats straightforward. Most metadata fields in CFF have a corresponding field in biblatex, but not all. For example, 'version', 'commit', 'license' and 'repository' are missing in biblatex; see https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#index for an overview of all fields.


Background: We at JabRef are currently faced with the issue of importing CFF into bib(la)tex, and are unsure how to treat the metadata information with no equivalent fields. See https://github.com/JabRef/jabref/pull/7946 for work in progress.


Maybe the maintainers @sdruskat, @hainesr and @jspaaks of CFF have further input. Refs https://github.com/citation-file-format/ruby-cff/issues/48 and https://github.blog/2021-08-19-enhanced-support-citations-github/

hainesr commented 3 years ago

For conversion between citation formats it would be worth pulling @mfenner into this thread too 👋

sdruskat commented 3 years ago

Hi, thanks for bringing us in.

Perhaps two initial comments: there is https://www.ctan.org/tex-archive/macros/latex/contrib/biblatex-contrib/biblatex-software ("the reference biblatex implementation of a bibliography style extension that includes software-specific BibTeX entries"), but I'm not familiar enough with biblatex to say what the connection of these would be. Perhaps @rdicosmo (author of biblatex-software) can help out here?

I gather that biblatex is the base implementation upon which other packages build? If so then I agree, it'd be great to update that to include software-specific fields for a better support for software/@software :+1:.

> Background: We at JabRef are currently faced with the issue of importing CFF into bib(la)tex, and are unsure how to treat the metadata information with no equivalent fields. See JabRef/jabref#7946 for work in progress.

For CFF, we have decided to support very little more than just basic and advanced citation use cases, and I think for JabRef it'd be fine to drop any extra information that is not used for citation purposes as well? But happy to talk further in https://github.com/JabRef/jabref/pull/7946.

rdicosmo commented 3 years ago

Thanks @sdruskat for the ping, and for pointing out the biblatex-software package, that addresses the needs for citing software in biblatex.

biblatex-software adds the relevant fields for software that are missing in stock biblatex, and provides four different entry types, @software, @softwareversion, @softwaremodule and @codefragment, to enable citation of a software project (e.g. scikit-learn), a version (e.g. OCaml 4.09), a module in a modular software project (e.g. Voronoi diagrams 1.0 in CGAL 3.02), or a code fragment (e.g. the core mapreduce algorithm in Parmap 1.0).

A special property of biblatex-software is that it is a "style extension", designed to add support for the software-related entries and fields to any existing biblatex style. This makes it possible to use it exactly as if these entries and fields were part of stock biblatex, without requiring changes to biblatex itself (see the documentation for the details).

biblatex-software has been on CTAN for over a year, is part of TeXLive, and has already undergone several iterations following feedback from the user community.

My kind suggestion is to use biblatex-software for handling software-related entries; once it is fully stable, we can propose incorporating it upstream in biblatex.

Feel free to contact me for any question you may have about it.

plk commented 2 years ago

biblatex-software is a very nice example of how to extend biblatex for specialist areas. It would be nice perhaps to make such extensions "official" by mentioning them in the manual, which might help adoption for other formats requiring specialist interfaces. What do you think @moewew?

moewew commented 2 years ago

For reference, here is the link to my comment on a similar question: https://github.com/plk/biblatex/issues/1106#issuecomment-1220282384

plk commented 1 year ago

Is there any consensus that biblatex-software covers the CFF requirements? If so, I would suggest closing this. I don't think we are going to merge styles into the core of biblatex as this opens a whole can of worms. We can simply say that for CFF field support, load biblatex-software?

rdicosmo commented 1 year ago

> Is there any consensus that biblatex-software covers the CFF requirements? If so, I would suggest closing this. I don't think we are going to merge styles into the core of biblatex as this opens a whole can of worms. We can simply say that for CFF field support, load biblatex-software?

I would slightly prefer to see biblatex-software merged, as it would nicely complete the official @software entry, which today is only an alias for @misc.

But I also see the concern about demands creeping in, and thanks to the great architectural structure of biblatex, biblatex-software is a "style extension" that can be added to mostly any existing style.

I think we can definitely live with this if there is a clear statement in the biblatex manual pointing to biblatex-software for full-fledged support of software citation: that would avoid seeing issues like this popping up over and over again, with potentially diverging implementations.

Do you think this is possible?

tobiasdiez commented 1 year ago

+1 for merging it. Software/apps are used pretty universally across fields, and citing them properly is important.

moewew commented 1 year ago

I think biblatex-software is pretty heavy. It comes with three new entry types and a number of new fields. I have the feeling that for the average user the current @software with its slightly more pedestrian approach suffices. I don't doubt that communities where software is cited (and discussed) heavily might have additional needs, but I'm not too sure if we really need to cater for everyone to that degree in the standard styles.

We have to balance the interests of completeness of the data model against simplicity of the standard styles, because the standard styles are supposed to be a basis for third-party styles. If we overload it with specific stuff that can make it harder for style authors to find their way round the code.


I feel that too many people get hung up on the alias thing (as in "@software is only an alias for @misc"). See also the lengthy discussions in https://github.com/plk/biblatex/issues/753 and linked posts. We have explicitly updated @software in the documentation and made the aliasing more explicit in the code. In any case, @software is valid for all type-specific options.

plk commented 1 year ago

I agree with @moewew here - just because there is a comprehensive type or audience specific package, I don't think it belongs in biblatex core. The modular approach is much cleaner.

tobiasdiez commented 1 year ago

As a compromise, would it be an option to integrate the main @software from biblatex-software and leave the more specialized types @softwareversion, @softwaremodule and @codefragment in a separate package? Ideally, there should be only one "software" type, and this should be mostly compatible with CFF. In particular, fields like version and repository are rather important if you cite software (e.g. published here on GitHub).

moewew commented 1 year ago

version is already supported. We don't have a repository field, but there is the generic url, which will probably be enough for simple use cases.
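
To illustrate the simple case, here is a minimal Python sketch of how an importer might map a CFF record onto the standard @software fields. It is not JabRef's or anyone else's actual implementation: the function name `cff_to_biblatex_software` and the sample record are made up for illustration, the CFF keys (`title`, `version`, `date-released`, `doi`, `url`, `repository-code`, `authors`) are taken from the CFF schema guide, and the choice to fall back to `url` for the repository and to silently drop `commit` and `license` is an assumption, not a decision made in this thread.

```python
def cff_to_biblatex_software(cff: dict, key: str) -> str:
    """Render a CFF-like mapping as a biblatex @software entry (illustrative only)."""
    fields = {}

    # CFF lists person authors as mappings with 'given-names' and 'family-names'.
    # Entity authors (which only carry 'name') are skipped in this sketch.
    authors = [
        f"{a['family-names']}, {a['given-names']}"
        for a in cff.get("authors", [])
        if "family-names" in a and "given-names" in a
    ]
    if authors:
        fields["author"] = " and ".join(authors)

    # One-to-one mappings onto fields of the standard biblatex data model.
    for cff_key, bib_field in (
        ("title", "title"),
        ("version", "version"),
        ("date-released", "date"),
        ("doi", "doi"),
    ):
        if cff_key in cff:
            fields[bib_field] = str(cff[cff_key])

    # Assumption: with no dedicated repository field in the standard data
    # model, fall back to the generic url field.
    link = cff.get("url") or cff.get("repository-code")
    if link:
        fields["url"] = link

    # Keys such as 'commit' and 'license' have no standard counterpart
    # and are dropped here.
    body = "\n".join(f"  {name} = {{{value}}}," for name, value in fields.items())
    return f"@software{{{key},\n{body}\n}}"


# Hypothetical record, for illustration only.
example = {
    "title": "Example Tool",
    "version": "1.2.0",
    "date-released": "2021-08-19",
    "repository-code": "https://example.org/example-tool",
    "commit": "abc123",
    "license": "MIT",
    "authors": [{"given-names": "Ada", "family-names": "Lovelace"}],
}
print(cff_to_biblatex_software(example, "exampletool"))
```

Anything beyond this (a separate repository field, license, commit or SWHID) is where the extended data model of biblatex-software would come in.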

rdicosmo commented 1 year ago

I fully understand the desire not to grow the surface of biblatex, and I am a great fan of its modular structure. It would be unfortunate, though, if the decision not to integrate biblatex-software upstream were taken based on the argument that citing software is just a specific need of a small academic community. We finally have objective data showing that the use, creation and sharing of software is widespread in all research fields, thanks to a monitoring effort put in place by the French ministry of research that you can access here https://frenchopensciencemonitor.esr.gouv.fr/software/general?id=general.utilisation (the disciplinary breakdown is here https://frenchopensciencemonitor.esr.gouv.fr/software/fields?id=disciplines.utilisation)

moewew commented 1 year ago

I guess my argument is not that software citations are uncommon enough that we don't have to worry about them. We do have a @software entry type, after all. My argument is that for many (most) use cases the status quo probably suffices. It's the fields and entry types of biblatex-software that go beyond the biblatex standard data model that seem to me to be of more niche interest.

I couldn't find a lot of bibliography/citation styles that have proper guidance for software citation, but taking APA style as an example, I believe we can already do what it wants (weirdly I couldn't find an example on the APA webpage, so I'm linking to a third-party interpretation, which I hope is accurate: https://libraryguides.vu.edu.au/apa-referencing/7DatasetsSoftwareTests).

zepinglee commented 1 year ago

> I couldn't find a lot of bibliography/citation styles that have proper guidance for software citation, but taking APA style as an example, I believe we can already do what it wants (weirdly I couldn't find an example on the APA webpage, so I'm linking to a third-party interpretation, which I hope is accurate: https://libraryguides.vu.edu.au/apa-referencing/7DatasetsSoftwareTests).

There is also the Vancouver style https://www.nlm.nih.gov/bsd/uniform_requirements.html (§44).

rdicosmo commented 1 year ago

> I guess my argument is not that software citations are uncommon enough that we don't have to worry about them. We do have a @software entry type, after all. My argument is that for many (most) use cases the status quo probably suffices. It's the fields and entry types of biblatex-software that go beyond the biblatex standard data model that seem to me to be of more niche interest.

Thanks for clarifying this :-)

> I couldn't find a lot of bibliography/citation styles that have proper guidance for software citation, but taking APA style as an example, I believe we can already do what it wants (weirdly I couldn't find an example on the APA webpage, so I'm linking to a third-party interpretation, which I hope is accurate: https://libraryguides.vu.edu.au/apa-referencing/7DatasetsSoftwareTests).

Well, the point is that for a very long time software has not been considered a research output on par with publications, so we traditionally cited the documentation or the article describing the software, and not the software itself. In some cases one could see software treated like a book on a shelf, as it came in a box (see the software entries for Zotero, for example). The landscape has changed significantly with the growth of Open Source and the very recent rise in awareness of the importance of valuing software output for the career of researchers and engineers in academia.

Only very recently has the need emerged to cite software directly, and not via proxies like articles or books, so one does not find satisfactory guidelines for citing software in mainstream styles yet. There has been work to improve on the status quo, but it is either very generic, or plagued by a tendency to force software into some kind of "bed of Procrustes" to cater to the needs of publishers (that only want to see DOIs), or to mindsets that conflate software with data (which it is not).

This is why some four years ago we set up a software citation working group at Inria, bringing together a broad panel including top researchers that have developed and maintained a variety of significant research software for decades, to come up with a concrete proposal covering the needs of a large spectrum of software developments. The outcome has many facets: on one side, an article about software attribution and reference in CiSE 2020 (green open access here) that presents among other things a taxonomy of contributor roles which is important for software metadata; on the other side, the (much smaller) data model for software citation, that was eventually implemented in biblatex-software as a style extension thanks to biblatex's wonderful modular architecture.

We did try to keep the new fields to a minimum, but there are a few needed ones, and the various entries in biblatex-software are there to make sure that the various forms of software projects can be accommodated. For example, the @softwaremodule entry (that was a surprise to me) is needed to properly cite modules/plugins like the ones that are found in the Computational Geometry Algorithms Library (CGAL): they have been using the @book and @incollection entries for citing the library and its components (see their bibtex file here), and we want to make sure they can do the same using @software and @softwaremodule, using the proper crossref mechanism for field inheritance, etc.

I believe that this may look like a niche need, but there is a tidal wave coming, for which we need to prepare.

Sorry for the long message, but I just realised that we never took the time to write down the story of how all this came up (there is a bit in the biblatex-software documentation, but not enough), and I got carried away :-)

moewew commented 1 year ago

> I believe that this may look like a niche need, but there is a tidal wave coming, for which we need to prepare.

So why not think of biblatex-software as a dyke to protect us from that wave? I suggest we stick with what we have in biblatex for now and let users rope in biblatex-software if they need it.

I once read that with software features "no is temporary, yes is forever". If we add in all of the biblatex-software data model now, we're stuck with it and have to maintain it even if it turns out only a small minority of people actually uses the advanced features. If we stick with the status quo, people with simple software citation needs (including APA and Vancouver, thanks for the reference @zepinglee) can use the standard data model. Those who need more can load biblatex-software. Once it turns out that lots of people need the extended data model that biblatex-software offers and we are flooded with requests for that, we can always reconsider adding it in.

rdicosmo commented 1 year ago

Indeed, I developed biblatex-software instead of opening an issue here precisely because it did not seem reasonable to propose a change to biblatex before doing appropriate field testing. Now, after three years on CTAN, we are rather confident that we have all that we need, and I feel it would be OK to propose a merge upstream: it is easier for users to have the functionality out of the box instead of loading an extra package, and the merge process would probably make it possible to remove quite a bit of glue code.

But I am also perfectly fine with maintaining biblatex-software separately for the moment.

May I suggest that somewhere in the documentation a pointer to biblatex-software is added to help the users? Or, if this goes against policy, that a point is made somewhere to make sure that if an evolution of the @software entry happens, it stays compatible with what biblatex-software does?