Add parsebib-clean-TeX-markup

bdarcus commented 2 years ago

... along with the associated parsebib-TeX-markup-replace-alist, though converted to a defvar, since this isn't a user-facing package.

Close #19

I didn't touch ebib; figure you can remove that independently, as you wish.

Hugo-Heagren commented 2 years ago

JSYK, The ebib version of this function and associated variables changed a bit recently -- you might prefer to copy/alter that version (or you might not :shrug:).

bdarcus commented 2 years ago

OMG; that was a total oversight on my part; I wasn't paying attention to getting the current version.

Seems it changed more than "a bit" though!!

Edit: I guess a lot of that was test code though.

I'll let @joostkremers weigh on on what he wants to do.

joostkremers commented 2 years ago

I'll let @joostkremers weigh on on what he wants to do.

Heh, good question. Ideally, this stuff should only be implemented once, of course. So if we want parsebib to be able to make use of it, it should be in parsebib. I don't think there's any reason not to have it there, so in principle I'm not against it.

AFAICT it would be possible to simply move ebib-clean-TeX-markup and related code to parsebib, so that in Ebib we'd only need to change the function calls and not much else.

The question is just what to do with ebib-TeX-markup-replace-alist, which is a user option.

Normally, I'd want to keep it a user option, which means duplicating it in Ebib and parsebib and having Ebib pass its value to parsebib. This is what I did with ebib-biblatex-inheritances when I moved that functionality into parsebib.

However, in this particular case, perhaps we don't need to do so. I mean, even though ebib-TeX-markup-replace-alist is technically a user option, it's such a long and unwieldy list that I doubt any normal user will try and customise it.

So I would move ebib-TeX-markup-replace-alist to parsebib as well and turn it into a defvar. A user that really wants to customise it can always redefine the variable in their init file.

I would like to have some customisability, though, so perhaps we can introduce an option ebib-clean-TeX-markup (or some similar name), which for now is just a toggle. The (new) function parsebib-clean-TeX-markup would then have to take an additional optional argument indicating whether it should actually do its magic or not. (Eventually, I'd like to turn this option into a list of replacement types, which would roughly correspond to the different groups indicated in the definition of ebib-TeX-markup-replacement-alist, but that can wait.)

Another question is the integration of this new parsebib-clean-TeX-markup into parsebib itself. It seems to me that parsebib-parse-bib-buffer should have an additional keyword argument to indicate whether it should clean TeX markup in field values and that parsebib-parse should set this argument to the value of its own display argument.

The actual clean-up should be done in parsebib--parse-bib-value, if I'm not mistaken, which means that it should also get an additional optional argument, and so should parsebib-read-entry and parsebib--parse-bibtex-field, because they need to pass the toggle on to parsebib--parse-bib-value.

Hugo-Heagren commented 2 years ago

Another question is the integration of this new parsebib-clean-TeX-markup into parsebib itself. It seems to me that parsebib-parse-bib-buffer should have an additional keyword argument to indicate whether it should clean TeX markup in field values and that parsebib-parse should set this argument to the value of its own display argument.

The actual clean-up should be done in parsebib--parse-bib-value, if I'm not mistaken, which means that it should also get an additional optional argument, and so should parsebib-read-entry and parsebib--parse-bibtex-field, because they need to pass the toggle on to parsebib--parse-bib-value.

I support @bdarcus original motivation for including this function in parsebib (as opposed to ebib). I'm less sure about the stuff quoted above. Parsebib is a parsing library -- it's principle function is to get information written in a structured file and return it in a structured lisp object. And the information in the file has the markup in it, not the cleaned up text. joostkremers/ebib#238 and joostkremers/ebib#248 were intended to make presenting that data prettier. I'm worried that including cosmetic functionality in the parsing functions might might make it harder to access downstream.

This is a long winded way of saying that if there's a boolean option to clean up when parsing, it should be off by default.

joostkremers commented 2 years ago

I support @bdarcus original motivation for including this function in parsebib (as opposed to ebib). I'm less sure about the stuff quoted above. Parsebib is a parsing library -- it's principle function is to get information written in a structured file and return it in a structured lisp object. And the information in the file has the markup in it, not the cleaned up text. joostkremers/ebib#238 and joostkremers/ebib#248 were intended to make presenting that data prettier. I'm worried that including cosmetic functionality in the parsing functions might might make it harder to access downstream.

But parsebib already includes functionality to make the data more presentable: Returning entries for display. This is basically just an extension of that.

Ebib doesn't use any of this cleanup functionality, because it needs the exact contents of the .bib file to work with. Other packages that use parsebib don't, though, they just present the contents of a .bib file to the user. For such packages, it's more convenient to be able to pass an argument to parsebib-parse-bib-buffer and get nice display-ready results.

This is a long winded way of saying that if there's a boolean option to clean up when parsing, it should be off by default.

Well, it's up to the caller to decide which kind of cleanup parsebib should do, if any. Obviously, when Ebib opens a .bib file, it shouldn't use this option.

bdarcus commented 2 years ago

But parsebib already includes functionality to make the data more presentable: Returning entries for display. This is basically just an extension of that.

Exactly.

For context, here's the original citar issue that led to this https://github.com/emacs-citar/citar/issues/535.

The sequence is:

I start bibtex-actions as a bibtex-completion front-end, which implements a simple version of this.
I drop that dependency (and changed the project name) and use parsebib directly, and borrow that function
but it's limited, per the above issue, and then @joostkremers noted he has a pretty good one in ebib.

So it naturally seems the best place to put this code here, so that each project should rely on the same function, rather than reimplement.

But as you can tell, I didn't realize the additional complexity :-)

PS - I should also note https://github.com/emacs-citar/citar/issues/532, on name-parsing and formatting. Not sure where that would have to happen to work correctly, but probably here.

BTW, I find parsebib really useful, since I don't have to worry about low-level formatting details, and just call fields to display.

bdarcus commented 2 years ago

Also, we currently have a citar-display-transform-functions defcustom, which allows one to easily configure which string manipulation functions are run against which fields. I guess something like that could be here, or we could just use this function as one of them.

joostkremers commented 2 years ago

But as you can tell, I didn't realize the additional complexity :-)

I don't think it's that complex. I started implementing it, should have something testable in a few days.

PS - I should also note emacs-citar/citar#532, on name-parsing and formatting. Not sure where that would have to happen to work correctly, but probably here.

It would at least be in line with the "prettify for display" functionality that parsebib already has. For JSON name fields there is already something similar: https://github.com/joostkremers/parsebib#parsebib-json-name-field-template.

bdarcus commented 2 years ago

I don't think it's that complex.

I just mean for someone not familiar with the codebase :-)

I started implementing it, should have something testable tomorrow.

Great! In that case, I'll close this.

It would at least be in line with the "prettify for display" functionality that parsebib already has. For JSON name fields there is already something similar: https://github.com/joostkremers/parsebib#parsebib-json-name-field-template.

A difference, however, is CSL JSON name representation is a structured object, while bibtex is a string.

joostkremers / parsebib

Add parsebib-clean-TeX-markup #21