joostkremers / parsebib

Elisp library for reading .bib files
BSD 3-Clause "New" or "Revised" License

Move `ebib-clean-TeX-markup` to parsebib #19

Open bdarcus opened 2 years ago

bdarcus commented 2 years ago

Follow-up to:

https://github.com/bdarcus/citar/issues/535

The function doesn't reference any other ebib functions, but it does rely on ebib-TeX-markup-replace-alist, so I assume that would need to be moved as well.

But it seems a straightforward move-and-rename.

Alas, I'm not familiar enough with this codebase to know how best to then integrate it here.

We should probably also, at some point, add a parallel function that does the same for CSL JSON markup, though the use of markup there isn't really standardized ATM.

joostkremers commented 2 years ago

I think it would be preferable to have a similar function for CSL-JSON rather than combine both into a single function, but yeah, doing it for both formats makes sense.

joostkremers commented 2 years ago

I've now pushed updates to both parsebib and Ebib that move ebib-TeX-markup-replace-alist and related functions to parsebib.

parsebib-parse-bib-buffer now has an extra argument replace-TeX: if non-nil, all replacements in parsebib-TeX-markup-replace-alist are applied to field values. parsebib-parse applies these as well (unless you call it with its display argument set to nil).

Note that right now, if you set replace-TeX to t, all field values are passed through parsebib-clean-TeX-markup. This seems the right thing to do (there may be TeX markup in the journaltitle field, for example, or in author or editor fields), but let me know if that should be more flexible.
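To make the mechanism concrete, here is a minimal, self-contained sketch of alist-driven markup replacement. The names `my-tex-markup-replace-alist` and `my-clean-tex-markup` are hypothetical, and the three rules are illustrative only; this is not the actual contents or implementation of parsebib-TeX-markup-replace-alist or parsebib-clean-TeX-markup.

```elisp
;; Hypothetical, much-reduced stand-in for a TeX markup replacement table.
;; Each element is (REGEXP . REPLACEMENT); the rules are applied in order.
(defvar my-tex-markup-replace-alist
  '(("\\\\&" . "&")                     ; \& -> &
    ("\\\\emph{\\([^}]*\\)}" . "\\1")   ; \emph{x} -> x
    ("{\\([^{}]*\\)}" . "\\1"))         ; {x} -> x
  "Illustrative replacements only; an assumption, not the real table.")

(defun my-clean-tex-markup (string)
  "Return STRING with each rule in `my-tex-markup-replace-alist' applied."
  (let ((case-fold-search nil))
    (dolist (rule my-tex-markup-replace-alist string)
      (setq string
            (replace-regexp-in-string (car rule) (cdr rule) string)))))

;; (my-clean-tex-markup "Science \\& \\emph{Nature} in {Europe}")
;; => "Science & Nature in Europe"
```

Applying such a function to every field value corresponds to the "replace everywhere" behaviour described above; restricting it to certain fields would need an extra layer on top.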

@Hugo-Heagren I haven't added a new user option to disable replacing TeX markup, as I suggested in https://github.com/joostkremers/parsebib/pull/21#issuecomment-1153590907, because I realised that you can customize ebib-field-transformation-functions if you want to disable it.

bdarcus commented 2 years ago

Many thanks, @joostkremers!

I will take a look, and integrate this.

On your question about flexibility, I'm not sure, so let me ask:

Here's a current defcustom, which is similar to ebib-field-transformation-functions:

(defcustom citar-display-transform-functions
  '((t  . citar-clean-string)
    (("author" "editor") . citar-shorten-names))
  "Configure transformation of field display values from raw values.
All functions that match a particular field are run in order."
  :group 'citar
  :type '(alist :key-type   (choice (const t) (repeat string))
                :value-type function))

So this says first clean the string (as with this function) regardless.

And the second says to run citar-shorten-names on "author" or "editor" fields only.

Per the issue linked below, we need to swap that order, since we need to preserve organizational authors and such.

https://github.com/emacs-citar/citar/issues/532
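As a rough illustration of how an alist shaped like citar-display-transform-functions could be consumed, the sketch below runs every matching function in list order. The helper `my-apply-transforms` and the two stand-in functions are hypothetical and are not citar's actual code.

```elisp
(defun my-strip-braces (string)
  "Stand-in cleaner (an assumption): remove all braces from STRING."
  (replace-regexp-in-string "[{}]" "" string))

(defun my-family-names (names)
  "Stand-in shortener (an assumption): keep the part before \", \" of each name."
  (mapconcat (lambda (name) (car (split-string name ", ")))
             (split-string names " and ")
             ", "))

(defvar my-transforms
  '((t . my-strip-braces)
    (("author" "editor") . my-family-names))
  "Same shape as `citar-display-transform-functions'; contents are illustrative.")

(defun my-apply-transforms (transforms field value)
  "Run on VALUE every function in TRANSFORMS whose key matches FIELD.
A key is either t (matches every field) or a list of field names.
Functions run in the order in which they appear in TRANSFORMS."
  (dolist (entry transforms value)
    (when (or (eq (car entry) t)
              (member field (car entry)))
      (setq value (funcall (cdr entry) value)))))

;; (my-apply-transforms my-transforms "author" "{Doe}, Jane and Roe, Richard")
;; => "Doe, Roe"
;; (my-apply-transforms my-transforms "title" "The {TeX}book")
;; => "The TeXbook"
```

With this design, swapping the order of cleaning and shortening amounts to reordering the alist entries.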

But beyond that, with this change, what do you recommend?

bdarcus commented 2 years ago

Actually, never mind. When I find some time, I'll integrate, and let you know if I run into any issues.

joostkremers commented 2 years ago

> On your question about flexibility, I'm not sure, so let me ask:
>
> Here's a current defcustom, which is similar to ebib-field-transformation-functions:

By flexibility I meant that you might want to be able to specify which fields parsebib-clean-TeX-markup should be applied to. But your citar-display-transform-functions is more general, because it's not limited to a single function.

It would actually make sense to build that into parsebib, I think. The idea would be that you can then pass the value of citar-display-transform-functions to parsebib-parse and parsebib would do the rest. Since both packages are loaded, it wouldn't matter whether the functions to be applied come from parsebib (parsebib-clean-TeX-markup) or from citar (citar-shorten-names).

We'd just have to think about what to do with bib(la)tex vs. CSL-JSON. It would make sense to generalise the transformations that parsebib already does for CSL-JSON in the same way, but we'd probably need to keep the two types separate. At best it would be a waste of CPU cycles to apply transformations for bibtex to CSL-JSON data and vice versa, at worst it would wreak havoc.
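One way to keep the two formats apart, sketched here under assumptions, is to key the transform tables by format so that bibtex transforms never touch CSL-JSON data. Everything below (`my-format-transforms`, `my-format-transform-entry`, the format symbols `biblatex` and `csl-json`, and the entry shape) is hypothetical and not part of parsebib's API.

```elisp
(defun my-format-strip-braces (string)
  "Stand-in bibtex-only cleaner (an assumption): remove braces from STRING."
  (replace-regexp-in-string "[{}]" "" string))

(defvar my-format-transforms
  '((biblatex . ((t . my-format-strip-braces)))
    (csl-json . ()))
  "Hypothetical table mapping a format symbol to its own transform alist.")

(defun my-format-transform-entry (entry format)
  "Return a copy of ENTRY with only the transforms registered for FORMAT applied.
ENTRY is assumed to be an alist of (FIELD . VALUE) string pairs."
  (let ((transforms (alist-get format my-format-transforms)))
    (mapcar
     (lambda (field)
       (let ((value (cdr field)))
         (dolist (tr transforms)
           (when (or (eq (car tr) t)
                     (member (car field) (car tr)))
             (setq value (funcall (cdr tr) value))))
         (cons (car field) value)))
     entry)))

;; (my-format-transform-entry '(("title" . "The {TeX}book")) 'biblatex)
;; => (("title" . "The TeXbook"))
;; (my-format-transform-entry '(("title" . "The {TeX}book")) 'csl-json)
;; => (("title" . "The {TeX}book"))
```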

bdarcus commented 2 years ago

> It would actually make sense to build that into parsebib, I think.

That's indeed what I was wondering.

joostkremers commented 2 years ago

@bdarcus Does citar support both biblatex and CSL-JSON? Do you have transform functions for the latter?

bdarcus commented 2 years ago

> @bdarcus Does citar support both biblatex and CSL-JSON?

Yes; it's among the reasons I dropped the bibtex-completion dependency and use parsebib directly.

> Do you have transform functions for the latter?

Not ATM.

The only transform function I have, other than to strip TeX markup, is the shorten-names one, which isn't very general, but seems to work with both formats, without modification.

(defun citar-shorten-names (names)
  "Return a string of family names from the full NAMES string.
NAMES is a string of names separated by \" and \".  To better
accommodate corporate names, this only shortens personal names
of the form \"family, given\"; other names are returned whole."
  (when (stringp names)
    (mapconcat
     (lambda (name)
       ;; "Family, Given" -> "Family".  A name without ", " splits
       ;; into a single element and so is kept unchanged.
       (car (split-string name ", ")))
     (split-string names " and ")
     ", ")))

EDIT: but it doesn't currently have logic to handle corporate names; e.g. those in brackets.

If something like this would be valuable in parsebib, feel free to adapt it as you like.

bdarcus commented 2 years ago

Thinking a bit more, maybe there could be format-independent transformation functions that call format-specific ones?

Like parsebib-shorten-names vs parsebib--shorten-names-tex.
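A minimal sketch of that layering, with hypothetical names (`my-shorten-names` and its two helpers) rather than anything that exists in parsebib: a public function takes the format and dispatches to a private, format-specific helper. The CSL-JSON helper assumes names arrive as a list of alists with `family` or `literal` keys, as in raw CSL-JSON; data that has already been converted to strings would not need it.

```elisp
(defun my--shorten-names-tex (names)
  "Shorten a BibTeX-style NAMES string such as \"Doe, Jane and Roe, Richard\"."
  (mapconcat (lambda (name) (car (split-string name ", ")))
             (split-string names " and ")
             ", "))

(defun my--shorten-names-csl (names)
  "Shorten CSL-JSON NAMES, assumed to be a list of alists.
Each alist is expected to have a `family' key (personal names) or a
`literal' key (corporate names)."
  (mapconcat (lambda (name)
               (or (alist-get 'family name)
                   (alist-get 'literal name)
                   ""))
             names
             ", "))

(defun my-shorten-names (names format)
  "Shorten NAMES according to FORMAT, either `biblatex' or `csl-json'."
  (pcase format
    ('biblatex (my--shorten-names-tex names))
    ('csl-json (my--shorten-names-csl names))
    (_ (error "Unknown format: %S" format))))

;; (my-shorten-names "Doe, Jane and Roe, Richard" 'biblatex)
;; => "Doe, Roe"
;; (my-shorten-names '(((family . "Doe") (given . "Jane"))
;;                     ((literal . "World Health Organization")))
;;                   'csl-json)
;; => "Doe, World Health Organization"
```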