typst / citationberg

A library for parsing CSL styles.
Apache License 2.0
33 stars 7 forks source link

Change `PageRangeFormat::format` to follow Pandoc #7

Closed DerDrodt closed 4 months ago

DerDrodt commented 8 months ago

(This is a big change; I apologize.)

This PR changes formatting of page ranges entirely. In summary

  1. PageRangeFormat::format no longer accepts start and end parameters of type i32 but &str to handle prefixes better.
  2. A range may now have a prefix. E.g., 8n11564-8n1568 is formatted to 8n11564–68 by Chicago15. (The prefix must be identical for start and end of the range!)

I followed the Haskell implementation of pandoc pretty closely, while trying to be as efficient as possible and writing it like one would in Rust. I would not be surprised if it can be optimized further.

Three points should be cleared up before merging:

~This would be the second API change to citationberg. If merged, we should have a new version.~

reknih commented 7 months ago

I share your preference to keep the user-specified separator. With Hayagriva's new test allow-list, we can explicitly opt out of some of citeproc's tests.

I see the boundary between citationberg and hayagriva as follows: The former implements CSL features in a way that does not make assumptions about the data we want to format while hayagriva comes with its own data model and the implementation needed to format it. I think that parsing our user input is still format-specific and should be handled in hayagriva. Do you agree?

DerDrodt commented 7 months ago

I share your preference to keep the user-specified separator. With Hayagriva's new test allow-list, we can explicitly opt out of some of citeproc's tests.

That makes sense. In that case, we should have our own tests in the same format to test the new behavior.

I also found that according to the CSL docs we should only replace hyphens if the attribute was set. There also is no mention of a default for page-range-format. Should we remove our default?

I see the boundary between citationberg and hayagriva as follows: The former implements CSL features in a way that does not make assumptions about the data we want to format while hayagriva comes with its own data model and the implementation needed to format it. I think that parsing our user input is still format-specific and should be handled in hayagriva. Do you agree?

I agree. My proposal is this:

  1. Hayagriva handles splitting the input into multiple ranges or single pages (e.g., 1-5, 6-12 & 18 is turned to [1-5, 6-12, 18].
  2. A range in hayagriva consists of a start and end page, with each page having an optional prefix.
  3. Citationberg handles the formatting of a single range or page like in this PR.

That seems to be the desired behavior.

reknih commented 7 months ago

I apologize for my late answer! If the CSL spec does not want us to remove hyphens if the attribute isn't set we shall not. I think that the page-range-format default, however, should be kept as styles may rely on it given that citeproc.js is the biggest CSL consumer.

Your proposal looks good to me!

DerDrodt commented 4 months ago

I updated this PR. I am preparing a PR for hayagriva. When this is merged, I can use it in hayagriva to incorporate the things we discussed.

reknih commented 4 months ago

Thank you! Do you need a breaking citationberg release right away or do we expect more changes?

DerDrodt commented 4 months ago

No need for a new release. It is already set to use the git repo.