typst / hayagriva

Rusty bibliography management.
Apache License 2.0

Citation label issues #118

Open ramojus opened 9 months ago

ramojus commented 9 months ago

I've noticed two strange things with citation labels:

  1. When there are more than three authors, only the letters of the first author are used. E.g., if I set the year to 2002 and the author to A. Surname and B. Surname and C. Surname and D. Surname (in .bib format), I get Sur+02 as the citation label. I think it would make more sense to generate the label as SSS+02 (by taking the first letter of the surname of each of the first three authors).
  2. If the labels of two citations match, there is no way to distinguish between them. I've seen solutions where a unique letter is added at the end: Sur02a, Sur02b.

DerDrodt commented 9 months ago

That depends entirely on your citation style. The style usually also sets how labels are disambiguated. What style are you using?

ramojus commented 9 months ago

Sorry, that was misleading. I'm trying to write a custom CSL style for a Typst bibliography, and CSL has a built-in variable, citation-label. That is what I'm talking about.

Looking at https://github.com/typst/hayagriva/blob/main/src/csl/citation_label.rs, the first problem should be very easy to fix, but I'm not sure about the second one.
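To make the first proposal concrete, here is a minimal sketch of the label rule. It is not the code in citation_label.rs: `proposed_label` is a hypothetical name, and the handling of one to three authors is an assumption, since the issue only specifies the case of more than three authors.

```rust
/// Hypothetical helper, not hayagriva's actual API: builds an
/// alphanumeric-style label from author surnames and a year.
fn proposed_label(surnames: &[&str], year: i32) -> String {
    let mut label = String::new();
    match surnames.len() {
        0 => {}
        // Assumption: a single author contributes the first three letters.
        1 => label.extend(surnames[0].chars().take(3)),
        // Assumption: two or three authors contribute one initial each.
        2 | 3 => label.extend(surnames.iter().filter_map(|s| s.chars().next())),
        // Proposal from this issue: initials of the first three authors,
        // followed by `+` to mark the omitted ones.
        _ => {
            label.extend(surnames.iter().take(3).filter_map(|s| s.chars().next()));
            label.push('+');
        }
    }
    // Two-digit, zero-padded year (2002 becomes "02").
    label.push_str(&format!("{:02}", year.rem_euclid(100)));
    label
}
```

With four authors named Surname and the year 2002, this yields SSS+02 instead of Sur+02.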

cpg314 commented 3 months ago

If the labels of two citations match, there is no way to distinguish between them. I've seen solutions where unique letter is added at the end: Sur02a, Sur02b.

Example

Here is a minimal example of this in typst

test.bib

@article{knuth1,
  title={Literate Programming},
  author={Donald E. Knuth},
  journal={The Computer Journal},
  year={1984},
  publisher={Oxford University Press}
}

@article{knuth2,
  title={Literate Programming II},
  author={Donald E. Knuth},
  journal={The Computer Journal},
  year={1984},
}

main.typ

#set cite(style: "alphanumeric")
@knuth1 @knuth2
#bibliography("test.bib")

which renders as: image

The citations are detected as ambiguous here: https://github.com/typst/hayagriva/blob/main/src/csl/mod.rs#L211, but we break here: https://github.com/typst/hayagriva/blob/main/src/csl/mod.rs#L252. In particular, we skip disambiguate_year_suffix (which would give Knuth84a and Knuth84b) because i.rendered.get_meta(ElemMeta::Date) is None. This is probably because we use the citation-label variable, which does not set this metadata.
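For reference, the effect expected from year-suffix disambiguation can be sketched independently of hayagriva. The helper below is hypothetical (`add_suffixes` is not a hayagriva function) and only illustrates the idea: labels that occur more than once receive a, b, ... in order of appearance, and unique labels are left alone.

```rust
use std::collections::HashMap;

/// Hypothetical helper: appends `a`, `b`, ... to every label that occurs
/// more than once, in order of appearance. Unique labels are unchanged.
/// Limitation of this sketch: it assumes at most 26 entries share a label.
fn add_suffixes(labels: &[&str]) -> Vec<String> {
    // First pass: count how often each label occurs.
    let mut totals: HashMap<&str, usize> = HashMap::new();
    for &label in labels {
        *totals.entry(label).or_insert(0) += 1;
    }

    // Second pass: hand out suffixes to the ambiguous labels only.
    let mut seen: HashMap<&str, u8> = HashMap::new();
    labels
        .iter()
        .map(|&label| {
            if totals[label] < 2 {
                return label.to_string();
            }
            let n = seen.entry(label).or_insert(0);
            let suffix = (b'a' + *n) as char;
            *n += 1;
            format!("{label}{suffix}")
        })
        .collect()
}
```

Calling `add_suffixes(&["Knuth84", "Knuth84", "Knuth85"])` returns Knuth84a, Knuth84b, Knuth85.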

Fix attempt 1

I added the year-suffix disambiguation to alphanumeric.csl

-  <citation collapse="citation-number" after-collapse-delimiter="; ">
+  <citation collapse="citation-number" after-collapse-delimiter="; " disambiguate-add-year-suffix="true">

     <layout prefix="[" suffix="]" delimiter=", ">
       <text variable="citation-label" />
+      <text variable="year-suffix"/>
     </layout>

and ran the tests to update alphanumeric.cbor.

After disabling the check in disambiguate_year_suffix, I get the citations rendered nicely as Knuth84a, Knuth84b, Knuth85, but in the bibliography the suffixes end up at the end of each entry rather than in the keys: image

This looks like https://github.com/typst/typst/issues/2707 and https://github.com/typst/hayagriva/commit/22433f0cdbb3a26ddf94817c770b7b2469c08911. For the bibliography, the citation number is replaced by the label based on usage in resolve_number_variable, which I edited as follows:

                         )
                         .map(|c| {
                             NumberVariableResult::Regular(MaybeTyped::String(
-                                c.to_string(),
+                                c.to_string()
+                                    + &self
+                                        .instance
+                                        .resolve_standard_variable(
+                                            LongShortForm::default(),
+                                            StandardVariable::YearSuffix,
+                                        )
+                                        .map(|x| x.to_string())
+                                        .unwrap_or_default(),
                             ))
                         });
                 }

Fix attempt 2

At this point, from my shallow understanding of CSL and the hayagriva internals, I have the feeling that adding the year-suffix to the <citation> was the wrong way to go, and that perhaps the citation label is meant to carry the year suffix automatically when disambiguation is needed.

The following works: removing the check in disambiguate_year_suffix and modifying resolve_standard_variable in https://github.com/typst/hayagriva/blob//src/csl/taxonomy.rs#L74 to

                     None
                 }
             }
+            StandardVariable::CitationLabel => self
+                .entry
+                .resolve_standard_variable(form, variable)
+                .map(|x| {
+                    x.to_string()
+                        + &self
+                            .resolve_standard_variable(form, StandardVariable::YearSuffix)
+                            .unwrap_or_default()
+                            .to_str()
+                })
+                .map(|x| Cow::Owned(ChunkedString::from(x))),
             _ => self.entry.resolve_standard_variable(form, variable),
         }
     }

Then the suffixes show both in the citation keys and in the bibliography.
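The core of this change, stripped of hayagriva's types, is just "label plus optional suffix". Here is a minimal sketch with a hypothetical function `label_with_suffix`, using plain `Option<&str>` values in place of the `Cow<ChunkedString>` values in the real code:

```rust
/// Hypothetical stand-in for the patched `CitationLabel` arm: the label is
/// returned with the year suffix appended when one exists.
fn label_with_suffix(label: Option<&str>, year_suffix: Option<&str>) -> Option<String> {
    // No label means no output. A missing suffix becomes "" so that
    // unambiguous entries keep their plain label.
    label.map(|l| format!("{l}{}", year_suffix.unwrap_or_default()))
}
```

Because the suffix is appended when the variable is resolved, every place that renders citation-label sees the same string, which would explain why the citations and the bibliography agree here.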

To know the right way to solve this, I should read more about CSL...