typst / hayagriva

Rusty bibliography management.
Apache License 2.0

CSL? #32

Closed bdarcus closed 8 months ago

bdarcus commented 1 year ago

I'm admittedly biased, given I created it, but have you considered (also) supporting CSL citation styles and JSON input format?

I'm sure there are performance and other advantages to the rust-based styles, but there are thousands of CSL styles, as well as citeproc-rs.

hcsch commented 1 year ago

Perhaps TOML (or similar more simple languages, like JSON, as mentioned above) would also be a nice alternative to YAML for this use case, since YAML is known to be rather unintuitive/counter-intuitive in too many cases (see https://noyaml.com/ for example).

kmaasrud commented 1 year ago

Using a well-defined format like CSL JSON, you'd easily be able to create a data structure that can be serialized/deserialized from any input format. This is possible already with the citeproc-rs crate.
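
As a rough illustration of that idea, here is a minimal Python sketch of one in-memory structure that round-trips through CSL JSON. It is not the API of hayagriva or citeproc-rs. The keys `id`, `type`, `title`, `author` and `issued`/`date-parts` are CSL JSON fields; the `Name` and `Reference` classes and their methods are hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Name:
    # Hypothetical structured personal name.
    family: str
    given: str = ""


@dataclass
class Reference:
    # Hypothetical internal model; only a handful of CSL JSON fields are covered.
    id: str
    type: str
    title: str = ""
    author: list = field(default_factory=list)
    issued: list = field(default_factory=list)  # e.g. [2023, 10]

    @classmethod
    def from_csl(cls, item):
        """Build a Reference from one CSL JSON item (a dict)."""
        names = [
            Name(n.get("family", ""), n.get("given", ""))
            for n in item.get("author", [])
        ]
        date_parts = item.get("issued", {}).get("date-parts", [])
        return cls(
            id=str(item["id"]),
            type=item["type"],
            title=item.get("title", ""),
            author=names,
            issued=list(date_parts[0]) if date_parts else [],
        )

    def to_csl(self):
        """Serialize back to a CSL JSON item, omitting empty fields."""
        item = {"id": self.id, "type": self.type}
        if self.title:
            item["title"] = self.title
        if self.author:
            item["author"] = [
                {"family": n.family, "given": n.given} for n in self.author
            ]
        if self.issued:
            item["issued"] = {"date-parts": [self.issued]}
        return item


raw = """[{"id": "smith1", "type": "book", "title": "A Title",
          "author": [{"family": "Smith", "given": "Sam"}],
          "issued": {"date-parts": [[2023, 10]]}}]"""

refs = [Reference.from_csl(item) for item in json.loads(raw)]
round_tripped = json.dumps([r.to_csl() for r in refs])
```

Once the internal model exists, any other input format only needs its own `from_...` constructor; the rest of the pipeline does not change.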

I agree with you, @bdarcus: hayagriva really should leverage CSL, as the big contenders in bibliography management (notably Zotero) are fully in on it.

@reknih, you should definitely hit up @cormacrelf and try to leverage his work wherever possible. The Rust ecosystem would definitely benefit from a proper citation processing library; I know that at least I, and likely the Tectonic guys, are interested!

bdarcus commented 1 year ago

I dunno; I think TOML would be a poor format for this purpose, and YAML is actually pretty good. Among other things, you can validate its files using JSON schemas.

clbarnes commented 1 year ago

citeproc-rs is possibly dead; readme calls it WIP, maintainer seems to have been active in the last couple of months but no commits to master since 2021.

kmaasrud commented 1 year ago

citeproc-rs is possibly dead; readme calls it WIP, maintainer seems to have been active in the last couple of months but no commits to master since 2021.

Not very active, no, but it's still maintained, as evidenced by the PR opened 2 months ago that fixes a bug introduced by Rust 1.67. Also, it being part of the Zotero org, one would think it has some official weight.

That being said, Hayagriva is way more elegant IMO, and it actually exists on crates.io. Adapting to support CSL JSON shouldn't be that difficult (I am considering opening a PR for it.) CSL styles is another beast, though, but one that this library should definitely aim for supporting!

bdarcus commented 1 year ago

edited for clarity

Not very active, no, but it's still maintained, as evidenced by the PR opened 2 months ago that fixes a bug introduced by Rust 1.67. Also, it being part of the Zotero org, one would think it has some official weight.

A group of other CSL developers and contributors and I talked about status and strategy in general with the Zotero folks last summer (summary and further discussion here), and what I gather from that and from other discussions is:

  1. Zotero created and funded the project to create a replacement for citeproc-js, that is faster, more flexible, and easier to maintain and extend
  2. it is already included in Zotero, but AFAIK is still an optional engine
  3. still, after all this time, it's not on crates.io.

I don't understand the last point (cc @dstillman) , in part because I don't do Rust, and so can't really assess the codebase.

Cormac did mention during that meeting that he has maybe been held back a bit by perfectionism, but I'm not sure if it's that, or some technical issue(s) they've run into.

My impression is they're also skeptical they'll get much in the way of quality PRs (it's not code suitable, for example, for many amateur programmers); that there's not likely a market for this among other developers.

I'm more optimistic about the prospects for a community-developed Rust-based open source CSL processor :-)

It would help for the Zotero folks to communicate more clearly about this:

  1. what the future of citeproc-rs and Zotero is
  2. whether and how they accept PRs
  3. when (not if, because it's not really an option for a robust Rust project not to be on crates.io) they plan to release the crate
  4. bottom line: whether they're committed to it.

Absent answers, or of course if they simply say "sorry, this was an experiment, and it won't work for us", maybe some dedicated Rust developer(s) should just fork it?

That being said, Hayagriva is way more elegant IMO

In what way(s)?

... and it actually exists on crates.io.

Right.

Adapting to support CSL JSON shouldn't be that difficult (I am considering opening a PR for it.) CSL styles is another beast, though, but one that this library should definitely aim for supporting!

I will say in general :

Final, much more speculative, point:

I created CSL around 2005, writing my first book.

I think it reflects sound decisions based on the technology landscape at that time; the decision to use XML and RELAX NG, to insist on output format independence and being language-agnostic, to make it suitable for hand-editing in schema-aware XML editors, and also subtle things like designing it in such a way that one could switch among radically different citation styles without editing document source.

Now, close to 20 years later, I am big on the idea of using things like machine learning to simply create language-independent styles from formatted output examples, so users don't have to edit styles at all.

https://github.com/inukshuk/anystyle/issues/146

I could imagine if that could be perfected, it would open the door to different sorts of output options: CSL XML initially certainly, but also maybe formats better optimized for machine processing.

Alas, I have neither the time nor the skill to explore that idea!

kmaasrud commented 1 year ago

In what way(s)?

@bdarcus From what I can glean: cleaner API, smaller and easier to understand codebase, all-in-all looks more elegant. This makes sense, as Hayagriva has a narrower audience of library consumers (essentially just themselves) and is newer.

bdarcus commented 1 year ago

News on citeproc-rs.

Basically they're stalled, with labor and technical hurdles, and need help to get the code in shape and released.

Another third-party developer is going to spend some time trying to figure out whether and how to do that.

reknih commented 1 year ago

Hi folks! I have already considered adding CSL, it's definitely on the roadmap!

It would be nice if I did not have to reimplement a Rust parser for CSL, is citeproc-rs up to the task?

bdarcus commented 1 year ago

It would be nice if I did not have to reimplement a Rust parser for CSL, is citeproc-rs up to the task?

IDK; "csl" is one of only two crates he actually released.

Parsing is easy; it's just XML after all.

It's the processing that's difficult.
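
To make the "parsing is easy" half of that concrete, here is a small Python sketch that reads a toy CSL fragment with the standard library's XML parser. The namespace URI is the real CSL one; the style itself is a made-up fragment that has not been validated against the schema, and `rendered_variables` is a hypothetical helper. It only walks the tree. It does none of the actual processing (name formatting, disambiguation, sorting, and so on), which is where the difficulty lies.

```python
import xml.etree.ElementTree as ET

# Namespace used by CSL style files.
CSL_NS = {"csl": "http://purl.org/net/xbiblio/csl"}

# Toy fragment for illustration only; not a complete or validated style.
STYLE = """<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text" version="1.0">
  <info>
    <title>Toy Style</title>
    <id>http://example.com/toy-style</id>
  </info>
  <citation>
    <layout prefix="(" suffix=")" delimiter="; ">
      <names variable="author"/>
      <date variable="issued" prefix=", ">
        <date-part name="year"/>
      </date>
    </layout>
  </citation>
</style>
"""


def rendered_variables(style_xml):
    """Return (element name, variable) pairs for the citation layout's children."""
    root = ET.fromstring(style_xml)
    layout = root.find("csl:citation/csl:layout", CSL_NS)
    # Tags look like "{namespace}names"; keep only the local part.
    return [(child.tag.split("}")[1], child.get("variable")) for child in layout]


elements = rendered_variables(STYLE)
```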

bdarcus commented 1 year ago

Dan posted another more detailed follow-up on the technical status.

There's also the excellent Haskell-based version I mentioned, which can effectively act like a JSON server.

https://github.com/jgm/citeproc/blob/master/man/citeproc.1.md

bdarcus commented 1 year ago

FWIW, I've been working on an experimental evolution of CSL in a TypeScript model; there is a commented YAML file of the current state.

~As I say in the README, have no idea if this goes anywhere or not.~

Late-May update: I've made quite a bit of progress on this, and realized in the process that the TypeScript Style model can be auto-converted to Rust code to serialize and deserialize a style.

Here's a little demo repo that demonstrates:

https://github.com/bdarcus/csln-rs

EDIT: looking at your YAML format now, I see you're defining authors as a list of people, and presumably parsing those strings to get the components? If yes, that seems to leave out org authors.
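
For comparison, CSL JSON sidesteps this by letting a name be either a structured person (`family`/`given`) or a single `literal` string, which is how organizational authors are usually carried. The Python sketch below shows that distinction; `display_name` is a hypothetical helper, and nothing here describes how hayagriva's YAML format actually handles names.

```python
def display_name(name):
    """Format one CSL JSON name object as a sortable display string."""
    # Organizational names are kept whole and never split into parts.
    if "literal" in name:
        return name["literal"]
    family = name.get("family", "")
    given = name.get("given", "")
    return f"{family}, {given}" if given else family


authors = [
    {"family": "Smith", "given": "Sam"},
    {"literal": "World Health Organization"},
]
formatted = [display_name(a) for a in authors]
```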

bdarcus commented 1 year ago

@reknih when you and/or your other developers have a bit of free time, can you take a look at this?

https://github.com/bdarcus/csln

It's a reimplementation of the csl-next draft model in pure Rust, with very tight coupling (thanks to serde) between the JSON schema input and internal model.

I'm pretty confident in that model, though it would need more review, testing, and iteration for me to be fully happy with it.

I'm much less confident in my programming skills, given that I'm a complete Rust newbie.

But I'm absolutely serious about building this out. I just need some help.

It should compile fine using cargo, and I have it licensed under Mozilla 2.0, which I think should be compatible with your Apache option; probably not MIT. But my view on licenses is that of a practical open source advocate: I chose the license simply because it's the same as citeproc-rs.

It's not quite on par with the TypeScript processor; here's an example of where I'm at:

❯ target/debug/csln processor/examples/style.csl.yaml processor/examples/ex1.bib.yaml

Example result:

{
  "smith1": {
    "disamb-condition": false,
    "group-index": 1,
    "group-length": 1,
    "group-key": "Smith, Sam:2023-10"
  },

So the core of the processor at this point is a sorted bibliography vector, and this HashMap.

The next step is a function to iterate through the former and the template, and use the latter to generate the pre-rendered AST.

https://github.com/bdarcus/csln/issues/16
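
For readers trying to picture that state, here is a guess, in Python, at how a sorted list plus a per-reference hints map like the one above could be derived. It is not csln's code. The key format (`names:date`) is reverse-engineered from the sample output, the flat `issued` list is a simplification, and treating `disamb-condition` as "more than one reference shares the key" is purely an assumption.

```python
from collections import defaultdict


def group_key(ref):
    """Build a grouping key such as 'Smith, Sam:2023-10' (format is a guess)."""
    names = ";".join(f"{a['family']}, {a['given']}" for a in ref["author"])
    date = "-".join(str(part) for part in ref["issued"])
    return f"{names}:{date}"


def compute_hints(refs):
    """Return (sorted references, per-id hints) for a list of reference dicts."""
    ordered = sorted(refs, key=group_key)
    groups = defaultdict(list)
    for ref in ordered:
        groups[group_key(ref)].append(ref["id"])

    hints = {}
    for key, ids in groups.items():
        for index, ref_id in enumerate(ids, start=1):
            hints[ref_id] = {
                # Assumption: disambiguation is needed when a key is shared.
                "disamb-condition": len(ids) > 1,
                "group-index": index,
                "group-length": len(ids),
                "group-key": key,
            }
    return ordered, hints


refs = [
    {
        "id": "smith1",
        "author": [{"family": "Smith", "given": "Sam"}],
        "issued": [2023, 10],
    }
]
ordered, hints = compute_hints(refs)
```

A renderer could then walk `ordered` and consult `hints[ref["id"]]` whenever the style template needs to decide on disambiguation, for instance adding a year suffix.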

reknih commented 1 year ago

Hey @bdarcus, I recently started a CSL 1.0.2 XML parser and processor with typst/citationberg. Good to know that you are working on something for the next generation of CSL! What kind of help are you looking for?

bdarcus commented 1 year ago

@reknih

I recently started a CSL 1.0.2 XML parser and processor with typst/citationberg.

Oh, cool; didn't know!

How are you finding working with the XML?

What kind of help are you looking for?

I hadn't gotten around yet (since this is newer) to sketching out milestones, but the ones for the TypeScript project more-or-less apply.

https://github.com/bdarcus/csl-next/milestones?direction=asc&sort=due_date

There's still a lot of work to do on the processor, for example, and we need to figure out a way to convert 1.0 styles, which may or may not be a big task.

It may be useful for you to review the model now, and think about whether there's promise in using that, and simply converting 1.0 styles to it, if we can do it fairly losslessly?

I imagine your model would help a lot with that?

And perhaps there's a way to share code between the two projects?

This is admittedly not fully-developed at this point, but I think I've thought-through enough details that it should work out as I intend.

EDIT: I did try to sketch out where I see this going in some of the crate READMEs (for example, for cli).

On a more mundane level, since I'm a mediocre amateur programmer and Rust newbie (though sometimes I think this gives me certain advantages compared to trained programmers), reviews of existing code and PRs to improve it would be welcome :-)

fredguth commented 9 months ago

Any news on that? Is help still needed? What kind of processing is needed? What makes this task difficult? (Rust newbie)

reknih commented 9 months ago

Thank you for asking. The following tasks are still open:

bdarcus commented 9 months ago

What will be the relationship between hayagriva and citationberg?

reknih commented 9 months ago

Citationberg parses CSL but makes no assumptions about how variables and data types are expressed within the consumer. Hayagriva will have a frontend to enter bibliographic information and be the CSL processor.

reknih commented 8 months ago

This has been shipped with 0.4.0