zotero / citeproc-rs

CSL processor in Rust.
https://cormacrelf.github.io/citeproc-wasm-demo/

Release on crates.io? #61

Open efoerster opened 4 years ago

efoerster commented 4 years ago

Thanks for the great project. We are using citeproc-rs in texlab and would like to publish texlab on crates.io soon, see https://github.com/latex-lsp/texlab/issues/152. However, this would require citeproc-rs to be available on crates.io, too. Do you have plans on making a release on crates.io?

0-wiz-0 commented 3 years ago

I second this - please provide a crate on crates.io, to allow easier packaging for texlab.

cormacrelf commented 3 years ago

Yes, I plan to do this very soon. I was holding off because of the messiness of releasing internal crates as well. But a solution emerged while I was reading a rust-analyzer CI script: add an extra crate-name prefix during publishing, so people can see very obviously that they’re internal.

carlosala commented 3 years ago

Hey! I also think it would be great to do this. If you need any help with it, just say so!

carlosala commented 3 years ago

Hi @cormacrelf! Are you still planning to release it on crates.io? It would help texlab a lot! Thanks!

bdarcus commented 3 years ago

I don't use texlab (though it looks cool), but it seems like publishing their crate is blocked by citeproc-rs not being available.

https://github.com/latex-lsp/texlab/issues/399

Just out of curiosity, how does texlab use citeproc-rs?

Antigravityd commented 2 years ago

Any updates on this? Crates.io support for any library makes packaging on Guix trivial, and I'm really missing my LaTeX completions... texlab appears to use this to parse BibTeX files to correctly complete citation names.

0-wiz-0 commented 2 years ago

I just noticed that texlab stopped using this crate: https://github.com/latex-lsp/texlab/commit/bfcfff518f6012ca5ed5398d66e3196d4bcb8808

bdarcus commented 2 years ago

This really needs to be a higher priority. It's now been more than two years since the OP opened this issue!

Other developers/projects aren't going to rely on, or contribute to, citeproc-rs if it's not available via crates.io.

NilsIrl commented 2 years ago

> I just noticed that texlab stopped using this crate: latex-lsp/texlab@bfcfff5

And it is now published on crates.io

kmaasrud commented 1 year ago

@cormacrelf what is the current status? As soon as the 1.67 lifetime issue is fixed, wouldn't you consider this ready for a release on crates.io now?

bdarcus commented 1 year ago

@kmaasrud - just curious, and I have no insight into the status, but are you wanting to use citeproc-rs with djoc?

https://github.com/kmaasrud/djoc

Cool to see that, BTW!

kmaasrud commented 1 year ago

> @kmaasrud - just curious, and I have no insight into the status, but are you wanting to use citeproc-rs with djoc?

@bdarcus yes exactly! CSL is no doubt the future of citation processing, and while there exists a LaTeX package for it, it is written in Lua and thus not supported by Tectonic/XeTeX (which I'm using as my LaTeX backend.)

I also want the binary to be self-contained, which is not possible when using BibTeX (embedding biber would be a nightmare.)

bdarcus commented 1 year ago

@kmaasrud cool; let me know if and when you want feedback.

zepinglee commented 1 year ago

> ... while there exists a LaTeX package for it, it is written in Lua and thus not supported by Tectonic/XeTeX (which I'm using as my LaTeX backend.)

@kmaasrud Technically speaking, it also works with XeTeX by replacing the bibtex (or biber) command-line procedure with citeproc-lua.

kmaasrud commented 1 year ago

> @kmaasrud Technically speaking, it also works with XeTeX by replacing the bibtex (or biber) command-line procedure with citeproc-lua.

@zepinglee Indeed, but then the external dependency only shifts from biber over to citeproc-lua. Lua would definitely be easier to embed than biber though, so I might need to look into that.

However, given that such a comprehensive crate like this exists, it'd be a shame to depend on Lua...

bdarcus commented 1 year ago

I ended up writing a long comment related to this thread over here, on a related Rust-based citation processor (with a crate!):

https://github.com/typst/hayagriva/issues/32#issuecomment-1482733264

urschrei commented 1 year ago

I'll chime in here and note that I've been keeping an eye on citeproc-rs for a while, as both a long-time contributor to the Zotero ecosystem (I'm the author and maintainer of Pyzotero) and a long-time Rust user (I've been publishing and maintaining widely-used crates since Rust 1.0 in 2015).

I would be interested in contributing / helping to maintain the crate, but a couple of things are holding me back:

  1. As @bdarcus notes it's not really clear to me what needs to happen before someone is willing to cut a release;
  2. Its status as a component of Zotero, and the Zotero team's plans for it – I recall that there was a perf gap, which I found highly surprising, but I couldn't dig into it due to a compilation bug which I gather is now resolved.

In summary: I'm not interested in contributing to something that doesn't have a future as part of Zotero, but if it does, by all means let me know if you're interested in contributions (cc @cormacrelf @dstillman)

dstillman commented 1 year ago

So I think this is a bit of a chicken-and-egg problem.

We don't currently have anyone able to work on citeproc-rs or even to assess patches — I could click the merge button on PRs but I wouldn't really know what I was accepting. Given the performance issues we saw and various other blockers (an incomplete list, I'm sure), and the relative completeness and suitability of citeproc-js, we're not particularly inclined to invest more in this project in order to determine if it could ever be a suitable replacement in Zotero. And without a clear future in Zotero, I'm not particularly inclined to publish it to crates.io under our name, nor do I think it would really make sense for anyone to do so without the project being under active development. As far as I know citeproc-rs won't currently even parse many/most current CSL styles, since it doesn't have full CSL 1.0.2 support, so it's likely of pretty limited use to anyone at this point.

If someone from the community was willing to help address the remaining issues and get citeproc-rs to the point where it was clear that there was a path to using it as the default processor in Zotero, we would be open to putting more resources towards its continued development as well as its health as an open-source project. But I don't know if anyone would be willing to do that without a stronger guarantee that it was going to end up in Zotero, and that's just not something we can promise at this point.

So I'm not sure how we get past that. @urschrei's offer to contribute is much appreciated, but I'm not clear if that's just about the crate or the processor more generally, and the latter is obviously a much bigger lift. There's no shortage of love for Rust, but I'm not sure that translates into love for CSL processor development.

As it is, other than the occasional issue (some of which we could work around in Zotero if we had to), citeproc-js mostly just does what we need, even on platforms like iOS where we need to call out to JS.

Sorry I don't have a better answer here.

urschrei commented 1 year ago

Thanks Dan, that's helpful. Maybe a sensible approach here is a strictly time-bounded attempt on my part to get to grips with the codebase and see whether full CSL 1.0.2 support is possible without a significant time investment, since there seems to be little point in proceeding otherwise.

bdarcus commented 1 year ago

I agree, that's helpful. Not an ideal scenario, but you explain it well, and give developers an option.

Seems like someone, or some group, needs to get the codebase in solid enough shape that it justifies further investment, and release on crates.

I'm curious where the problem lies, given other processors have been successfully developed by single developers.

bdarcus commented 1 year ago

> Maybe a sensible approach here is a strictly time-bounded attempt on my part to get to grips with the codebase and see whether full CSL 1.0.2 support is possible without a significant time investment ...

On this, I should emphasize that the changes in that release are trivial; mostly things like new variable names.

So if there's a problem doing this, I am guessing the bigger issue is the performance issues Dan mentioned (which also seem surprising).

@urschrei it might be worth looking at the even newer Haskell citeproc, if you identify any processing bottlenecks and are looking for ideas. That was a clean rewrite of an earlier implementation, and supposedly significantly faster.

dstillman commented 1 year ago

Actually supporting 1.0.2 terms should be trivial. The larger issue there is citeproc-rs currently not accepting unknown input, which isn't really appropriate for our use case for a number of reasons.
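To make "accepting unknown input" concrete: one common approach is a catch-all variant that keeps unrecognised names instead of rejecting them. The following is only a minimal sketch under that assumption; the `Variable` enum, its variants, and `parse_variable` are hypothetical and do not reflect how citeproc-rs is actually structured.

```rust
use std::convert::Infallible;
use std::str::FromStr;

/// A handful of known CSL variable names plus a catch-all.
/// Hypothetical type for illustration; not taken from citeproc-rs.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Variable {
    Title,
    ContainerTitle,
    Publisher,
    /// Any name this version does not recognise is kept verbatim.
    Unknown(String),
}

impl FromStr for Variable {
    // Parsing cannot fail: unrecognised names fall through to `Unknown`.
    type Err = Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "title" => Variable::Title,
            "container-title" => Variable::ContainerTitle,
            "publisher" => Variable::Publisher,
            other => Variable::Unknown(other.to_string()),
        })
    }
}

/// Convenience wrapper that unwraps the infallible result.
fn parse_variable(name: &str) -> Variable {
    match name.parse::<Variable>() {
        Ok(v) => v,
        // `Infallible` has no values, so this arm can never run.
        Err(never) => match never {},
    }
}

fn main() {
    for name in ["title", "container-title", "some-future-variable"] {
        println!("{} -> {:?}", name, parse_variable(name));
    }
}
```

With this shape, a style or item that uses a name added in a later CSL release would still load, and the processor could decide later whether to ignore or pass through the `Unknown` values.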

For performance, there are two concerns: pathological cases, which hopefully can be easily addressed with some optimization, and more fundamental problems with WebAssembly performance, which wouldn't reflect on the processor itself but would affect the Zotero desktop use case (though we might be able to just run it as a separate binary). We need to evaluate the WebAssembly performance in the Zotero 7 dev build, but it will be easier for us and others to do that once citeproc-rs can parse 1.0.2 styles.

bdarcus commented 1 year ago

Would #13 be a potential alternative to webassembly, assuming the other details can be sorted out?

Would be cool if such a thing were compatible with the JSON served by https://github.com/jgm/citeproc/blob/master/man/citeproc.1.md; a standard JSON citation/bibliography API (see OpenAPI).

bdarcus commented 1 year ago

Not directly related to rust, but ...

I had an idea recently for a possibly radically simplified, extensible, next-gen CSL.

https://github.com/bdarcus/csl-next

I've sketched out the idea in a typescript model (which converts to JSON schema), but while my skills in lisp are not bad, and I've previously worked with python and ruby (and my original CSL prototype was XSLT!), I'm a total newbie with typescript and js.

If anyone here has those skills and might be interested in helping me assess the viability of the idea, I'd welcome the help.

bdarcus commented 1 year ago

I've made some progress on the typescript model, so today decided to see what I could do with auto-generated code derived from it.

Using the ~450 LOC of auto-generated Rust code from quicktype, with this ...

use std::fs;

// `Style` is the struct from the quicktype-generated code mentioned above.
fn main() {
    // read the example json style file
    let json = fs::read_to_string("src/style.csl.json")
        .expect("Unable to read file");

    // deserialize the json to Rust Style struct
    let style: Style = serde_json::from_str(&json).unwrap();

    // convert `style.title` back to a string
    println!("{}", serde_json::to_string(&style.title).unwrap());
}

... deserializes the JSON example style file to a Rust Style struct, and then serializes the title, with the result APA.

The compiled binary will actually fail if the input style isn't valid!

That seems potentially really useful, and maybe a way forward for CSL in general; a new, more forward-looking model and reference implementation, whose model can be autoconverted not only to JSON Schema, but also to a wide range of implementation languages, with Rust, Swift, Haskell, and Go being the most relevant.

EDIT: a little demo repo of the codegen.

https://github.com/bdarcus/csln-rs

bdarcus commented 1 year ago

Another update, very obviously rust-related.

https://github.com/bdarcus/csln

It's a reimplementation of the csl-next draft typescript model in pure Rust, with very tight coupling (thanks to serde) between the JSON schema input and internal model.

I'm pretty confident in that model, though it would need more review, testing, and iteration for me to be fully happy with it.

I'm much less confident in my programming skills, and the fact I'm a complete Rust newbie.

But I'm absolutely serious about building this out. I just need some help.

It should compile fine using cargo, and I have it licensed under the same terms as citeproc-rs.

It's not quite on par with the typescript processor; here's an example of where I'm at:

❯ target/debug/csln processor/examples/style.csl.yaml processor/examples/ex1.bib.yaml

Example result:

{
  "smith1": {
    "disamb-condition": false,
    "group-index": 1,
    "group-length": 1,
    "group-key": "Smith, Sam:2023-10"
  },

So the core of the processor at this point is a sorted bibliography vector, and this HashMap.

The next step is a function to iterate through the former and the template, using the latter to generate the pre-rendered AST.
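For readers who want a picture of that sorted vector plus HashMap, here is a minimal, stdlib-only sketch. It is not the csln code: the `Reference` and `ProcHints` structs, the `author:issued` form of the group key, and the rule that a group with more than one member sets the disambiguation flag are all assumptions, chosen only to mirror the keys in the example output above.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified reference; field names are assumptions.
#[derive(Debug, Clone)]
struct Reference {
    id: String,
    author: String,
    issued: String,
}

/// Per-reference hints, mirroring the keys in the example output.
#[derive(Debug, Clone, PartialEq)]
struct ProcHints {
    disamb_condition: bool,
    group_index: usize,
    group_length: usize,
    group_key: String,
}

/// Assumed key format: "<author>:<issued>".
fn group_key(r: &Reference) -> String {
    format!("{}:{}", r.author, r.issued)
}

/// Sort by group key, then by id, so the order is deterministic.
fn sort_refs(mut refs: Vec<Reference>) -> Vec<Reference> {
    refs.sort_by(|a, b| group_key(a).cmp(&group_key(b)).then_with(|| a.id.cmp(&b.id)));
    refs
}

/// Build the id -> hints map from an already sorted slice.
fn proc_hints(sorted: &[Reference]) -> HashMap<String, ProcHints> {
    // First pass: count how many references share each group key.
    let mut lengths: HashMap<String, usize> = HashMap::new();
    for r in sorted {
        *lengths.entry(group_key(r)).or_insert(0) += 1;
    }
    // Second pass: assign each reference its 1-based position in its group.
    let mut seen: HashMap<String, usize> = HashMap::new();
    let mut hints = HashMap::new();
    for r in sorted {
        let key = group_key(r);
        let index = seen.entry(key.clone()).or_insert(0);
        *index += 1;
        let length = lengths[&key];
        hints.insert(
            r.id.clone(),
            ProcHints {
                // Assumption: a group with several members needs disambiguation.
                disamb_condition: length > 1,
                group_index: *index,
                group_length: length,
                group_key: key,
            },
        );
    }
    hints
}

fn main() {
    let refs = sort_refs(vec![
        Reference { id: "smith2".into(), author: "Smith, Sam".into(), issued: "2023-10".into() },
        Reference { id: "doe1".into(), author: "Doe, Jane".into(), issued: "2021".into() },
        Reference { id: "smith1".into(), author: "Smith, Sam".into(), issued: "2023-10".into() },
    ]);
    let hints = proc_hints(&refs);
    for r in &refs {
        println!("{}: {:?}", r.id, hints[&r.id]);
    }
}
```

A rendering pass could then walk the sorted vector and look up each reference's hints by id while applying the template.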

bdarcus commented 1 year ago

PS - just learned the typst folks are working on a 1.0 processor in Rust.

https://github.com/typst/citationberg

Feels like maybe there needs to be some collaboration across these projects.

bdarcus commented 1 year ago

Now that I've learned a bit of Rust so I can better understand this code base, I do think something like what I'm doing in csln and this could be aligned.

The idea is really a new input model, where the schemas are generated from the code, and so they are tightly-aligned.

Oh and removing a lot of unnecessary logic from the template language.

But it seems to me the processing model is generally pretty sound, with lots of performance optimizations. Not sure from my brief review what the bottlenecks could be, but I suspect they're resolvable.

bdarcus commented 1 year ago

I've forked the repo over at the CSL org, and applied Cormac's 1.67 branch, so at least it (partially) compiles :-)

https://github.com/citation-style-language/citeproc-rs

But there are 125 clippy warnings, and citeproc-io doesn't build, and with my newbie skills I don't know how to fix it all myself.

I still think it'd likely be easier and more future-proof to merge what I'm doing with some of what's in this code base.

bdarcus commented 1 year ago

The typst folks are just about to merge CSL 1.0 support in their Hayagriva library.

https://github.com/typst/hayagriva/pull/66

It relies on their parser library:

https://github.com/typst/citationberg