sndjvu / workspace

monorepo for SnDjVu's Rust code, website, etc.
https://www.sndjvu.org
Apache License 2.0

Tools suite (umbrella issue) #3

cole-miller closed this 1 year ago

cole-miller commented 3 years ago

DjVuLibre comes with a bunch of command-line tools that perform encoding/decoding, dump document structure in human-readable or XML formats, extract chunks, etc. Replicating these will be a great way to exercise the APIs and get something useful out the door.

cole-miller commented 3 years ago

None of these require more than sndjvu_codec::bzz:

Will also need some example documents to test-drive these…

cole-miller commented 3 years ago

I've been thinking about the design of a sndjvu-extract tool. Something like

$ sndjvu-extract -S SELECTOR <input.djvu

where SELECTOR specifies a single chunk using a syntax like file_id/chunk_id#index, plus some additional options to deal with decoding, etc.
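A minimal sketch of how a selector of that shape could be parsed. Everything here is an assumption for illustration: the `Selector` struct, the `parse_selector` function, the requirement that all three parts be present, and the choice to treat the index as a plain `usize`. None of it is taken from the actual sndjvu code.

```rust
/// Hypothetical parsed form of a `file_id/chunk_id#index` selector.
#[derive(Debug, PartialEq, Eq)]
struct Selector {
    file_id: String,
    chunk_id: String,
    index: usize,
}

/// Parse a selector string, returning `None` if any of the three
/// parts is missing or if the index is not a non-negative integer.
fn parse_selector(s: &str) -> Option<Selector> {
    // Everything after the last '#' is the index.
    let (path, index) = s.rsplit_once('#')?;
    // Everything before the first '/' is the file ID, the rest is the chunk ID.
    let (file_id, chunk_id) = path.split_once('/')?;
    if file_id.is_empty() || chunk_id.is_empty() {
        return None;
    }
    let index = index.parse().ok()?;
    Some(Selector {
        file_id: file_id.to_owned(),
        chunk_id: chunk_id.to_owned(),
        index,
    })
}
```

With this sketch, an illustrative input such as `p0001.djvu/Sjbz#2` yields a `Selector` with index 2, while `p0001.djvu/Sjbz` (no index) is rejected. Whether the real tool should allow omitted parts is an open design question.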

cole-miller commented 2 years ago

There's actually some code in the sndjvu_toolkit crate now, hooray! The idea is to have one binary that does argv[0] dispatch to determine what tool to run. (Eventually you'll be able to compile a binary with only the subset of tools you care about, controlled by features.) Tools that have at least a little code are sndjvu-bzz and sndjvu-dump. I'd like the second of these to support plain (like djvudump), XML (like djvuxml), and JSON output, and S-expressions would be nice too :).
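The argv[0] dispatch idea can be sketched roughly as follows. The `Tool` enum and `tool_from_argv0` function are hypothetical names made up for this example; only the tool names `sndjvu-bzz` and `sndjvu-dump` come from the comment above.

```rust
use std::path::Path;

/// Hypothetical list of tools the single binary can act as.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tool {
    Bzz,
    Dump,
}

/// Pick a tool based on the name the binary was invoked as (`argv[0]`).
fn tool_from_argv0(argv0: &str) -> Option<Tool> {
    // Drop any leading directories, so that `/usr/bin/sndjvu-dump`
    // and `sndjvu-dump` dispatch to the same tool.
    let name = Path::new(argv0).file_name()?.to_str()?;
    match name {
        "sndjvu-bzz" => Some(Tool::Bzz),
        "sndjvu-dump" => Some(Tool::Dump),
        _ => None,
    }
}
```

In `main`, the first item of `std::env::args()` would be passed to `tool_from_argv0`, and each tool would then be installed as a symlink (or hard link) to the one binary. Feature-controlled subsets could presumably be done by putting `#[cfg(feature = "...")]` on the individual enum variants and match arms; that part is left out of the sketch.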

I wrote a working prototype of sndjvu-dump ("plain" output only) that printed its output line by line while Visiting the document. This has a couple of nice properties: if there's a parse error you still see all the preceding lines of output, and you can re-use the same BZZ output buffer for almost all the decoding (except you need a separate buffer for the DIRM stuff). But for the more "structured" output it seems clear that we need to parse the document completely into a proper data structure (sndjvu::simple_document::Document) and then walk that, instead. Maybe the original, eager sndjvu-dump will come back as a separate tool -- could be useful.
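To illustrate the trade-off between the two approaches, here is a small sketch that does not use the real sndjvu APIs. `ChunkInfo`, `DumpError`, `dump_eager`, and `dump_structured` are all hypothetical, the input is modeled as an iterator of parse results rather than an actual visitor, and the output format is invented. BZZ buffer reuse is not modeled.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for one chunk seen while walking a document.
struct ChunkInfo {
    id: &'static str,
    len: u32,
}

#[derive(Debug)]
enum DumpError {
    Parse(String),
    Io(io::Error),
}

/// Eager: write each line as soon as its chunk is visited, so any
/// output produced before a parse error is kept.
fn dump_eager<I, W>(chunks: I, mut out: W) -> Result<(), DumpError>
where
    I: IntoIterator<Item = Result<ChunkInfo, String>>,
    W: Write,
{
    for chunk in chunks {
        let chunk = chunk.map_err(DumpError::Parse)?;
        writeln!(out, "{} [{}]", chunk.id, chunk.len).map_err(DumpError::Io)?;
    }
    Ok(())
}

/// Structured: parse the whole document into a data structure first,
/// then walk it. A parse error means nothing is written at all.
fn dump_structured<I, W>(chunks: I, mut out: W) -> Result<(), DumpError>
where
    I: IntoIterator<Item = Result<ChunkInfo, String>>,
    W: Write,
{
    let doc: Vec<ChunkInfo> = chunks
        .into_iter()
        .collect::<Result<_, _>>()
        .map_err(DumpError::Parse)?;
    for chunk in &doc {
        writeln!(out, "{} [{}]", chunk.id, chunk.len).map_err(DumpError::Io)?;
    }
    Ok(())
}
```

Given an input whose second chunk fails to parse, `dump_eager` still emits the line for the first chunk before returning the error, whereas `dump_structured` emits nothing. The upside of the second form is that formats like XML or JSON, which need properly nested and closed output, can be generated from a complete document.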