Closed cole-miller closed 1 year ago
None of these require more than sndjvu_codec::bzz:
Will also need some example documents to test-drive these…
I've been thinking about the design of a sndjvu-extract tool. Something like
$ sndjvu-extract -S SELECTOR <input.djvu
where SELECTOR specifies a single chunk using a syntax like file_id/chunk_id#index, with some additional options to deal with decoding, etc.
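To make the selector idea concrete, here is a minimal sketch of parsing file_id/chunk_id#index. Everything in it is an assumption for illustration: the `Selector` and `SelectorError` types and `parse_selector` are hypothetical names, not part of sndjvu, and the choices that a missing `#index` means index 0 and that the last `/` separates the file ID from the chunk ID are guesses at reasonable behaviour.

```rust
/// Hypothetical parsed form of a selector like `file_id/chunk_id#index`.
#[derive(Debug, PartialEq, Eq)]
pub struct Selector {
    pub file_id: String,
    pub chunk_id: String,
    /// Which occurrence of `chunk_id` to pick (assumed to default to 0).
    pub index: usize,
}

/// Hypothetical error type for malformed selectors.
#[derive(Debug, PartialEq, Eq)]
pub enum SelectorError {
    MissingSlash,
    EmptyFileId,
    EmptyChunkId,
    BadIndex,
}

pub fn parse_selector(s: &str) -> Result<Selector, SelectorError> {
    // Split on the last '/', so a file ID containing '/' is still accepted
    // (an assumption; the real tool may want stricter rules).
    let (file_id, rest) = s.rsplit_once('/').ok_or(SelectorError::MissingSlash)?;

    // The "#index" suffix is optional in this sketch.
    let (chunk_id, index) = match rest.split_once('#') {
        Some((chunk, idx)) => {
            let idx = idx.parse::<usize>().map_err(|_| SelectorError::BadIndex)?;
            (chunk, idx)
        }
        None => (rest, 0),
    };

    if file_id.is_empty() {
        return Err(SelectorError::EmptyFileId);
    }
    if chunk_id.is_empty() {
        return Err(SelectorError::EmptyChunkId);
    }

    Ok(Selector {
        file_id: file_id.to_string(),
        chunk_id: chunk_id.to_string(),
        index,
    })
}
```

With this, `parse_selector("p0001.djvu/Sjbz#1")` would select the second `Sjbz` chunk of the component `p0001.djvu`, and `parse_selector("p0001.djvu/INFO")` the first `INFO` chunk.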
There's actually some code in the sndjvu_toolkit crate now, hooray! The idea is to have one binary that does argv[0] dispatch to determine what tool to run. (Eventually you'll be able to compile a binary with only the subset of tools you care about, controlled by Cargo features.) Tools that have at least a little code are sndjvu-bzz and sndjvu-dump. I'd like the second of these to support plain (like djvudump), XML (like djvuxml), and JSON output, and S-expressions would be nice too :).
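For anyone unfamiliar with argv[0] dispatch, a rough sketch of the idea follows. It is not the code in sndjvu_toolkit: the `Tool` enum and `tool_from_argv0` are hypothetical names, and only the two tool names mentioned above are recognised.

```rust
use std::path::Path;

/// Hypothetical list of tools the single binary can act as.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tool {
    Bzz,
    Dump,
}

/// Decide which tool to run from the name the binary was invoked under.
/// Directory components and a trailing extension (e.g. ".exe") are ignored.
pub fn tool_from_argv0(argv0: &str) -> Option<Tool> {
    let name = Path::new(argv0).file_stem()?.to_str()?;
    match name {
        "sndjvu-bzz" => Some(Tool::Bzz),
        "sndjvu-dump" => Some(Tool::Dump),
        // Unknown name: the caller might print a list of available tools.
        _ => None,
    }
}
```

In `main`, the first element of `std::env::args()` would be passed to `tool_from_argv0`, and each tool would typically be installed as a symlink or hard link to the one binary. Compiling only a subset of tools could then be done by putting a `#[cfg(feature = "...")]` attribute on each match arm, with feature names that are again only an assumption here.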
I wrote a working prototype of sndjvu-dump ("plain" output only) that printed its output line by line while Visiting the document. This has a couple of nice properties: if there's a parse error you still see all the preceding lines of output, and you can re-use the same BZZ output buffer for almost all the decoding (except you need a separate buffer for the DIRM stuff). But for the more "structured" output formats it seems clear that we need to parse the document completely into a proper data structure (sndjvu::simple_document::Document) and then walk that instead. Maybe the original, eager sndjvu-dump will come back as a separate tool; it could be useful.
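The trade-off between the two approaches can be sketched with a toy model. Nothing below uses the real sndjvu API: `Event`, `Node`, `dump_streaming`, `build_tree` and `to_sexpr` are hypothetical stand-ins, and the output formats are made up for illustration. The streaming dump keeps whatever it printed before an error, while the tree-based path yields nothing unless the whole document parses.

```rust
use std::fmt::Write;

/// Hypothetical parse events, standing in for what a visitor would see.
pub enum Event<'a> {
    Enter { id: &'a str },            // start of a container chunk
    Leaf { id: &'a str, len: usize }, // a data chunk and its length
    Exit,                             // end of the current container
    Error(&'a str),                   // parse failure at this point
}

/// Eager dump: emit one line per chunk as soon as it is seen.
/// On error, the lines already written to `out` are kept.
pub fn dump_streaming(events: &[Event<'_>], out: &mut String) -> Result<(), String> {
    let mut depth = 0usize;
    for ev in events {
        match ev {
            Event::Enter { id } => {
                writeln!(out, "{}{}", "  ".repeat(depth), id).unwrap();
                depth += 1;
            }
            Event::Leaf { id, len } => {
                writeln!(out, "{}{} [{}]", "  ".repeat(depth), id, len).unwrap();
            }
            Event::Exit => depth = depth.saturating_sub(1),
            Event::Error(msg) => return Err(msg.to_string()),
        }
    }
    Ok(())
}

/// Hypothetical fully parsed tree, built before any output is produced.
#[derive(Debug, PartialEq)]
pub enum Node {
    Form { id: String, children: Vec<Node> },
    Leaf { id: String, len: usize },
}

/// Build the whole tree first; any error means no tree at all.
pub fn build_tree(events: &[Event<'_>]) -> Result<Vec<Node>, String> {
    // Stack of open containers; the bottom entry collects top-level nodes.
    let mut stack: Vec<(String, Vec<Node>)> = vec![(String::new(), Vec::new())];
    for ev in events {
        match ev {
            Event::Enter { id } => stack.push((id.to_string(), Vec::new())),
            Event::Leaf { id, len } => {
                let leaf = Node::Leaf { id: id.to_string(), len: *len };
                stack.last_mut().unwrap().1.push(leaf);
            }
            Event::Exit => {
                if stack.len() < 2 {
                    return Err("unbalanced Exit".to_string());
                }
                let (id, children) = stack.pop().unwrap();
                stack.last_mut().unwrap().1.push(Node::Form { id, children });
            }
            Event::Error(msg) => return Err(msg.to_string()),
        }
    }
    if stack.len() != 1 {
        return Err("unclosed container".to_string());
    }
    Ok(stack.pop().unwrap().1)
}

/// Walk the finished tree and render it as an S-expression.
pub fn to_sexpr(node: &Node) -> String {
    match node {
        Node::Leaf { id, len } => format!("({} {})", id, len),
        Node::Form { id, children } => {
            let mut s = format!("({}", id);
            for child in children {
                s.push(' ');
                s.push_str(&to_sexpr(child));
            }
            s.push(')');
            s
        }
    }
}
```

Output formats with closing delimiters (XML, JSON, S-expressions) are awkward to emit correctly when the parse can fail halfway, which is one reason to walk a complete tree for those and keep the eager variant for the plain format.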
DjVuLibre comes with a bunch of command-line tools that perform encoding/decoding, dump document structure in human-readable or XML formats, extract chunks, etc. Replicating these will be a great way to exercise the APIs and get something useful out the door.