killercup opened this issue 5 years ago
It's worth noting that rustdoc doesn't just want to append to an archive, but also to update files that already exist in the archive...
For context: when Cargo runs a `cargo doc` command, it invokes rustdoc multiple times on the same output directory, once for each dependency. This allows it to update a handful of shared files (the search index, the source files index, the shared CSS/JS/font resources) so that the whole dependency tree can act like a single unit. The important piece here is that we need to be able to read in the existing search index (for example), add the records for the crate being documented, and save it back into the archive.
If I understand the current format correctly (note: I have not done any actual reading on it), this could be as trivial as removing the file from the current archive, modifying it in memory, appending it at the end, and updating the index appropriately. But if `static-filez` moves to a format where the files are more interleaved, that will be more difficult. (It sounds like that's not going to happen, but it's worth noting.)
A quick way to "support" this is to just append the overwritten files and have the index point at the last version only.
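The append-and-repoint idea can be sketched in a few lines. This is a toy illustration of the scheme, not the actual `static-filez` format; the `put`/`get` helpers and the dict-based index are invented for this example:

```python
import gzip

# Append-only archive: every (re)written file is gzip-compressed and
# appended, and the index maps each path to the byte range of its
# *latest* version. Stale blobs remain in the archive but become
# unreachable through the index.
archive = bytearray()
index = {}  # path -> (offset, length)

def put(path, data):
    blob = gzip.compress(data)
    offset = len(archive)
    archive.extend(blob)
    index[path] = (offset, len(blob))  # overwrites any older entry

def get(path):
    offset, length = index[path]
    return gzip.decompress(bytes(archive[offset:offset + length]))

put("search-index.js", b"old records")
put("search-index.js", b"old records + new crate records")
assert get("search-index.js") == b"old records + new crate records"
```

The cost is wasted space for superseded blobs, which a later compaction pass could reclaim.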
Why not ZIP? `.tgz` is a pretty poor format for random access, and would probably require an external index.
Does a zip archive allow us to get files out of it as individual gzip streams so we can send them without extracting and re-compressing?
The format itself should allow you to get a deflated stream directly out of the archive. You can test this with `zip foo.zip foo.txt` and `zlib-flate -compress < foo.txt > foo.d`, then by looking at both files in a hex editor. `foo.zip` will have an extra header and footer, but the compressed data is identical. I don't know if the `zip` crate allows that, but it sounds like useful functionality that could reasonably be added to it.
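The same experiment can be done programmatically. This sketch (assuming Python's standard `zipfile`, `zlib`, and `struct` modules; `foo.txt` is just a placeholder name) builds a zip in memory, slices the raw deflate bytes out from behind the local file header, and decompresses them directly with raw-deflate settings:

```python
import io
import struct
import zipfile
import zlib

data = b"hello world " * 100

# Build a one-entry zip archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("foo.txt", data)
raw = buf.getvalue()

# The local file header is 30 bytes plus the file name and extra field;
# the name and extra lengths sit at offsets 26 and 28 of the header.
name_len, extra_len = struct.unpack("<HH", raw[26:30])
start = 30 + name_len + extra_len

with zipfile.ZipFile(buf) as z:
    comp_size = z.getinfo("foo.txt").compress_size

deflate_stream = raw[start:start + comp_size]

# wbits=-15 tells zlib the stream is raw deflate (no zlib/gzip header),
# which is exactly what zip stores.
assert zlib.decompress(deflate_stream, -15) == data
```

So the bytes really are a bare deflate stream; a server could hand them out as-is with `Content-Encoding: deflate` (or wrap them in a gzip header) without recompressing.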
Wrt. browser support, both Firefox and Chrome send `Accept-Encoding: gzip, deflate`.
In any case, I don't expect a documentation browser to get thousands of requests per second.
Another thing to consider is that if you're just browsing the docs on your own computer, you might as well send the files to the browser without compression. And if you want to host your crate's documentation somewhere, static file hosting is probably more accessible than a VPS or something else that can run code.
I'm not sure what other use cases you have in mind. Being able to serve compressed content might ultimately be a nice feature, but it wouldn't be essential.
Interesting. My main concern with this crate is making a very efficient way to store and serve compressed data, and while the motivation is its use with rustdoc, ideally it doesn't end there. So, when we choose a new archive format, I wouldn't want it to perform worse than the ad-hoc solution we have right now; it should only add compatibility, either with existing applications or with future versions/features of this crate or rustdoc.
Fair enough, there's nothing bad in wanting it to be as fast as possible.
It would be nice to support alternative compression formats; brotli and zstd would both be useful, as they compress HTML better than gzip. Maybe the index could record a global or per-file format, and maybe even support multiple formats to let the server negotiate which one to serve.
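A per-file format record plus negotiation could look roughly like this. Everything here is a toy illustration, not the actual index format; the `index` entries, byte ranges, and the `negotiate` helper are invented for this example:

```python
# Toy index: each file may be stored in several compressed variants,
# recorded as format -> (offset, length) within the archive.
index = {
    "index.html": {
        "gzip": (0, 1200),
        "br": (1200, 950),  # brotli typically compresses HTML better
    },
}

def negotiate(path, accept_encoding):
    """Pick the best stored variant allowed by the Accept-Encoding header."""
    accepted = {enc.strip() for enc in accept_encoding.split(",")}
    variants = index[path]
    # Prefer brotli, then zstd, then gzip, in that order.
    for fmt in ("br", "zstd", "gzip"):
        if fmt in variants and fmt in accepted:
            return fmt, variants[fmt]
    return None  # no acceptable variant: decompress server-side instead

assert negotiate("index.html", "gzip, deflate, br")[0] == "br"
assert negotiate("index.html", "gzip, deflate")[0] == "gzip"
```

Storing multiple variants trades archive size for the ability to always serve the client's preferred encoding without recompressing.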
Let's define the format of our archives.
Current state
A binary file that is actually just concatenated gzip blobs.
Features:
Prior art
What I learned: GZIP members
While reading the WARC spec I found this interesting section:
I did not know this about gzip! If I'm reading this correctly, it means that we could, in theory, use files compatible with tar (or WARC), with the additional requirement that each file starts a new gzip member (so that we can continue to take slices from our index file that point to valid gzip streams we can serve).
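The multi-member property is easy to check with Python's `gzip` module, whose `decompress` function is documented to handle multi-member input: each independently compressed member is a valid gzip file on its own, and their concatenation decompresses to the concatenated plaintexts.

```python
import gzip

a = gzip.compress(b"first file")
b = gzip.compress(b"second file")
stream = a + b  # two gzip members back to back, as tar/WARC allow

# Each slice (what our index would point at) is a valid gzip file...
assert gzip.decompress(stream[:len(a)]) == b"first file"
assert gzip.decompress(stream[len(a):]) == b"second file"

# ...and the whole stream is itself valid multi-member gzip.
assert gzip.decompress(stream) == b"first filesecond file"
```

So an index of (offset, length) slices over a multi-member stream gives us both random access to individual gzip files and a whole-archive stream that standard tools can still decompress.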
Options
cc @QuietMisdreavus