Optionally generate hash of the files

mirage / ocaml-crunch

Convert a filesystem into a static OCaml module

Other

74 stars 21 forks source link

Optionally generate hash of the files #53

Closed tmattio closed 2 years ago

tmattio commented 3 years ago

This is especially useful for web applications so that the ETag does not have to be computed at runtime.

The interface could be similar to read, e.g: val hash: string -> string option

tmattio commented 3 years ago

Related to this, it would similarly useful to compute the size of the embedded files, for inclusion in the Content-Length header. Large files are typically streamed to the user, so it's not possible to provide this header for embedded files at the moment.

aantron commented 3 years ago

@tmattio You may want to make this comment into a separate issue, but AFAIK, when I last looked at the output of crunch, it unconditionally takes large files and concatenates the chunks, at run time, into one huge string, each time the file is accessed. IIRC this is the case even with the Lwt mode. It would be better to have a streaming interface, or at least get a list of the chunks to walk down.

Or I will open a separate issue later, when I have time to confirm (and I do end up confirming what I remember).

aantron commented 3 years ago

Looks like #40 is a variation of this observation/request.

aantron commented 3 years ago

Related to this, it would similarly useful to compute the size of the embedded files, for inclusion in the Content-Length header. Large files are typically streamed to the user, so it's not possible to provide this header for embedded files at the moment.

The header is not necessary if using Transfer-Encoding: chunked, though. I've adopted the strategy of always using TE: Chunked for streams, and it seems to work well. Is there a reason to still set Content-Length? (I don't know enough about this to answer efficiently).

tmattio commented 3 years ago

Is there a reason to still set Content-Length?

IIRC, user experience is better for end-users: browsers like Chrome are able to give an ETA and progression of the download. Downloading a large file and not having this information can be.. frustrating 😄

aantron commented 3 years ago

Thanks. In that case I second the request for knowing file size (which is actually not needed in the current API, because you always get the final string from it — but is needed with #40 or my earlier comment about chunks).

hannesm commented 3 years ago

The streaming interface: I concur this would be useful. The hash and size: indeed this would be useful as well.

What I'm curious about is which interface you use (do you use any?) on top of the generated output. AFAICT, if you call <generated_cunch>.connect () (in the Lwt monad), you receive a handle to a Mirage_kv.RO -- which has a "digest" available.

Though, tbh - maybe to ease usage of crunch outside of MirageOS unikernels (where the mirage utility generates some boilerplate), the generated file should already embed a digest and size information (and some helpers to stream the content) -- digest and size is known at the time crunch is invoked, and not needed to be recomputed on every access.

TL;DR: feel free to propose additions as PR(s), and/or describe your usage.

hannesm commented 2 years ago

done in #60 (optionally generate hash of files).