schovi / baked_file_system

Virtual File System for Crystal language. Embedding your assets into final binary.
MIT License
177 stars 18 forks source link

File checksum for etags #16

Closed lipanski closed 6 years ago

lipanski commented 6 years ago

would be nice to add a (memoized) checksum (e.g. sha1/md5) to every BakedFile object, which could be used for setting the Etag header or for other similar purposes.

also thanks for this shard :1st_place_medal:

straight-shoota commented 6 years ago

I don't see a strong point for adding checksums directly to the BakedFile. If you need it, you can just hash it in user code. A checksum doesn't need to be baked into the binary as it can be easily calculated from the file content. And this feature is not useful for everyone, so it would just waste memory (although not much, but still).

lipanski commented 6 years ago

actually I found https://crystal-lang.org/api/0.24.1/Object.html#hash-instance-method

not 100% sure yet how this applies to IO objects.

straight-shoota commented 6 years ago

Object#hash returns an internal hash representing a specific Crystal object. It's value is not useful for anything except comparing two Crystal objects share the same memory representation.

For a checksum that's usable outside the Crystal runtime like Etags, you should take a look at Digest::MD5.base64digest(baked_file).

lipanski commented 6 years ago

thanks for the explanation :+1:

lipanski commented 6 years ago

And this feature is not useful for everyone, so it would just waste memory

a memoized value would only waste memory for the people actually using it.

straight-shoota commented 6 years ago

It would still mean an additional instance variable. Please, let's keep the interface for BakedFile as slim as possible. If Crystal's plain File type doesn't have a property, BakedFile shouldn't either unless it is essential for the baking behaviour.

lipanski commented 6 years ago

well I can always build around this shard, but the bundling assets for web apps concept is, I think, a very promising use-case for this project, which is why I raised the issue. it makes distributing web apps a much nicer experience. this use case involves checksums (etags), setting the content type and serving gzipped content (which this shard can already do, provided that the #to_slice method never goes private).

if you think that this use case doesn't fit here, feel free to close the ticket and I'll just work around it.

straight-shoota commented 6 years ago

This shard is not only useful for web applications but really anything that needs to embed chunks of data in the executable. It's just one use case. And apart from that, the method for generating an etag is very much application specific. We couldn't just add an etag property and fill it with some data in a way that suits everybody.

If a webapp serves files directly from a (real) file system, it needs to make sure etags are calculated. File doesn't provide that for you.

In theory, File and BakedFile should be pretty much interchangeable so that applications can easily switch (or combine) loading files from real and virtual file systems. If we add advanced features to BakedFile which are not in File (and most certainly won't be) this interchangeability gets broken.

We could discuss adding for example modification or creation date and hopefully find a way to make this accessible in a way similar to File. A modification date would probably help setting up a custom etag generator.

lipanski commented 6 years ago

the creation/modification date is not as accurate as hashing the content. but I do get your point about how this shard should be compatible with File.

for future reference, I ended up with something like this:

require "digest/md5"

module BakedFileSystem
  class BakedFile
    @checksum : String?

    def checksum : String
      @checksum ||= Digest::MD5.base64digest(read).tap { rewind }
    end
  end
end
straight-shoota commented 6 years ago

Generally, I wouldn't necessarily recommend monkey patching but it's obviously an easy implementation.

But you could just use to_slice as input for hash digest. No need to read and rewind.