From #1317: When the client downloads files, it does not always need them written to an actual file; often it just wants the content, so an API that returns just bytes would be fine. We should still cache the target to disk, but the client could avoid reading the file back if we provided variants of download_target() and find_cached_target() that returned bytes.
The only complication here is that we might really want to provide an Iterator[bytes] instead (because there could be a lot of bytes). If that is possible it would be cool, but it might be more complicated.
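A minimal sketch of what the plain bytes-returning variants could look like. It assumes the caller supplies a `fetch` callable yielding the downloaded chunks plus the expected SHA-256 digest. The names `download_target_bytes`, `find_cached_target_bytes`, `fetch` and `HashMismatchError` are hypothetical and not part of the existing API. Real verification would check the length and every hash listed in the target metadata, not only SHA-256.

```python
import hashlib
from typing import Callable, Iterable, Optional


class HashMismatchError(Exception):
    """Hypothetical error raised when content does not match the expected hash."""


def download_target_bytes(
    fetch: Callable[[], Iterable[bytes]],
    expected_sha256: str,
    cache_path: str,
) -> bytes:
    """Hypothetical bytes-returning variant of download_target()."""
    # The whole artifact is held in memory, which is why large targets are a concern.
    data = b"".join(fetch())
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise HashMismatchError("hash mismatch")
    # The verified target is still cached to disk...
    with open(cache_path, "wb") as f:
        f.write(data)
    # ...but the caller gets the content without reading the file back.
    return data


def find_cached_target_bytes(cache_path: str, expected_sha256: str) -> Optional[bytes]:
    """Hypothetical bytes-returning variant of find_cached_target().

    Returns the cached content if it exists and matches the expected hash, else None.
    """
    try:
        with open(cache_path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return None
    return data
```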
This is also related to #1168 -- if we don't trust the client artifact cache, then we should also not have an API that allows for artifact cache timing attacks
The issue with large artifacts is actually a bit tricky:
we must download the whole artifact before we know that its hash matches the expected value,
so the straightforward implementation does the download, verification and serialization to disk first, and only then starts reading the file and returns the iterator. That means we don't actually avoid the file read at all.
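A rough sketch of that straightforward Iterator[bytes] version, to make the cost visible. The helper names (`download_target_iter`, `_read_chunks`, `fetch`, `HashMismatchError`) and the `.part` temporary file are hypothetical; the point is that nothing can be yielded until the whole artifact is on disk and verified, so the iterator ends up reading the cached file back anyway.

```python
import hashlib
import os
from typing import Callable, Iterable, Iterator


class HashMismatchError(Exception):
    """Hypothetical error raised when content does not match the expected hash."""


def _read_chunks(path: str, chunk_size: int) -> Iterator[bytes]:
    # Reads the already-verified cached file back in chunks.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def download_target_iter(
    fetch: Callable[[], Iterable[bytes]],
    expected_sha256: str,
    cache_path: str,
    chunk_size: int = 8192,
) -> Iterator[bytes]:
    """Hypothetical Iterator[bytes] variant, straightforward version."""
    tmp_path = cache_path + ".part"
    hasher = hashlib.sha256()
    # 1. Download the whole artifact, hashing and writing it to disk as it arrives.
    with open(tmp_path, "wb") as f:
        for chunk in fetch():
            hasher.update(chunk)
            f.write(chunk)
    # 2. Verification is only possible once every byte has been seen.
    if hasher.hexdigest() != expected_sha256:
        os.remove(tmp_path)
        raise HashMismatchError("hash mismatch")
    # 3. Persist the verified artifact into the cache.
    os.replace(tmp_path, cache_path)
    # 4. Only now can iteration start, and it reads the file we just wrote.
    return _read_chunks(cache_path, chunk_size)
```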
It might be possible to return the iterator before verification (and just fail verification before the iterator finishes), but this could be tricky: by the time verification fails, the caller has already consumed unverified bytes.
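One way this could look, as a rough sketch only: a generator that yields chunks as they arrive and raises at the very end if the length or hash does not match. The names `stream_target`, `fetch`, `HashMismatchError` and the `.part` file are hypothetical; `expected_length` is assumed to come from the target metadata, which in TUF lists a length alongside the hashes. Content is written to a temporary file and only promoted into the cache after verification, so an abandoned or failed stream never leaves an unverified cache entry.

```python
import hashlib
import os
from typing import Callable, Iterable, Iterator


class HashMismatchError(Exception):
    """Hypothetical error: content does not match the expected length or hash."""


def stream_target(
    fetch: Callable[[], Iterable[bytes]],
    expected_length: int,
    expected_sha256: str,
    cache_path: str,
) -> Iterator[bytes]:
    """Hypothetical streaming variant that yields chunks before verification.

    The caller must treat everything it received as untrusted until the
    iterator finishes without raising.
    """
    tmp_path = cache_path + ".part"
    hasher = hashlib.sha256()
    received = 0
    try:
        with open(tmp_path, "wb") as f:
            for chunk in fetch():
                received += len(chunk)
                # Excess length can be detected early; the hash only at the end.
                if received > expected_length:
                    raise HashMismatchError("more data than expected length")
                hasher.update(chunk)
                f.write(chunk)
                yield chunk
        if received != expected_length or hasher.hexdigest() != expected_sha256:
            raise HashMismatchError("length or hash mismatch")
    except BaseException:
        # Also covers GeneratorExit when the consumer stops early:
        # never leave unverified content behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Only a fully verified artifact is promoted into the cache.
    os.replace(tmp_path, cache_path)
```

The trickiness is on the consumer side: a caller writing chunks straight into, say, an unpacker would need to be able to roll that work back if the final `HashMismatchError` arrives.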