kbase / file_cache_server

Cache server for files that take longer to generate than to download.
MIT License

Reference data storage API proposal #14

Open jayrbolton opened 4 years ago

jayrbolton commented 4 years ago

This codebase could potentially be used for reference data storage, allowing us to keep a standardized, centralized place to upload and download RE import data, SDK app data, etc. I propose adding an endpoint that provides a JSON-RPC 2.0 API for uploading files:

Methods:
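As a rough illustration of the proposed API's shape (the method name `upload_file` and its params are hypothetical, not taken from the proposal), a JSON-RPC 2.0 upload request might look like:

```python
import json

# Hypothetical JSON-RPC 2.0 request for the proposed upload endpoint.
# The method name "upload_file" and the params shown are illustrative only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "upload_file",
    "params": {
        "file_name": "genome_index.tar",
        "content_hash": "<blake2b hex digest of the file>",
    },
}
payload = json.dumps(request, indent=2)
print(payload)
```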

The server can gzip files if they have not already been compressed. We can also have a Python client module that takes care of creating tarballs, gzip compression/extraction, proper low-memory streaming, and checking downloaded data against its content hash.
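A minimal sketch of what the client-side hashing and compression helpers could look like, using only the standard library (the function names and 1 MiB chunk size are my own choices, not part of the proposal):

```python
import gzip
import hashlib
import shutil

CHUNK = 1024 * 1024  # 1 MiB chunks keep memory usage low for large files


def blake2b_file(path):
    """Compute a file's BLAKE2b hex digest by streaming fixed-size chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def gzip_file(src, dest):
    """Gzip-compress src to dest without loading the whole file into memory."""
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, CHUNK)
```

After download, the client would recompute `blake2b_file` on the received file and compare it to the content hash recorded at upload time.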

Optionally, we could have a policy to expire files if they haven't been accessed after a certain time period, such as 500 days.
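The expiry check itself is trivial; a sketch, assuming last-access times are stored as epoch seconds (the 500-day window is from the comment above, everything else is assumption):

```python
import time

# 500 days, per the proposed expiry policy, expressed in seconds.
EXPIRE_SECONDS = 500 * 24 * 60 * 60


def is_expired(last_access_epoch, now=None):
    """Return True if the file has not been accessed within the expiry window."""
    now = time.time() if now is None else now
    return now - last_access_epoch > EXPIRE_SECONDS
```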

Blake2b hashing should be used.

Users can upload directories if they pack them into a tarball, and the client can help with this. The server can detect if uploaded files are already gzipped using this
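The linked detection method didn't survive the scrape; one common approach, shown here as an assumption rather than what the link described, is to check for the gzip magic number in the first two bytes:

```python
# RFC 1952: every gzip stream begins with the two bytes 0x1f 0x8b.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Detect gzip by checking the file's first two bytes against the magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```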

xpe commented 4 years ago

Blake2b hashing should be used.

Why do you recommend it? Speed of hashing?

Have you considered BLAKE3?

jayrbolton commented 4 years ago

Yeah, BLAKE2b is a lot faster. I would use BLAKE3, but the Python package seems to be in development, and the maintainers warn against using it in production without extra work. BLAKE2b is built into hashlib in Python 3.6+, so it's very stable.
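To illustrate the "built into hashlib" point (the input string here is arbitrary):

```python
import hashlib

# hashlib.blake2b ships with the standard library in Python 3.6+,
# so no third-party dependency is needed. Default digest size is 64 bytes.
digest = hashlib.blake2b(b"reference data").hexdigest()
```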

xpe commented 4 years ago

A few contextual questions:

jayrbolton commented 4 years ago

Good questions.

jayrbolton commented 4 years ago

In Slack, we tentatively decided to use the blobstore API for this: https://github.com/kbase/blobstore