thorseraq closed this issue 1 year ago.
Let me take a look at this issue!
After a first look over `BlobStorage`, I have the following questions:
1. What will `params` carry in `get_blob`?
2. … `put_blob`?
3. … `BlobStorage`? Most storage services require listing all files under a folder/workspace and then deleting them, which could be slow.
4. `get_blobs_size`: this will need to list all files in the workspace to calculate the total size.

For questions 3 and 4, maybe we can store that data in an extra DB instead?
`get_hash` util for packing the body stream into `Vec<u8>` and getting the length inside `put_blob`: https://github.com/toeverything/OctoBase/blob/f2b25ceb513eae9db40f644668b4b06e46412ecc/libs/jwst-storage/src/storage/blobs/local_db.rs#L166

Yes! Hey, please note that there's an upstream PR: https://github.com/toeverything/OctoBase/pull/466/files#diff-8ac295f21eb23dcfea53d111ba03970a4323656dc9483febd0c5eec4b081b435R7. For the current DB implementation, you can reference:
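```rust
use sea_orm::entity::prelude::*;

#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "s3_blobs")]
pub struct Model {
    // Composite primary key: (workspace, hash, params).
    #[sea_orm(primary_key, auto_increment = false)]
    pub workspace: String,
    #[sea_orm(primary_key, auto_increment = false)]
    pub hash: String,
    pub length: i64,
    pub timestamp: DateTimeWithTimeZone,
    // Per the discussion, `params` backs an image query optimization.
    #[sea_orm(primary_key, auto_increment = false)]
    pub params: String,
}

#[derive(Copy, Clone, Debug, EnumIter, DeriveRelation)]
pub enum Relation {}

impl ActiveModelBehavior for ActiveModel {}
```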
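Because `params` is part of the composite primary key while `get_blob` receives it as an `Option<HashMap<String, String>>`, the map has to become a deterministic string before it can be matched against that column. A minimal sketch, assuming a hypothetical `canonical_params` helper and a `key=value&key=value` encoding (the DB backend's actual encoding may differ): sorting the pairs makes the result independent of `HashMap` iteration order.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper: serialize optional blob params into a stable string
/// for the `params` primary-key column. Pairs are sorted by key, so equal
/// maps always produce the same string regardless of HashMap ordering.
pub fn canonical_params(params: Option<&HashMap<String, String>>) -> String {
    match params {
        // No params: use an empty string so the plain blob has one fixed key.
        None => String::new(),
        Some(map) => {
            // BTreeMap iterates in key order, unlike HashMap.
            let sorted: BTreeMap<&String, &String> = map.iter().collect();
            sorted
                .into_iter()
                .map(|(k, v)| format!("{k}={v}"))
                .collect::<Vec<_>>()
                .join("&")
        }
    }
}
```

This sketch does not escape `&` or `=` inside keys or values, and an empty map yields the same key as `None`; a real encoding would need to decide both cases.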
- You can reference it here; it's an image query optimization.
For S3 storage, can we simply ignore this argument?
- How about using the `get_hash` util to pack the body stream into `Vec<u8>` and get the length inside `put_blob`?
That's OK; we can make it work first.
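The `get_hash` suggestion amounts to buffering the whole body, then deriving both the content hash and the length from the buffered bytes. A minimal sketch, assuming the body arrives as an iterator of byte chunks (a real handler would drain an async stream) and using 64-bit FNV-1a only as a dependency-free stand-in for whatever hash `get_hash` actually computes; `pack_body` and `fnv1a_64` are hypothetical names.

```rust
/// Stand-in content hash (64-bit FNV-1a). Not cryptographic; a real blob
/// store should use the same hash as the project's `get_hash` util.
pub fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: pack body chunks into one `Vec<u8>` and return
/// (hex hash, length, bytes) so `put_blob` can fill `hash` and `length`.
pub fn pack_body<I>(chunks: I) -> (String, i64, Vec<u8>)
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut buf = Vec::new();
    for chunk in chunks {
        buf.extend_from_slice(&chunk);
    }
    let hash = format!("{:016x}", fnv1a_64(&buf));
    // `length` is an i64 column in the `s3_blobs` model.
    let len = buf.len() as i64;
    (hash, len, buf)
}
```

Buffering keeps `put_blob` simple, in line with making it work first, at the cost of holding the whole blob in memory.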
- Use Cloudflare R2 as S3 storage; I'll look into whether its API supports this later.
Neither R2 nor S3 supports it. OpenDAL implements this by listing all files and then calling the multi-object delete (`DeleteObjects`). We can implement it this way and discuss later.
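The list-then-batch-delete approach described above can be sketched against a toy in-memory bucket. `Bucket`, `list_prefix`, `delete_objects`, and `delete_workspace` here are hypothetical stand-ins for S3/R2 client calls, not OpenDAL's code; the batch size of 1000 mirrors the per-request key limit of S3's `DeleteObjects`.

```rust
use std::collections::BTreeMap;

/// S3's DeleteObjects accepts at most 1000 keys per request.
pub const MAX_KEYS_PER_DELETE: usize = 1000;

/// Hypothetical in-memory stand-in for an S3/R2 bucket.
#[derive(Default)]
pub struct Bucket {
    objects: BTreeMap<String, Vec<u8>>,
    /// Number of batch-delete requests issued (to observe batching).
    pub delete_requests: usize,
}

impl Bucket {
    pub fn put(&mut self, key: &str, data: Vec<u8>) {
        self.objects.insert(key.to_string(), data);
    }

    /// Plays the role of a prefix-filtered object listing.
    pub fn list_prefix(&self, prefix: &str) -> Vec<String> {
        self.objects
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect()
    }

    /// Plays the role of one multi-object delete request.
    pub fn delete_objects(&mut self, keys: &[String]) {
        assert!(keys.len() <= MAX_KEYS_PER_DELETE);
        for key in keys {
            self.objects.remove(key);
        }
        self.delete_requests += 1;
    }
}

/// Delete every object under `{workspace}/` by listing, then batch-deleting.
/// The trailing slash keeps "ws1" from also matching "ws10/...".
/// Returns the number of objects removed.
pub fn delete_workspace(bucket: &mut Bucket, workspace: &str) -> usize {
    let keys = bucket.list_prefix(&format!("{workspace}/"));
    for batch in keys.chunks(MAX_KEYS_PER_DELETE) {
        bucket.delete_objects(batch);
    }
    keys.len()
}
```

A real S3 listing is paginated (`ListObjectsV2` returns at most 1000 keys per page), so the list step would loop over continuation tokens; this is where the slowness raised in question 3 comes from.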
- Just query the DB; I think we can leave this method blank~
Good!
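Answering `get_blobs_size` from the metadata DB means summing the `length` column over the workspace's rows instead of listing the bucket (roughly `SELECT SUM(length) FROM s3_blobs WHERE workspace = ?`). A minimal sketch over in-memory rows shaped like the `s3_blobs` model; `BlobRow` and `blobs_size` are hypothetical names, and every row for the workspace (including any `params` variants) is counted.

```rust
/// Hypothetical in-memory mirror of an `s3_blobs` row; other columns omitted.
pub struct BlobRow {
    pub workspace: String,
    pub length: i64,
}

/// Total stored bytes for one workspace, computed from metadata rows only,
/// i.e. without listing any objects in the bucket.
pub fn blobs_size(rows: &[BlobRow], workspace: &str) -> i64 {
    rows.iter()
        .filter(|row| row.workspace == workspace)
        .map(|row| row.length)
        .sum()
}
```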
> For S3 storage, can we simply ignore this argument?
`workspace/hash` is enough for S3 querying; agreed, let's ignore it 😈
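With `params` ignored, an S3 object key needs only the workspace and the hash. A minimal sketch of a hypothetical `object_key` helper; falling back to the bare hash when `workspace` is `None` is an assumption (the trait's `workspace` argument is an `Option<String>`), not something the thread decides.

```rust
use std::collections::HashMap;

/// Hypothetical helper: build the S3 object key for a blob. `params` is
/// accepted for signature compatibility with `get_blob` but ignored.
pub fn object_key(
    workspace: Option<&str>,
    hash: &str,
    _params: Option<&HashMap<String, String>>,
) -> String {
    match workspace {
        Some(ws) => format!("{ws}/{hash}"),
        // Assumption: blobs without a workspace live at the bucket root.
        None => hash.to_string(),
    }
}
```

Requests with and without `params` resolve to the same key, which is what ignoring the argument means for S3.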
> Neither R2 nor S3 supports it. OpenDAL implements this by listing all files and then calling the multi-object delete (`DeleteObjects`). We can implement it this way and discuss later.
OK~ LGTM
I will start working on this issue after https://github.com/toeverything/OctoBase/pull/466.
Hey, there are a few changes since our last discussion:
- `S3Storage` was renamed to `BucketStorage`, since it can represent a more general key-value-like storage medium.
- A new `BucketBlobStorage` trait for bucket-like storage. Currently this trait is designed for S3 storage and contains only the necessary blob operations. How does this look to you? Please feel free to make changes.
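```rust
use std::collections::HashMap;

use async_trait::async_trait;
// Project types; assumed to be exported by the `jwst` crate.
use jwst::{JwstError, JwstResult};

/// Blob operations for bucket-like (key-value) storage; currently aimed at S3.
#[async_trait]
pub trait BucketBlobStorage<E = JwstError> {
    /// `params` may be ignored by S3 backends, where `workspace/hash` identifies a blob.
    async fn get_blob(
        &self,
        workspace: Option<String>,
        id: String,
        params: Option<HashMap<String, String>>,
    ) -> JwstResult<Vec<u8>, E>;
    async fn put_blob(
        &self,
        workspace: Option<String>,
        hash: String,
        blob: Vec<u8>,
    ) -> JwstResult<String, E>;
    async fn delete_blob(&self, workspace: Option<String>, id: String) -> JwstResult<bool, E>;
    /// Removes every blob under a workspace.
    async fn delete_workspace(&self, workspace_id: String) -> JwstResult<(), E>;
}
```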
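To see how a backend might satisfy this trait, here is an in-memory sketch. It mirrors the trait locally using native `async fn` in traits (Rust 1.75+) plus stand-in `MockError`/`JwstResult` types so it compiles with `std` alone; `InMemoryBucket`, the `workspace/id` key layout, the toy `block_on`, and the meanings assumed for the `String` and `bool` return values are illustrative assumptions, not the project's implementation.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Mutex;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-ins for the project's types (assumption: `JwstResult<T, E>`
// behaves like a plain `Result<T, E>` alias).
#[derive(Debug, PartialEq)]
pub struct MockError(pub String);
pub type JwstResult<T, E = MockError> = Result<T, E>;

// Local mirror of `BucketBlobStorage`, using native `async fn` in traits.
trait BucketBlobStorage<E = MockError> {
    async fn get_blob(
        &self,
        workspace: Option<String>,
        id: String,
        params: Option<HashMap<String, String>>,
    ) -> JwstResult<Vec<u8>, E>;
    async fn put_blob(&self, workspace: Option<String>, hash: String, blob: Vec<u8>)
        -> JwstResult<String, E>;
    async fn delete_blob(&self, workspace: Option<String>, id: String) -> JwstResult<bool, E>;
    async fn delete_workspace(&self, workspace_id: String) -> JwstResult<(), E>;
}

/// Hypothetical in-memory bucket; objects are keyed by `workspace/id`.
#[derive(Default)]
pub struct InMemoryBucket {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

fn object_key(workspace: &Option<String>, id: &str) -> String {
    match workspace {
        Some(ws) => format!("{ws}/{id}"),
        None => id.to_string(),
    }
}

impl BucketBlobStorage for InMemoryBucket {
    async fn get_blob(
        &self,
        workspace: Option<String>,
        id: String,
        _params: Option<HashMap<String, String>>, // ignored, as agreed for S3
    ) -> JwstResult<Vec<u8>> {
        let found = self.objects.lock().unwrap().get(&object_key(&workspace, &id)).cloned();
        found.ok_or_else(|| MockError(format!("blob {id} not found")))
    }

    async fn put_blob(&self, workspace: Option<String>, hash: String, blob: Vec<u8>) -> JwstResult<String> {
        self.objects.lock().unwrap().insert(object_key(&workspace, &hash), blob);
        Ok(hash) // assumption: the returned String is the blob id
    }

    async fn delete_blob(&self, workspace: Option<String>, id: String) -> JwstResult<bool> {
        // Assumed meaning of the bool: whether an object was actually removed.
        let removed = self.objects.lock().unwrap().remove(&object_key(&workspace, &id));
        Ok(removed.is_some())
    }

    async fn delete_workspace(&self, workspace_id: String) -> JwstResult<()> {
        let prefix = format!("{workspace_id}/");
        self.objects.lock().unwrap().retain(|k, _| !k.starts_with(&prefix));
        Ok(())
    }
}

// Toy executor: the mock never awaits, so each future is ready on its
// first poll and a no-op waker suffices. Not a general-purpose executor.
fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}
fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: every vtable function ignores the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_clone(std::ptr::null())) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

A real S3 backend would replace the `HashMap` with client calls and run under an async runtime rather than the toy `block_on` helper.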