Open OrenLeung opened 6 months ago
Is it possible to use a streaming dataset as a distributed key-value store?
I have a set of keys (strings like "xyz_123"), each of which corresponds to a NumPy array.
Ideally I could do something like:
np_array = dataset["xyz_123"]
But I see that with MDSWriter.write the keys of the dataset are just sequential, and I can't change them.
Is there a way to have a custom key for MDSWriter?
Hi @OrenLeung, what is the size of the dataset, and how many unique keys does it have?
@karan6181 The dataset is about 1 TB, with about 100k unique keys.
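One possible workaround, sketched here under assumptions rather than as a confirmed feature of the library: keep the sequential indices, and maintain a separate mapping from each string key to the position at which its sample was written. The sketch assumes the dataset object supports integer indexing (`samples[i]`) and that the list of keys is available in write order, for example by storing the key as an extra field in each sample. `KeyedView` and its arguments are hypothetical names, and plain lists stand in for the dataset and the arrays.

```python
from typing import Any, Dict, Iterable, Sequence


class KeyedView:
    """Read-only lookup by string key over an integer-indexed dataset.

    Hypothetical helper: `samples` is anything that supports samples[i],
    and `keys` lists the keys in the same order the samples were written.
    """

    def __init__(self, samples: Sequence[Any], keys: Iterable[str]) -> None:
        self._samples = samples
        self._index: Dict[str, int] = {}
        for position, key in enumerate(keys):
            if key in self._index:
                raise ValueError(f"duplicate key: {key!r}")
            self._index[key] = position

    def __getitem__(self, key: str) -> Any:
        # Translate the string key to its sequential index, then delegate.
        return self._samples[self._index[key]]

    def __contains__(self, key: str) -> bool:
        return key in self._index

    def __len__(self) -> int:
        return len(self._index)


# Stand-in data: lists play the role of the dataset and of the arrays.
keys = ["xyz_123", "abc_7", "xyz_124"]
samples = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
view = KeyedView(samples, keys)
```

With this in place, `view["xyz_123"]` returns the sample written first. At about 100k keys the mapping itself is small enough to hold in memory on every worker; whether random access by index is fast enough for a 1 TB dataset is a separate question that this sketch does not address.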