Open strickvl opened 7 months ago
Hi @strickvl I want to work on this task.
By reading your detailed description I found out that safetensors could be used for the Huggingface, PyTorch, PyTorch Lightning and Tensorflow integrations, and that we could change the current method to use safetensors to store the model. Please correct me if I'm wrong here.
Here I could add a new materializer named, e.g., SafetensorsMaterializer, similar to CloudPickleMaterializer.
But I'm struggling to figure out how we can offer users both the pickle and safetensors options. I mean, where can we make this change? As I'm new to this repo, could you please guide me a little on this? Thanks :)
Hi @Dev-Khant good question!
I think the best place to start would be simply to add new materializers that use safetensors. Then we can allow users to specify them as a custom materializer for their chosen outputs. (See here for more details on that).
We can keep the new materializers as part of the standard library, but they just wouldn't be the default. (The alternative would be to have a config option on the materializer itself, but that's a big / complicated feature to implement and I think we shouldn't start there).
So, don't change the existing materializers but add new ones that use safetensors and update the docs so that people know how to use these parallel options. Hope that makes sense!
@strickvl Totally understood. As you said, we will have parallel options for materializers, so correct me if I am wrong: we will have, let's say, two HFPTModelMaterializer variants, one with the current approach and another with safetensors.
Correct.
@strickvl can you assign this issue to me? thanks.
@Saedbhati sure go for it! I've assigned it to you. Please keep in mind the conversation in this thread however :-)
I am working on this as well!
@htahir1
I was looking through #2539 and am curious (in terms of a torch materializer) if it would be more efficient to use safetensors.safe_open and safetensors.save_file for load and save functionality respectively.
While this approach would require handling single versus multiple tensors slightly differently I feel it would avoid the problem of saving/loading models twice.
Would there be a downside to this approach?
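For context on the single-versus-multiple-tensor point: a safetensors file stores a mapping from names to tensors, laid out as an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape and byte offsets, and then the raw tensor bytes. The sketch below mimics that layout using only the standard library, to show why a lone tensor has to be wrapped under a key and why one tensor can be read without touching the others. It is an illustration under simplifying assumptions (float32 only, values passed as flat lists), not the library's implementation, and it is not meant for interoperability with real files; the helper names write_f32_file, read_header and read_tensor are hypothetical. In real code the library's own functions would be used (safe_open is exposed at the top level of safetensors, while the PyTorch save_file helper lives in safetensors.torch).

```python
import json
import math
import struct


def write_f32_file(path, tensors):
    """Write {name: (shape, flat_values)} using a safetensors-style layout."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        if math.prod(shape) != len(values):
            raise ValueError(f"shape {shape} does not match {len(values)} values")
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # Offsets are relative to the start of the data section.
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)
        f.write(b"".join(chunks))


def read_header(path):
    """Return (header_length, header_dict) without reading any tensor data."""
    with open(path, "rb") as f:
        (length,) = struct.unpack("<Q", f.read(8))
        return length, json.loads(f.read(length).decode("utf-8"))


def read_tensor(path, name):
    """Read one tensor's values by seeking straight to its byte range."""
    length, header = read_header(path)
    begin, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(8 + length + begin)
        raw = f.read(end - begin)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

A single tensor would be stored as a one-entry mapping, for example write_f32_file(path, {"tensor": ((2,), [0.5, -0.5])}), and a materializer taking this route would need to agree on that key name for loading. Nothing in the file is executable: loading amounts to parsing JSON and copying bytes, which is the property that distinguishes this format from pickle.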
@JasonBodzy thanks for the interest - it's an interesting suggestion! Using safetensors' native functions could potentially help avoid the double-save problem we ran into earlier.
Though we'd need to solve a couple of challenges:
What do you think? If you're keen to explore this approach further, would be great to see how these pieces could come together. Feel free to share more thoughts or suggestions on tackling these requirements :-)
@bcdurak curious about your thoughts here too!
Open Source Contributors Welcomed!
Please comment below if you would like to work on this issue!
Contact Details [Optional]
support@zenml.io
What happened?
ZenML currently uses Python's pickle module (via the cloudpickle library) for model serialization and materialization. However, the safetensors library is fast becoming a standard for storing tensors and model weights, offering a reasonable alternative to pickle. Integrating safetensors into ZenML would provide users with a more efficient and secure option for model serialization.
Task Description
Implement support for using safetensors instead of pickle for model materialization in ZenML. The task involves the following:
- … safetensors for model serialization.
- … src/zenml/integrations) to utilize safetensors where appropriate.
- … pickle-based serialized models.
- … safetensors option.
Expected Outcome
- … safetensors, providing a faster and more secure alternative to pickle.
- … pickle and safetensors for model materialization.
- … safetensors will be seamless, maintaining compatibility with existing ZenML workflows.
- … safetensors option effectively.
Steps to Implement
- … safetensors library and its usage for model serialization.
- … safetensors serialization.
- … src/zenml/integrations that would benefit from safetensors and update them accordingly.
- … pickle-based serialized models can still be loaded.
- … safetensors option and provide examples of its usage.
- … safetensors serialization in various scenarios.
Additional Context
Integrating safetensors into ZenML aligns with the project's goal of providing efficient and secure tools for machine learning workflows. By offering an alternative to pickle, ZenML empowers users with more options for model serialization, catering to their specific needs and preferences.
Code of Conduct