aivarasbaranauskas opened 2 days ago
@aivarasbaranauskas Hmm, I wasn't expecting any of the default pretrained checkpoints to require pickle. I enabled that flag by default for safety, and that TypeError is supposed to catch old versions of torch that don't have the weights_only arg, not checkpoints that fail to load.
Thinking about whether to allow the pickle checkpoint, or to re-write those checkpoints and replace them with different ones on the HF hub... hmm.
I ran into this. This was my workaround:
import _codecs

import numpy as np
import torch

torch.serialization.add_safe_globals([
    np.dtype,
    np.dtypes.Float64DType,
    np.core.multiarray.scalar,  # noqa
    _codecs.encode,
])
@bryant1410 thanks, obviously not great to need to do that, have you encountered any other checkpoints that break? I might just disable the weights_only=True default and stick with False for now...
I have only tested the following groups of checkpoints: OpenAI's, Apple's DFN5B's, and MetaCLIP's. MetaCLIP checkpoints are the only ones that needed something like this.
I haven't looked into why MetaCLIP needs these. Maybe they can be easily removed (e.g., maybe they depend on NumPy for some arrays where PyTorch tensors would do instead). Still, I think it's somewhat reasonable to have these safe globals added, as they seem common and safe in principle. Though I wonder if _codecs.encode could somehow be exploited? Either way, this seems safer than weights_only=False.
We can keep a list of reasonable globals added, such as the ones here, and when issues pop up we can evaluate adding more. What do you think?
@bryant1410 looks like _codecs.encode is on pytorch main now, so sure, I'll add those globals then
Sounds good. Thanks.
BTW, I'm not sure the globals I added are optimal. E.g., maybe a single numpy import could be added instead. I didn't try much; I just added them based on the error messages I was getting.
@bryant1410 @aivarasbaranauskas I merged a fix for this, and added the bigG that was missing. Can someone confirm before I make another release?
Is this enough of a test: https://colab.research.google.com/drive/1oHIkYiEGQIt8PNQa4u4b_IzbNc5n_b9O?usp=sharing (it worked there)?
I'm mostly using a private forked version of this library, which is not kept up-to-date with upstream, so not sure how to test it otherwise (please lmk).
@bryant1410 thanks, I did some basic validation tests of my own too, just wanted another confirm
Thanks for the quick fix. It works now when using numpy v1.*, but still fails with numpy v2.*. One of the types that should be whitelisted was moved to a different namespace in numpy v2:
DeprecationWarning: numpy.core is deprecated and has been renamed to numpy._core. The numpy._core namespace contains private NumPy internals and its use is discouraged, as NumPy internals can change without warning in any release. In practice, most real-world usage of numpy.core is to access functionality in the public NumPy API. If that is the case, use the public NumPy API. If not, you are using NumPy internals. If you would still like to access an internal attribute, use numpy._core.multiarray.
I have tried adding numpy._core.multiarray.scalar to torch.serialization's safe_globals, but that did not work either.
Oh dammit numpy, yeah, numpy 2.0 breaks that idea :( I'm not sure it's possible to work around, as pickle needs the matching namespace afaik.
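That namespace coupling is visible directly in the pickle stream: the GLOBAL opcode stores the defining module path as a literal string, so a checkpoint written under numpy 1.x permanently says numpy.core.multiarray. A quick sketch to see it (pickling a fresh NumPy scalar rather than an actual checkpoint):

```python
import pickle
import pickletools

import numpy as np

# Pickle a NumPy scalar the way torch.save does by default (protocol 2).
payload = pickle.dumps(np.float64(0.5), protocol=2)

# The GLOBAL opcode embeds the module path as a plain string: under numpy 1.x
# it reads "numpy.core.multiarray scalar", under 2.x "numpy._core.multiarray scalar",
# and an allowlist keyed by the currently installed numpy's name won't match
# a stream written under the other major version.
globals_in_stream = [
    arg for op, arg, pos in pickletools.genops(payload) if op.name == "GLOBAL"
]
```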
So, I guess I do need to re-write some checkpoints and point to new locations if I want to keep the load safe.
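One way that re-write could look, as a sketch (sanitize_checkpoint is a hypothetical name; it uses weights_only=False, so run it once in a trusted environment on checkpoints you already trust):

```python
import numpy as np
import torch

def sanitize_checkpoint(src, dst):
    # One-time trusted conversion: full unpickle, then replace NumPy objects
    # with torch tensors / Python scalars so weights_only=True works later
    # without any extra safe globals.
    ckpt = torch.load(src, map_location="cpu", weights_only=False)

    def convert(obj):
        if isinstance(obj, dict):
            return {k: convert(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return type(obj)(convert(v) for v in obj)
        if isinstance(obj, np.ndarray):
            return torch.from_numpy(obj)
        if isinstance(obj, np.generic):  # NumPy scalar -> plain Python scalar
            return obj.item()
        return obj

    torch.save(convert(ckpt), dst)
```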
Hello. I am getting a pickle.UnpicklingError exception when loading the ViT-L-14-quickgelu model (metaclip_fullcc pretrained) with open_clip version 2.27.0+. Code that's throwing the exception:
The exception:
It seems the exception should be caught here: https://github.com/mlfoundations/open_clip/blob/185071e086e5950ea6ca6c25fe393d5d906aeefa/src/open_clip/factory.py#L138-L141 But pickle.UnpicklingError does not inherit/extend TypeError.
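A sketch of what the broadened catch could look like (load_with_fallback is a hypothetical stand-in for the factory code, not the actual function name in factory.py):

```python
import pickle

import torch

def load_with_fallback(path_or_file):
    # weights_only=True can fail two ways: TypeError on old torch versions
    # that lack the kwarg, and pickle.UnpicklingError when the checkpoint
    # needs globals outside the allowlist. Catch both before falling back.
    try:
        return torch.load(path_or_file, map_location="cpu", weights_only=True)
    except (TypeError, pickle.UnpicklingError):
        if hasattr(path_or_file, "seek"):
            path_or_file.seek(0)  # rewind file-like objects before retrying
        return torch.load(path_or_file, map_location="cpu", weights_only=False)
```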