(Trained with ft-B-train-OpenAI-CLIP-ViT-L-14, then used ft-C-convert-for-SDXL-comfyUI-OpenAI-CLIP, and then tried to convert to HF and extract the TE; I am trying to use it as SD3.5L's tenc1.)
convert_clip_original_pytorch_to_hf.py", line 157, in <module>
convert_clip_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.config_path)
File "C:\OneTrainer\venv\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\OneTrainer\CLIP-fine-tune\Convert-for-HuggingFace-Spaces-etc\convert_clip_original_pytorch_to_hf.py", line 121, in convert_clip_checkpoint
pt_model, _ = load(checkpoint_path, device="cpu", jit=False)
File "C:\OneTrainer\venv\lib\site-packages\clip\clip.py", line 136, in load
state_dict = torch.load(opened_file, map_location="cpu")
File "C:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 1384, in load
return _legacy_load(
File "C:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 1628, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
As the title says; in addition, the extract-TE step only outputs a 1 kB file.
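For what it's worth, the EOFError ("Ran out of input") from torch.load usually means the pickle stream ended early, i.e. the .pt file the converter reads is empty or truncated. A minimal sanity check before rerunning the converter, reusing the checkpoint path from the snippet further down (adjust to your own file):

import os
import torch

ckpt = "C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/my-finetune.pt"

# A 0-byte or truncated file is the usual cause of "Ran out of input"
print("size on disk:", os.path.getsize(ckpt), "bytes")

# Try loading it directly, bypassing clip.load()
obj = torch.load(ckpt, map_location="cpu")
sd = obj.state_dict() if hasattr(obj, "state_dict") else obj
print("tensor count:", len(sd))
print("sample keys:", list(sd)[:5])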
Update: after messing around I managed to do the conversion like this, and it is now loadable with SD3.5:
import torch
from transformers import CLIPTextModelWithProjection, CLIPTextConfig
# Load the fine-tuned model and extract the state_dict
full_model = torch.load("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/my-finetune.pt")
state_dict = full_model.state_dict() if hasattr(full_model, "state_dict") else full_model
# Load the configuration and create the model
config = CLIPTextConfig.from_pretrained("C:/train/sd3.5/text_encoder/config.json")
fine_tuned_model = CLIPTextModelWithProjection(config)
# Load the state_dict into the fine-tuned model
fine_tuned_model.load_state_dict(state_dict, strict=False)
# Save only the text encoder part
fine_tuned_model.save_pretrained("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/")
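One caveat with strict=False: if the checkpoint keys don't match the Hugging Face CLIPTextModelWithProjection naming, load_state_dict silently skips them and you end up saving a mostly randomly initialized encoder. A quick check, reusing the objects from the snippet above:

# load_state_dict reports the keys it could not match; both lists should be (near) empty
result = fine_tuned_model.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)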
Interestingly, the converted, extracted text encoder works with Stable Diffusion 3.5 (as CLIPTextModelWithProjection) but not with Flux (when changing to CLIPTextModel).
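For the Flux case, the structural difference should only be the text_projection head that CLIPTextModelWithProjection adds on top of CLIPTextModel, so one way to narrow it down is to load the same saved folder as a plain CLIPTextModel with output_loading_info=True and see which keys transformers reports. A hedged sketch, assuming the save path from the snippet above:

from transformers import CLIPTextModel

# Load the exported folder as a plain CLIPTextModel and surface any key mismatches
model, loading_info = CLIPTextModel.from_pretrained(
    "C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/",
    output_loading_info=True,
)
print("missing:", loading_info["missing_keys"])
print("unexpected:", loading_info["unexpected_keys"])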