pharmapsychotic / clip-interrogator

Image to prompt with BLIP and CLIP
MIT License

Downloads BLIP Checkpoint every time #96

Open Jeffman112 opened 9 months ago

Jeffman112 commented 9 months ago

Right now, every time I run my script it downloads the BLIP checkpoint from the URL, which takes a lot of VRAM and crashes my hosting. How can I make it load from a file? Also, I'm assuming that the CLIP model is loading from the /cache folder? (download_cache is set to false)

Eaven21 commented 9 months ago
# Excerpt from clip_interrogator.py, with the imports the snippet relies on
# added for context (blip_decoder comes from the BLIP package that
# clip-interrogator depends on; Config is defined earlier in the same file):
import os
import inspect

from blip.models.blip import blip_decoder

# The checkpoint locations are hardcoded remote URLs:
BLIP_MODELS = {
    'base': 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth',
    'large': 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth'
}

class Interrogator():
    def __init__(self, config: Config):
        self.config = config
        self.device = config.device
        self.blip_offloaded = True
        self.clip_offloaded = True

        if config.blip_model is None:
            if not config.quiet:
                print("Loading BLIP model...")
            blip_path = os.path.dirname(inspect.getfile(blip_decoder))
            configs_path = os.path.join(os.path.dirname(blip_path), 'configs')
            med_config = os.path.join(configs_path, 'med_config.json')
            # `pretrained` is always one of the URLs above, so a fresh
            # environment re-downloads the checkpoint on every run.
            blip_model = blip_decoder(
                pretrained=BLIP_MODELS[config.blip_model_type],
                image_size=config.blip_image_eval_size,
                vit=config.blip_model_type,
                med_config=med_config
            )

In clip_interrogator.py the initialization path for the BLIP model is hardcoded to the remote URLs above, so the checkpoint has to be re-downloaded whenever the download cache is missing. One possible workaround is to point the corresponding BLIP_MODELS entry at a local copy of the model, as sketched below.
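A minimal sketch of that workaround, assuming a clip-interrogator version that still defines BLIP_MODELS (as in the snippet above) and a hypothetical local path where the checkpoint was downloaded once by hand; blip_decoder's pretrained argument accepts a local file path as well as a URL:

import clip_interrogator.clip_interrogator as ci
from clip_interrogator import Config, Interrogator

# Hypothetical location: fetch model_large_caption.pth once and keep it on disk.
ci.BLIP_MODELS['large'] = '/models/model_large_caption.pth'

# The Interrogator now reads the checkpoint from the local file instead of
# downloading it from the hardcoded URL.
interrogator = Interrogator(Config(blip_model_type='large'))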

zhenhua22 commented 8 months ago

In the latest code there is no longer a BLIP_MODELS dict with the 'base' and 'large' checkpoint URLs shown above.
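In those newer versions the caption model is loaded through Hugging Face transformers instead, so the checkpoint lands in the standard Hugging Face cache; keeping that cache on persistent storage avoids the repeated download. A minimal sketch, assuming clip-interrogator 0.6.x (where Config takes caption_model_name) and a hypothetical persistent mount at /models/hf-cache:

import os

# Hypothetical persistent directory; set it before transformers is imported
# so the checkpoint is cached there and reused on later runs.
os.environ['HF_HOME'] = '/models/hf-cache'

from clip_interrogator import Config, Interrogator

interrogator = Interrogator(Config(caption_model_name='blip-large'))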