njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0

How to run inference in 4-bit? #4

Open · DanielProkhorov opened this issue 5 months ago

DanielProkhorov commented 5 months ago

When I load the model in 4-bit:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("cckevinn/SeeClick", device_map="auto", trust_remote_code=True, load_in_4bit=True, do_sample=True, temperature=1e-3).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

I get the following error during inference:

RuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.HalfTensor) should be the same
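For context: PyTorch raises this error whenever a layer's input dtype differs from its weight dtype. A minimal, hypothetical repro of the mechanism (not the actual SeeClick code path), in which a uint8 (Byte) image tensor hits a half-precision convolution:

import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).half().cuda()  # fp16 weights, as in the quantized visual encoder
img = torch.zeros(1, 3, 32, 32, dtype=torch.uint8, device="cuda")  # raw image bytes, never cast to float
conv(img)  # raises the same "Input type ... and weight type ... should be the same" RuntimeError

So the traceback suggests that, under 4-bit loading, an image tensor reaches the visual encoder while still in uint8.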

njucckevin commented 5 months ago

Our fine-tuning was done in bf16, so it may be worth checking the Qwen-VL repository for its quantization guidance.
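As a starting point, here is a minimal, untested sketch of the 4-bit loading path that recent transformers versions recommend: a BitsAndBytesConfig passed via quantization_config instead of the bare load_in_4bit flag. Setting bnb_4bit_compute_dtype=torch.bfloat16 is an assumption based on the bf16 fine-tuning mentioned above, and it may or may not fix the ByteTensor mismatch; the Qwen-VL repository's quantization notes remain the authoritative reference.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, BitsAndBytesConfig

# Assumption: compute in bf16 to match SeeClick's fine-tuning dtype (untested).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "cckevinn/SeeClick",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

If loading succeeds, inference would then go through Qwen-VL's usual chat interface, e.g. model.chat(tokenizer, query, history=None).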