xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

404 when trying Qwen in V3 #723

Open flatsiedatsie opened 2 months ago

flatsiedatsie commented 2 months ago

Question

This is probably just because V3 is a work in progress, but I wanted to make sure.

When trying to run Qwen 1.5 - 0.5B it works with the V2 script, but when swapping to V3 I get a 404 not found.

dtype not specified for model. Using the default dtype: q8.
GET https://huggingface.co/Xenova/Qwen1.5-0.5B-Chat/resolve/main/onnx/model_quantized.onnx 404 (Not Found)

It seems V3 is looking for a file that was renamed 3 months ago: onnx/model_quantized.onnx was renamed to onnx/decoder_model_merged_quantized.onnx.

I've tried setting dtype to fp16 and fp32, which does change the URL it tries to fetch, but those URLs also do not exist :-D

e.g. https://huggingface.co/Xenova/Qwen1.5-0.5B-Chat/resolve/main/onnx/model_fp16.onnx when using dtype: 'fp16'.
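
For reference, this is roughly how I'm calling it with V3 (a minimal sketch; the import assumes the v3 package name, and only the dtype option matters here):

import { pipeline } from '@huggingface/transformers';
// dtype controls which onnx/model_*.onnx file V3 tries to fetch
const generator = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat', { dtype: 'fp16' });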

Is there something I can do to make V3 find the correct files?

(I'm still trying to find that elusive small model with a large context size to do document summarization with)

Th3G33k commented 1 month ago

#745

Hi there 👋 v3 will use the name model instead of decoder_model_merged, as the latter is the result of a legacy conversion process which created multiple versions of the model (with and without past key value inputs). So, this change isn't needed.

If you want to override the behaviour yourself, you can use the model_file_name option when loading the model.

JohnReginaldShutler commented 1 month ago

Hello! Just a beginner here; could someone demonstrate with example code how to override the behaviour using the model_file_name option when loading the model?

Th3G33k commented 1 month ago

@JohnReginaldShutler

model: the default filename prefix; it can be changed using the model_file_name option.
_quantized.onnx: the default filename suffix; it cannot be changed and depends on the precision (dtype) used.

Example:

// import from the Transformers.js v3 package
import { pipeline, AutoModel } from '@huggingface/transformers';

// using the pipeline function
let pipe = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat', { model_file_name: 'decoder_model_merged' });

// using the AutoModel class
let model = await AutoModel.from_pretrained('Xenova/Qwen1.5-0.5B-Chat', { model_file_name: 'decoder_model_merged' });

// both will fetch decoder_model_merged_quantized.onnx (default dtype q8)
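
If you also pass a dtype, the suffix should change accordingly (assuming the corresponding file exists in the repo):

let pipe = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat', { model_file_name: 'decoder_model_merged', dtype: 'fp16' });
// would fetch decoder_model_merged_fp16.onnx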