huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Documentation Request: Table or heuristic for ORTModel Method to Encoder/Decoder to .onnx File to Task #1384

Open gidzr opened 11 months ago

gidzr commented 11 months ago

Feature request

Hi there

Could you provide either a table (where explicit rules apply - see attached image) or a heuristic, so I can tell which ML models and optimised file types, with which tasks, map to which inference methods and inference tasks?

The example table below should help clarify; it isn't necessarily prescriptive, because I may have mixed up some concepts.

In case you mention it: yes, I'm aware that it's possible to run a pipeline with the wrong model, and the error message will spit out all the accepted architectures/models (roberta, gpt, etc.) for a method type. However, (a) this is very time-consuming and hit and miss, and (b) these lists don't explain the relationships to the underlying architectures and files (i.e. merged model, encoder-decoder, encoder-only, decoder-only) that result from the pytorch/safetensors files.

For example, will all models exported/optimised for text-generation always be encoder-decoder and always use the ORTSeq2SeqModel method (for illustrative purposes), or will this depend on a combination of the original model architecture and the task applied during optimisation, which may result in one or more usable methods for inference?

It's a massive learning curve for me, but it seems like it would be relatively straightforward for someone who works with this stuff. It probably just needs to go from people's heads into a document.

Thanks muchly! It'll be a massive time saver and help with conceptual understanding.

Motivation

I'm trying to understand how to mix and match models, optimisations, tasks, and inference methods. I've been trawling HF, ONNX, and general resources but cannot find anything like this, and it would save a bunch of trial-and-error testing time. (I've wasted, directly and indirectly, almost a week of trialling, and there are probably very simple rules for this.)

Part of the time wasted has been selecting models and running the CLI command to optimise/quantize for a task, only to discover I have no idea which ORTModel method to use, as these don't relate to the task but to the model architecture instead (or a combination of both), then brute-forcing an understanding through testing and trying to come up with my own heuristics.

Maybe this type of knowledge is assumed? But for newbs like me it's extremely daunting, and it feels like I may be trying to re-invent the wheel.

Your contribution

(Table for illustrative purposes; the dummy data is wrong.)

[Attached image: method-task-model-llm-matrix]

fxmarty commented 11 months ago

Thank you for the detailed suggestion! I think it can indeed be very helpful to have a summary table.

For now, the most comprehensive documentation is here: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#models. You can find the supported model architectures for each of the ORTModel classes there.
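
To make the class-to-architecture relationship concrete, here is a minimal sketch (assuming a recent Optimum install with the onnxruntime extras; the checkpoint name is just an example of an architecture listed as supported for this class, not a recommendation):

```python
# Minimal sketch: load a checkpoint through an ORTModel class.
# export=True converts the PyTorch weights to ONNX on the fly (assumed recent Optimum API).
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example only, not prescriptive
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

inputs = tokenizer("Optimum makes ONNX Runtime easier to use", return_tensors="pt")
logits = model(**inputs).logits
print(logits)
```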

Now, to explain the terminology around tasks: we try to follow the one from the Hugging Face Hub. For example, https://huggingface.co/meta-llama/Llama-2-7b-chat-hf has its pipeline_tag set to text-generation. This model can then be loaded in ORTModelForCausalLM.
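
As a sketch of that task-to-class relationship (using gpt2 as a small, ungated stand-in for a text-generation checkpoint; the Llama checkpoint above would work the same way, access and memory permitting):

```python
# Minimal sketch: a Hub model whose pipeline_tag is text-generation loads into ORTModelForCausalLM.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "gpt2"  # small stand-in; its pipeline_tag on the Hub is text-generation
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)  # export to ONNX on the fly

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("The quick brown fox", max_new_tokens=20)[0]["generated_text"])
```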

What is probably missing from the doc is the task-to-ORTModel mapping (which follows the one from the Transformers library).
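
Until that lands in the docs, here is a rough, partial sketch of that mapping as I understand it (illustrative only, not an official API; the class names are from optimum.onnxruntime, and the encoder/decoder notes describe the typical case rather than a hard rule):

```python
# Rough, partial task -> ORTModel mapping (illustrative; see the linked docs for the full,
# authoritative list of classes and supported architectures).
TASK_TO_ORTMODEL = {
    "feature-extraction": "ORTModelForFeatureExtraction",        # typically encoder-only
    "text-classification": "ORTModelForSequenceClassification",  # typically encoder-only
    "token-classification": "ORTModelForTokenClassification",    # typically encoder-only
    "question-answering": "ORTModelForQuestionAnswering",        # typically encoder-only
    "text-generation": "ORTModelForCausalLM",                    # decoder-only
    "text2text-generation": "ORTModelForSeq2SeqLM",              # encoder-decoder
    "automatic-speech-recognition": "ORTModelForSpeechSeq2Seq",  # encoder-decoder (e.g. Whisper)
}
```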

gidzr commented 11 months ago

Cool - thanks! Yeah, I realise how difficult it is to keep documentation up to date in such a fast-moving environment with such rapid changes.

For me, documentation on the logic behind the approach goes a long way, because that way I can infer some of the instructions on my own.

No rush obviously, but it would be a nice-to-have if you and your team ever get around to it.

Cheers!

fxmarty commented 11 months ago

Thank you, I will leave this open as a reminder to improve the doc :)