mozilla-ai / lumigator

Source code for Mozilla.ai's Lumigator platform
https://mozilla-ai.github.io/lumigator/
Apache License 2.0

Expose a list of supported models #381

Closed dpoulopoulos closed 14 hours ago

dpoulopoulos commented 1 week ago

Is your feature request related to a problem? Please describe.

At present, there is a lack of clarity regarding which models Lumigator supports and the rationale behind these choices. Additionally, users may benefit from receiving default configurations for each supported model, such as default values for parameters that Lumigator exposes.

Describe the solution you'd like

It would be highly beneficial if Lumigator provided an API endpoint (via a GET request) that returns a comprehensive list of supported models along with their associated default parameters. The response of such a request could include:

Additionally, a dedicated documentation page should be created to catalog the supported models and include relevant details.
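
As a rough illustration of the endpoint idea, the sketch below builds the kind of JSON body a GET request could return. Everything in it is an assumption made for illustration: the `SupportedModel` fields, the `list_supported_models` helper, the `total`/`items` envelope, the placeholder model id, and the parameter names are hypothetical and not taken from Lumigator's codebase.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SupportedModel:
    # Field names are illustrative only, not Lumigator's actual schema.
    name: str
    task: str
    default_parameters: dict = field(default_factory=dict)


def list_supported_models(models):
    """Build the body a hypothetical 'list supported models' GET could return."""
    return {"total": len(models), "items": [asdict(m) for m in models]}


models = [
    SupportedModel(
        name="example-org/summarizer-small",  # placeholder model id
        task="summarization",
        default_parameters={"max_length": 142, "temperature": 1.0},
    ),
]

response_body = list_supported_models(models)
print(json.dumps(response_body, indent=2))
```

Keeping the defaults in a free-form dictionary per model means parameters can differ between models without changing the response schema.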

Moreover, clear criteria and documentation will ensure a smooth process for expanding Lumigator’s model support and make it easier for developers to contribute to the codebase. How do we decide which model to support? How do we decide what the default parameter values should be?

Also, where does this list live? Is it static in code? A configuration file (e.g., a YAML file) that we read during runtime? In a database?

Describe alternatives you've considered

At this time, I cannot identify a feasible alternative to this approach beyond the provision of detailed documentation and an accessible API endpoint.

aittalam commented 1 week ago

Thanks for opening this issue! To provide some context, Lumigator started as a single-job (evaluation), single-task (summarization) tool, and we wanted to be opinionated about the models we suggest for this specific task. You can find the rationale behind the choice of models here, and the mapping between the chosen models and their configuration profiles here.

This mapping is, de facto, our current list. That does not mean we cannot support other models, only that we have spent time testing and configuring just some of them. I think it makes sense to say that we "suggest" (are opinionated about) some models for a given task but support more, and we could reflect this in the UI / documentation by welcoming experimentation with models other than the ones we suggest. For this reason, I like the idea of storing model information in a config file: people can easily see what is available without having to hit the API, and they can edit the file if they want to add new models.
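
A minimal sketch of the config-file idea, separating "suggested" models from merely supported ones. The file layout, the `suggested` flag, the `load_models` helper, and the model ids are all hypothetical. JSON from the standard library is used so the example runs without extra dependencies; a YAML file read with a YAML parser would have the same shape.

```python
import json

# Hypothetical config content. In practice this would live in a file in the
# repository that contributors can read and edit directly.
MODELS_CONFIG = """
{
  "models": [
    {"name": "example-org/summarizer-small", "task": "summarization",
     "suggested": true, "default_parameters": {"max_length": 142}},
    {"name": "example-org/experimental-model", "task": "summarization",
     "suggested": false, "default_parameters": {}}
  ]
}
"""


def load_models(raw, task=None, suggested_only=False):
    """Parse the config and optionally filter by task and by the suggested flag."""
    models = json.loads(raw)["models"]
    if task is not None:
        models = [m for m in models if m["task"] == task]
    if suggested_only:
        models = [m for m in models if m.get("suggested", False)]
    return models


all_models = load_models(MODELS_CONFIG, task="summarization")
suggested = load_models(MODELS_CONFIG, task="summarization", suggested_only=True)
print([m["name"] for m in suggested])
```

Adding a model then amounts to appending one entry to the file, and the same loader could back both the documentation page and an API response.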

Re: default parameters, my 2 cents is that, as a starting point, we could see what is generally available by default, e.g. on HF or via APIs. To really get a fair comparison, though, we should take a reference dataset for a given task, find for each model we want to suggest the parameters with which it performs best, and then provide those as defaults.
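
The search for per-model defaults described above could be sketched as an exhaustive grid search. This is only one possible approach, not an agreed method. The `best_parameters` helper, the parameter names, and the toy scoring function are hypothetical; a real `evaluate` would run the model on the reference dataset and return a task metric.

```python
from itertools import product


def best_parameters(evaluate, dataset, grid):
    """Try every combination in the grid and keep the highest-scoring one."""
    keys = sorted(grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params, dataset)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


def toy_evaluate(params, dataset):
    # Stand-in for a real evaluation job: it ignores the dataset and simply
    # peaks at max_length=128, num_beams=4.
    return -abs(params["max_length"] - 128) - abs(params["num_beams"] - 4)


grid = {"max_length": [64, 128, 256], "num_beams": [1, 4]}
defaults, score = best_parameters(toy_evaluate, dataset=[], grid=grid)
print(defaults, score)
```

Running this once per suggested model would yield one set of defaults per model, which could then be stored alongside the model entry.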