Open amatissart opened 1 year ago
@glerzing Any opinion on this? I know you spent some time experimenting on the algorithms. Do you have specific use-cases that we should take into account when designing this? And by any chance, is that something you would be interested in implementing in the coming weeks/months?
I'm not sure what to propose, and I need to save time for other occupations, so I will not implement this.
oops, accidentally clicked on "close with comment"
Ok no problem. We will talk with the team about how to prioritize this.
Have we considered making a Mehestan or Tournesol Python library that can be downloaded with pip? This looks like quite a lot of work, so I'm really not sure it is a good idea, but it would make things easy to use in a Notebook.
Current situation
The Mehestan implementation can be found in the folder backend/ml, which is a Django app and is part of the backend deployment.
Although "ml/inputs.py" provides an abstract class MlInput to define how to access the input data required by the algorithms, running Mehestan is still coupled to the Django apps, as the implementation relies on the Django models and the PostgreSQL database (e.g. to save scores and scaling parameters).
This makes iterating on the algorithm implementation quite cumbersome. For example, in #1332, I used a local VM and a custom script in order to schedule several runs with different parameters and export plots.
We'd like to provide a more flexible interface that lets developers and researchers replicate this kind of experiment. It would no longer depend on the Django context and would not require a local database to be running. Ideally everything could run from Jupyter Notebooks, with a small set of Python dependencies.
Of course, the existing backend should rely on the same interface, to avoid code duplication as much as possible.
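To make the decoupling idea concrete, here is a minimal sketch using only the standard library. Every name in it (Comparison, ComparisonInput, CsvComparisonInput, run_toy_scoring) is hypothetical and is not the actual MlInput API, the CSV column names are assumptions, and the scoring function is a deliberately trivial placeholder rather than Mehestan. It only illustrates how an algorithm could depend on an abstract input instead of Django models.

```python
import csv
import io
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Comparison:
    # Hypothetical record: one user's pairwise judgment between two entities.
    user_id: int
    entity_a: str
    entity_b: str
    score: float  # assumed convention: positive means entity_b is preferred


class ComparisonInput(ABC):
    """Hypothetical input abstraction (a stand-in, not the real MlInput)."""

    @abstractmethod
    def get_comparisons(self) -> Iterable[Comparison]:
        """Yield every comparison the algorithm should consume."""


class CsvComparisonInput(ComparisonInput):
    """Reads comparisons from a CSV text stream: no Django, no database."""

    def __init__(self, stream):
        self.stream = stream

    def get_comparisons(self) -> Iterable[Comparison]:
        for row in csv.DictReader(self.stream):
            yield Comparison(
                user_id=int(row["user_id"]),
                entity_a=row["entity_a"],
                entity_b=row["entity_b"],
                score=float(row["score"]),
            )


def run_toy_scoring(ml_input: ComparisonInput) -> Dict[str, float]:
    """Placeholder algorithm (NOT Mehestan): mean signed score per entity."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for comparison in ml_input.get_comparisons():
        totals[comparison.entity_a] -= comparison.score
        totals[comparison.entity_b] += comparison.score
        counts[comparison.entity_a] += 1
        counts[comparison.entity_b] += 1
    return {entity: totals[entity] / counts[entity] for entity in totals}


# Made-up sample data, small enough to paste into a notebook cell.
SAMPLE_CSV = (
    "user_id,entity_a,entity_b,score\n"
    "1,video_x,video_y,4\n"
    "2,video_x,video_y,2\n"
    "1,video_y,video_z,-6\n"
)
scores = run_toy_scoring(CsvComparisonInput(io.StringIO(SAMPLE_CSV)))
```

Under the same assumptions, the backend could supply its own subclass of the hypothetical ComparisonInput that reads from the Django models, so that both the notebook and the server call the same function.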
Possible architecture
Usage
or in the Django context