jataware / domain-model-examiner

The goal of this process is to perform machine reading over the model codebase in order to automatically extract key metadata.

MIT License

Model type inference #11

Open brandomr opened 3 years ago

brandomr commented 3 years ago

There are many types of models: physics based models, ODEs, machine learning models, deep learning, etc.

Can we analyze the model imports and how they are used to determine the type of model?

For example, a model that imports pytorch is likely doing some deep learning.

A model using odeint is probably doing something with ODEs.
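A minimal sketch of that idea, assuming the standard-library `ast` module is used to read imports from Python source. The `MODEL_TYPE_HINTS` table and both function names are hypothetical and only illustrate the lookup; note that PyTorch is imported as `torch`, and `odeint` lives in `scipy.integrate`.

```python
import ast

# Hypothetical hint table (an assumption for illustration, not a vetted mapping).
MODEL_TYPE_HINTS = {
    "torch": "deep learning",
    "scipy.integrate.odeint": "ODEs",
    "scipy.integrate.solve_ivp": "ODEs",
}


def imported_names(source):
    """Return dotted names found in import statements of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import torch.nn as nn` -> "torch.nn"
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from scipy.integrate import odeint` -> "scipy.integrate.odeint"
            names.update(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def guess_model_types(source):
    """Return every hinted model type whose key matches an imported name."""
    names = imported_names(source)
    return {
        label
        for key, label in MODEL_TYPE_HINTS.items()
        for name in names
        if name == key or name.startswith(key + ".")
    }


example = "import torch\nfrom scipy.integrate import odeint\n"
print(guess_model_types(example))
```

This only looks at import statements. A file that does `import scipy.integrate` and later calls `scipy.integrate.odeint(...)` would need an additional pass over attribute accesses to be detected.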

GoogleSheets commented 3 years ago

Do you see this inferring a single identified model type, or would it be more useful to categorize imports into several bins e.g. ODE, Fourier Transforms, Linear Algebra, to summarize more specifically what the model does?

Also for example, instead of just deep learning for the pytorch libraries, specify audio, text, image/video libraries.

In short: a single classification vs. multiple classifiers.

brandomr commented 3 years ago

I think a model could be assigned multiple categories. Subcategories like "multi-class classifier" would make sense as well, but in the World Modelers context I think we are more likely to see physics/theory based models than ML ones. I'm less sure how to detect and categorize those.

Here's an example of a hydrology model for context: https://github.com/peckhams/topoflow36

Here are the largely ML based kimetrica models: https://gitlab.com/kimetrica/darpa/darpa/-/tree/master/models

Here is another hydrology model: https://github.com/PSUmodeling/MM-PIHM

Would be interesting to see what Flee is...

GoogleSheets commented 3 years ago

Just pushed an update that uses data_files/model-type-libraries.json, which holds the library/discipline classifications.

Python only at this point.

Tested on:

FabFlee -- Simulation / Uncertainty Quantification
Topoflow36 -- Geospatial Data / Terrain Models; Signal Processing; Simulation / Uncertainty Quantification
Pythia -- Geospatial Data / Terrain Models
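For readers who want a picture of how a library/discipline file can yield several bins per model, here is a rough sketch. The JSON layout (discipline name mapped to a list of libraries), the library assignments, and the `classify` function are all assumptions made for illustration; the actual schema of model-type-libraries.json is not shown in this thread.

```python
import json
from collections import defaultdict

# Assumed layout: {discipline: [library, ...]}. Entries are illustrative only.
EXAMPLE_MAPPING_JSON = """
{
  "Geospatial Data / Terrain Models": ["rasterio", "shapely", "geopandas"],
  "Signal Processing": ["scipy.signal", "scipy.fft"],
  "Simulation / Uncertainty Quantification": ["easyvvuq", "simpy"]
}
"""


def classify(imported_modules, mapping):
    """Return {discipline: sorted imports that matched it}."""
    # Invert the mapping so each library points at its discipline.
    lookup = {lib: disc for disc, libs in mapping.items() for lib in libs}
    matches = defaultdict(set)
    for module in imported_modules:
        parts = module.split(".")
        # Try the full dotted name first, then progressively shorter prefixes.
        for end in range(len(parts), 0, -1):
            discipline = lookup.get(".".join(parts[:end]))
            if discipline:
                matches[discipline].add(module)
                break
    return {disc: sorted(mods) for disc, mods in matches.items()}


mapping = json.loads(EXAMPLE_MAPPING_JSON)
print(classify({"rasterio", "scipy.signal", "numpy"}, mapping))
```

Keeping the matched imports alongside each discipline makes it easy to see why a model landed in a given bin, which helps when the mapping is being expanded by hand.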

brandomr commented 3 years ago

Nice! Any luck finding anything out there that would have a bigger/better mapping? Or do you think we're going to need to expand on this ourselves?

GoogleSheets commented 3 years ago

I think we will need to expand on this ourselves. Searching for a discipline name (e.g. Network Analysis) plus "python library" usually yields a list of libraries, but I haven't found an existing classification scheme anywhere.

Implemented support for some R packages/libraries in the latest push. Still need to improve scraping for R libraries/packages, and also for Python, where local imports are still being identified as libraries.
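One possible way to screen out local imports on the Python side, sketched under the assumption that a module counts as local when the repository root contains a matching `.py` file or directory. The function names are hypothetical, and `sys.stdlib_module_names` only exists on Python 3.10+, so the sketch falls back to an empty set on older versions.

```python
import sys
import tempfile
from pathlib import Path


def is_local_module(module, repo_root):
    """True if the top-level name matches a file or directory in repo_root."""
    top = module.split(".")[0]
    root = Path(repo_root)
    return (root / f"{top}.py").is_file() or (root / top).is_dir()


def third_party_modules(modules, repo_root):
    """Drop local and (where detectable) standard-library modules."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(
        m
        for m in modules
        if not is_local_module(m, repo_root) and m.split(".")[0] not in stdlib
    )


# Small demonstration on a throwaway directory standing in for a model repo.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "utils.py").write_text("")
    (Path(repo) / "models").mkdir()
    print(third_party_modules(["utils", "models.flow", "numpy"], repo))
```

This check only looks at the repository root, so packages nested under a `src/` directory or modules imported relative to a subfolder would still slip through and need a recursive search.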