lamalab-org / MatText

Text-based modeling of materials.
https://lamalab-org.github.io/MatText/
MIT License
23 stars, 2 forks

feat: support classification #95

Open n0w0f opened 1 month ago

n0w0f commented 1 month ago

There is a lot of duplication, kept deliberately to avoid breaking anything at this point. We might have to refactor later.

We could also choose not to merge it, but it would be good to review the code.

Summary by Sourcery

Add support for classification tasks by introducing new classes and methods, refactor benchmarking and task execution logic for better modularity, and enhance model fine-tuning and inference processes with abstract base classes. Include new configuration files and scripts for data preparation and model setup.

sourcery-ai[bot] commented 1 month ago

Reviewer's Guide by Sourcery

This pull request introduces support for classification tasks in the existing machine learning pipeline. It includes significant changes to the model architecture, benchmarking process, and data handling. The changes are implemented across multiple files, with major updates to the core functionality in src/mattext/models/benchmark.py, src/mattext/main.py, src/mattext/models/finetune.py, src/mattext/models/predict.py, and src/mattext/models/score.py. New configuration files and data preparation scripts have also been added to support the classification tasks.

File-Level Changes

| Files | Changes |
| --- | --- |
| src/mattext/models/benchmark.py, src/mattext/models/predict.py | Introduced abstract base classes (BaseBenchmark, BaseInference) to support both regression and classification tasks |
| src/mattext/models/benchmark.py, src/mattext/models/predict.py, src/mattext/models/finetune.py | Added new classes for classification tasks (MatbenchmarkClassification, InferenceClassification, FinetuneClassificationModel) |
| src/mattext/main.py | Updated the main execution flow to include classification tasks |
| src/mattext/models/score.py, src/mattext/models/predict.py | Implemented new metrics and evaluation methods for classification tasks |
| conf/model/classification_example.yaml, conf/model/formation_energy.yaml, conf/model/llama_8b.yaml | Added new configuration files for various model representations and datasets |
| revision-scripts/prep_rep.py, revision-scripts/text_rep.py, revision-scripts/prep_json.py, revision-scripts/mp_classification.py, revision-scripts/5fold_split.py | Created new scripts for data preparation and processing |
| conf/benchmark.yaml, conf/llm_sft.yaml, conf/model/benchmark_example.yaml | Updated existing configuration files to support new tasks and model representations |
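
For readers unfamiliar with the pattern named in the first two rows, here is a minimal sketch of how an abstract base class can hold the shared fold loop while regression and classification subclasses supply only the task-specific step. Only the class names `BaseBenchmark` and `MatbenchmarkClassification` come from this PR; `RegressionBenchmark`, the `folds` argument, and the `run` / `run_fold` methods are hypothetical and are not taken from the MatText code.

```python
from abc import ABC, abstractmethod


class BaseBenchmark(ABC):
    """Shared driver. Everything except the class name is an assumption."""

    def __init__(self, folds):
        # Hypothetical: a list of fold identifiers to iterate over.
        self.folds = folds

    def run(self):
        # The loop over folds is identical for both task types,
        # so it lives in the base class and is written once.
        return [self.run_fold(fold) for fold in self.folds]

    @abstractmethod
    def run_fold(self, fold):
        """Train and evaluate on one fold; each task type overrides this."""


class RegressionBenchmark(BaseBenchmark):
    # Hypothetical name for the regression counterpart.
    def run_fold(self, fold):
        return {"fold": fold, "task": "regression"}


class MatbenchmarkClassification(BaseBenchmark):
    def run_fold(self, fold):
        return {"fold": fold, "task": "classification"}
```

With this layout the duplicated code shrinks to the bodies of `run_fold`, and `BaseBenchmark` itself cannot be instantiated because its abstract method has no implementation.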

n0w0f commented 4 weeks ago

@sourcery-ai review Can you check whether the roc_auc and other metric computations in score.py are done correctly?

kjappelbaum commented 4 weeks ago

As you and Sourcery said, there is a lot of duplication, and I'm unsure if we should merge it in the current form.

kjappelbaum commented 4 weeks ago

> @sourcery-ai review Can you check whether the roc_auc and other metric computations in score.py are done correctly?

Is there anything in particular you have concerns about, @n0w0f?

n0w0f commented 4 weeks ago

> > @sourcery-ai review Can you check whether the roc_auc and other metric computations in score.py are done correctly?
>
> Is there anything in particular you have concerns about, @n0w0f?

I just wanted to confirm that the metrics are computed correctly.
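
One way to run such a check is to compare the value reported by score.py against an independent reference on a small hand-made example. The sketch below is a dependency-free reference for binary ROC AUC, computed as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. It is not the code in score.py; the function name `roc_auc_reference` and its arguments are assumptions for illustration.

```python
def roc_auc_reference(y_true, y_score):
    """Binary ROC AUC via pairwise comparison (hypothetical helper).

    y_true:  iterable of 0/1 labels
    y_score: iterable of scores or probabilities for the positive class
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC is undefined when only one class is present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Three of the four (positive, negative) pairs are ordered correctly.
print(roc_auc_reference([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A frequent source of discrepancies is passing hard predicted labels instead of positive-class probabilities as the scores, which collapses many pairs into ties and changes the result. A fold containing only one class is the other case worth testing, since the metric is undefined there.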

n0w0f commented 4 weeks ago

@kjappelbaum I agree, we need not merge this. I did not want to break the codebase, hence the duplication. I can clean it up after the sprint.

kjappelbaum commented 4 weeks ago

I'd also consider moving to something like unsloth for fine-tuning, as it is much faster.

kjappelbaum commented 4 weeks ago

Another point we had mentioned on Zulip was to add

n0w0f commented 3 weeks ago

@sourcery-ai review

n0w0f commented 3 weeks ago

Refactored to incorporate the comments. Can you @sourcery-ai review?