samjmolyneux / eppi-text-classification

Classifying papers by their abstracts and titles.


Installation

Create a virtual environment.

conda create -n eppi_text python=3.11
conda activate eppi_text

Install the package and the spaCy model.

pip3 install -e .
python3 -m spacy download en_core_web_sm
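A quick way to confirm the install worked is to check that the package and the spaCy model can be found for import. The snippet below is a minimal sketch and not the repo's own tests/check_install.py. The helper name missing_packages is hypothetical, and the sketch assumes the package is importable as eppi_text_classification (the directory name in this repo) and that the spaCy model is installed as the en_core_web_sm package.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Names assumed from the install steps above.
    required = ["eppi_text_classification", "spacy", "en_core_web_sm"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, rerunning the two install commands inside the activated eppi_text environment is the first thing to try.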

To run the tests, install the test dependencies as well.

pip3 install -e ".[test]"

Setup

The workbench records the hyperparameters and results of each hyperparameter search in a database (optuna.db). To use this feature, the database must be in a location the running process can read and write.

PC

On a local system, the OptunaHyperparameterOptimisation object automatically finds the optuna.db file in this repo and uses it.
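To check which optuna.db is visible from your working directory before starting a search, you can look for the file yourself. The sketch below walks up from a starting directory until it finds the file. It is only an illustration: how OptunaHyperparameterOptimisation actually locates the database is not described here, and find_upwards is a hypothetical helper, not part of this package.

```python
from pathlib import Path


def find_upwards(filename, start=None):
    """Search `start` and each of its parent directories for `filename`.

    Returns the first matching path, or None if no directory contains it.
    """
    start_dir = Path(start) if start is not None else Path.cwd()
    start_dir = start_dir.resolve()
    for directory in (start_dir, *start_dir.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_upwards("optuna.db")
    print(found if found is not None else "No optuna.db found above this directory.")
```

Run from anywhere inside the repo (for example from notebooks/lgbm), this should print the path of the optuna.db at the repo root if it is present.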

Azure ML Studio / Cloud service

On Azure ML Studio, the database must be on the same storage device as the compute instance. For example, when using notebooks, create a database at /mnt/tmp:

cd /mnt/tmp
touch optuna.db

Then point the db_url of the OptunaHyperparameterOptimisation object at that file in your script or notebook. In this case:

optimiser = OptunaHyperparameterOptimisation(
    # Four slashes: "sqlite:///" followed by the absolute path /mnt/tmp/optuna.db
    db_url="sqlite:////mnt/tmp/optuna.db",
)
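The number of slashes in the URL is easy to get wrong, so it can help to build the URL from a path instead of typing it by hand. The following is a minimal sketch under the assumption of a POSIX filesystem (as on an Azure compute instance). The helper name sqlite_url is hypothetical and not part of this package. It also creates the empty file, so it can stand in for the touch step.

```python
from pathlib import Path


def sqlite_url(db_path):
    """Create the database file if needed and return its SQLite URL."""
    path = Path(db_path).expanduser().absolute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)  # same effect as `touch optuna.db`
    # "sqlite:///" followed by an absolute POSIX path gives four slashes in total.
    return f"sqlite:///{path.as_posix()}"
```

The result can then be passed as db_url, for example db_url=sqlite_url("/mnt/tmp/optuna.db"), which yields the same URL as the one written out by hand.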

Structure

.
├── README.md
├── data
│   └── raw
│       └── debunking_review.tsv
├── eppi_text_classification
│   ├── __init__.py
│   ├── opt.py
│   ├── plotly_confusion.py
│   ├── plotly_roc.py
│   ├── plots.py
│   ├── predict.py
│   ├── save_features_labels.py
│   ├── shap_colors
│   │   ├── __init__.py
│   │   ├── _colorconv.py
│   │   └── _colors.py
│   ├── shap_plotter.py
│   ├── utils.py
│   └── validation.py
├── notebooks
│   ├── lgbm
│   │   └── lgbm_binary.ipynb
│   ├── random_forest
│   │   └── random_forest_binary.ipynb
│   ├── svm
│   │   └── svm_binary.ipynb
│   └── xgboost
│       └── xgboost_binary.ipynb
├── optuna.db
├── pyproject.toml
├── setup.py
├── tests
│   ├── check_install.py
│   └── test_00_smoke.py
└── tox.ini

Known Bugs