ntcockroft / STarFish

Tool for predicting the protein targets of bioactive small molecules
GNU General Public License v3.0

Rebuilding models #1

Open rajarshi opened 4 years ago

rajarshi commented 4 years ago

Hi, if I wanted to rebuild the models, what would be the sequence of scripts to run?

ntcockroft commented 4 years ago

The first thing to run is prep_data.sh, which cleans and generates the data sets. If you want to train on all the data (without cross-validation), the combined training data will be in the ./cross_validation/NP/ directory that prep_data.sh generates.

cd to ./cross_validation/NP/. $SCRPT_DIR should point to the py/ directory.
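Before launching the longer training steps, it may help to confirm that the setup described above is in place. The following is only a minimal sketch, not part of STarFish: `check_setup` and `EXPECTED_INPUTS` are hypothetical names, and the assumption that all four CSV files sit in the working directory is inferred from the commands in this thread rather than confirmed.

```python
import os
from pathlib import Path

# File names taken from the commands in this thread. That all four exist in
# ./cross_validation/NP/ after prep_data.sh is an assumption.
EXPECTED_INPUTS = ("X_train.csv", "y_train.csv", "X_test.csv", "y_test.csv")


def check_setup(work_dir, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []

    # SCRPT_DIR should point to the py/ directory of the repository.
    script_dir = env.get("SCRPT_DIR")
    if not script_dir:
        problems.append("SCRPT_DIR is not set")
    elif not Path(script_dir).is_dir():
        problems.append(f"SCRPT_DIR is not a directory: {script_dir}")

    # Each input file named in the commands should be present in work_dir.
    for name in EXPECTED_INPUTS:
        if not (Path(work_dir) / name).is_file():
            problems.append(f"missing input file: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_setup("."):
        print(problem)
```

Run from inside ./cross_validation/NP/, the script prints one line per problem and nothing if everything it checks for is present.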

Generate base classifiers with: python $SCRPT_DIR/train_model.py -X X_train.csv -y y_train.csv -c RF KNN MLP > train_model.log 2> train_model.error

Predict with base classifiers: python $SCRPT_DIR/test_model.py -X X_train.csv X_test.csv -y y_train.csv y_test.csv -c RF KNN MLP LR > test_model.log 2> test_model.error

Train/test the meta-classifier using base classifier predictions (example for KNN_RF stacked model): python $SCRPT_DIR/train_test_metaclassifier.py -P train_KNN_pred.csv test_KNN_pred.csv train_RF_pred.csv test_RF_pred.csv -y y_train.csv > train_test_metaclassifier.log 2> train_test_metaclassifier.error
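The three commands above could be chained from a small driver, sketched here under some assumptions. The script names and arguments are copied verbatim from this thread, including the differing `-c` lists. The helper names (`prediction_files`, `build_steps`, `run_steps`) are hypothetical. It is also an assumption that prediction files for stacks other than KNN_RF follow the same `train_<NAME>_pred.csv` / `test_<NAME>_pred.csv` pattern.

```python
import subprocess
from pathlib import Path


def prediction_files(base_classifiers):
    """Train/test prediction CSVs, named as in the KNN_RF example."""
    files = []
    for name in base_classifiers:
        files += [f"train_{name}_pred.csv", f"test_{name}_pred.csv"]
    return files


def build_steps(script_dir, stacked=("KNN", "RF"), python="python"):
    """Return (argv, log_stem) pairs for the three steps, in order."""
    def script(name):
        return str(Path(script_dir) / name)

    return [
        # Step 1: train the base classifiers.
        ([python, script("train_model.py"),
          "-X", "X_train.csv", "-y", "y_train.csv",
          "-c", "RF", "KNN", "MLP"], "train_model"),
        # Step 2: predict with the base classifiers (-c list as posted).
        ([python, script("test_model.py"),
          "-X", "X_train.csv", "X_test.csv",
          "-y", "y_train.csv", "y_test.csv",
          "-c", "RF", "KNN", "MLP", "LR"], "test_model"),
        # Step 3: train/test the meta-classifier on base predictions.
        ([python, script("train_test_metaclassifier.py"),
          "-P", *prediction_files(stacked),
          "-y", "y_train.csv"], "train_test_metaclassifier"),
    ]


def run_steps(steps, work_dir="."):
    """Run each step in work_dir, writing <stem>.log and <stem>.error."""
    for argv, stem in steps:
        log_path = Path(work_dir) / f"{stem}.log"
        err_path = Path(work_dir) / f"{stem}.error"
        with open(log_path, "w") as out, open(err_path, "w") as err:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, cwd=work_dir, stdout=out, stderr=err, check=True)
```

Something like `run_steps(build_steps(os.environ["SCRPT_DIR"]), work_dir="./cross_validation/NP")` would then run the whole sequence (with `import os` added), stopping at the first step that exits with an error.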

Base classifier models will be saved as KNN.joblib or MLP.joblib. In the case of RF, there will be a directory of individual models (one for each target) /RFmodels/RF*.joblib. The meta-classifier trained model will be saved as LRmeta.joblib.
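To see which of these outputs a run actually produced, a short helper like the one below could list them. This is a sketch with a hypothetical function name (`find_saved_models`). Because the exact RF file naming is not fully clear from this thread, it simply collects every .joblib file inside an RFmodels directory, which is an assumption.

```python
from pathlib import Path


def find_saved_models(work_dir="."):
    """Map model name -> saved path (RF maps to a sorted list of paths)."""
    root = Path(work_dir)
    found = {}

    # Single-file models named in the thread.
    for name in ("KNN", "MLP", "LRmeta"):
        path = root / f"{name}.joblib"
        if path.is_file():
            found[name] = path

    # RF is stored as one model per target inside a directory. The exact
    # file naming is assumed unknown here, so match any .joblib file.
    rf_dir = root / "RFmodels"
    if rf_dir.is_dir():
        found["RF"] = sorted(rf_dir.glob("*.joblib"))
    return found


if __name__ == "__main__":
    for name, value in find_saved_models(".").items():
        print(name, value)
```

Each returned path could then be passed to `joblib.load`, assuming the joblib package is installed in the same environment that wrote the files.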