

GPCR_LigandClassify
===================================

The following Python libraries are required to run the models:

To run the script, the following files must be present in the running directory, in addition to the input file (see below):

NB: The rfc_rdkit_classify_fp.sav, svm_rdkit_classify_fp.sav and mlp_rdkit_classify_fp.sav models are required only if the --ignore_rf_svm option is set to False (the default is True). Because of GitHub size limits, these models are not included in the repository; to obtain them, send a request directly to mmahmed@ualberta.ca and kbarakat@ualberta.ca.
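A pre-flight check can catch a missing model file before a long run. The sketch below is hypothetical: the helper `missing_model_files` is not part of GPCR_LigandClassify.py. It only uses the three file names from the note above and assumes they are looked up in the running directory.

```python
import os

# File names as listed in this README; the helper below is hypothetical.
LARGE_MODELS = [
    "rfc_rdkit_classify_fp.sav",
    "svm_rdkit_classify_fp.sav",
    "mlp_rdkit_classify_fp.sav",
]


def missing_model_files(ignore_rf_svm, directory="."):
    """Return the large model files that are needed but absent from `directory`."""
    if ignore_rf_svm:
        # With the default (True) the RF, SVM and MLP models are skipped,
        # so none of the large files are needed.
        return []
    return [name for name in LARGE_MODELS
            if not os.path.isfile(os.path.join(directory, name))]
```

For example, `missing_model_files(ignore_rf_svm=False)` returns whichever of the three files are absent from the current directory, so an empty list means the full model set is in place.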

################################

To use the models to make predictions for new compounds, run the script as follows:

python GPCR_LigandClassify.py --input_file input.csv --output_file output.csv [--n_rows_to_read <INTEGER>] [--mwt_lower_bound <FLOAT>] [--mwt_upper_bound <FLOAT>] [--logp_lower_bound <FLOAT>] [--logp_upper_bound <FLOAT>] [--ignore_rf_svm <True/False>]
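The options in this command can be mirrored with Python's `argparse`; this is a minimal sketch, not the script's actual code. The `str_to_bool` and `in_bounds` helpers are hypothetical, and the `None` defaults for the molecular weight and logP bounds are placeholders, because the real defaults come from the training dataset and are not listed in this README. Whether the real bounds are inclusive is also an assumption. The `str_to_bool` converter matters because `type=bool` in argparse would turn the string `False` into `True`.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' strings; argparse's type=bool treats any non-empty string as True."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    parser.add_argument("--input_file", required=True)
    parser.add_argument("--output_file", required=True)
    parser.add_argument("--n_rows_to_read", type=int, default=9999999999)
    # Placeholder defaults (hypothetical): the real script derives its
    # bounds from the dataset used for model training.
    parser.add_argument("--mwt_lower_bound", type=float, default=None)
    parser.add_argument("--mwt_upper_bound", type=float, default=None)
    parser.add_argument("--logp_lower_bound", type=float, default=None)
    parser.add_argument("--logp_upper_bound", type=float, default=None)
    parser.add_argument("--ignore_rf_svm", type=str_to_bool, default=True)
    return parser


def in_bounds(value, lower, upper):
    """True if value lies within [lower, upper]; a None bound is treated as open."""
    return (lower is None or value >= lower) and (upper is None or value <= upper)
```

With this parser, `build_parser().parse_args(["--input_file", "input.csv", "--output_file", "output.csv"])` yields `ignore_rf_svm=True` and `n_rows_to_read=9999999999`, matching the defaults described in this README.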

Important:

*The --input_file and --output_file arguments are mandatory. The --n_rows_to_read argument sets how many rows to read from the input CSV file (default: 9999999999 rows). The remaining arguments are optional, and their defaults follow the dataset used to train the models.

*The --ignore_rf_svm argument skips the RF, SVM and MLP models, which are quite large; this is useful when computational resources, particularly memory, are limited. The default is True (the RF, SVM and MLP models are ignored). These models can be requested from the authors: Khaled Barakat (kbarakat@ualberta.ca) and Marawan Ahmed (mmahmed@ualberta.ca).

*Please note that today's date is appended to the output file name.

*Please note that the script saves only the ligands for which all model predictions agree.
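The last two notes (the dated output name and the agreement filter) can be sketched in Python as follows. This is an illustration under assumptions, not the script's code: the ISO date format (YYYY-MM-DD), the underscore separator, and one prediction column per model (passed as the hypothetical `model_columns` argument) are all assumed.

```python
from datetime import date
from pathlib import Path


def dated_output_name(output_file, today=None):
    """Insert a date string before the extension, e.g. output.csv -> output_2024-01-31.csv.

    The date format and separator are assumptions, not the script's documented behaviour.
    """
    if today is None:
        today = date.today()
    path = Path(output_file)
    return str(path.with_name(f"{path.stem}_{today.isoformat()}{path.suffix}"))


def consensus_rows(rows, model_columns):
    """Keep only rows where every model column holds the same predicted label."""
    kept = []
    for row in rows:
        labels = {row[column] for column in model_columns}
        if len(labels) == 1:  # all models agree
            kept.append(row)
    return kept
```

Keeping only unanimous rows trades coverage for confidence: a ligand is dropped as soon as any single model disagrees.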

For the input file, please keep the same format as the attached sample input file (a DrugBank data file). If your data come from a different source, every column except the SMILES column may be left blank (not recommended) or populated with placeholder data.
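A minimal sketch of reading such a file with the standard `csv` module is shown below. It assumes the structure column is headed `SMILES` (check the sample input file for the exact name), reads at most `n_rows_to_read` rows as the --n_rows_to_read argument describes, and, as a choice made for this sketch only, drops rows whose SMILES field is empty. The function name `read_input` is hypothetical.

```python
import csv
import io
from itertools import islice


def read_input(handle, n_rows_to_read=9999999999, smiles_column="SMILES"):
    """Read up to n_rows_to_read rows and keep those with a non-empty SMILES value.

    Other columns may be blank, mirroring the README; the column name is an assumption.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or smiles_column not in reader.fieldnames:
        raise ValueError(f"input is missing the {smiles_column!r} column")
    rows = []
    for row in islice(reader, n_rows_to_read):
        smiles = (row.get(smiles_column) or "").strip()
        if smiles:  # sketch choice: skip rows without a structure
            rows.append(row)
    return rows


# Example with an in-memory CSV; a real run would pass open("input.csv", newline="").
sample = io.StringIO("Name,SMILES\nEthanol,CCO\nUnknown,\n")
print(len(read_input(sample)))  # → 1
```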

*For the models and auxiliary files, please visit the GitHub repository: https://github.com/mmagithub/GPCR_LigandClassify

Credits