Closed PGijsbers closed 7 years ago
I'm 👍 on merging this functionality into this repo.
Merged. Thanks @PG-TUe!
Already forgot about this! ^^' No problem :D
Figured we might as well do some cleaning up on the repo since the paper is coming out. :-)
Currently, a lot of code is duplicated across the various scripts in the model_code folder. The shared logic includes preprocessing the dataset, evaluating the classifier on a set of metrics, and outputting the results. Because this code is not centralized, any addition or change has to be made in many files, which leads to mistakes and inconsistencies, such as the `parameter_string` having a trailing comma in logistic regression but not in Bernoulli NB.

I have created a fork on my GitHub with one generalized function, `evaluate_model`, for executing the benchmark and formatting the output. Configurations for each algorithm can be passed to the function; examples of this can be seen in NewBernoulliNB.py and NewLogisticRegression.py.

From testing locally on a small dataset, the old and new way reproduce identical results*. Please let me know if this is of interest to you. If so, I could rework the other algorithms, review my code, and open a pull request.
* Though LR no longer has a trailing comma in the `parameter_string`, and `random_state` is included in the `parameter_string` for LR.
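
To illustrate the idea, here is a minimal sketch of what such a generalized function could look like. All names, the config layout, and the metric choices are assumptions for illustration, not the actual code from the fork:

```python
# Hypothetical sketch of a centralized evaluate_model: one function handles
# the train/test split, fitting, metric computation, and consistent
# parameter-string formatting for every algorithm.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB


def evaluate_model(estimator_class, configurations, X, y, random_state=0):
    """Fit the estimator once per configuration and report metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )
    results = []
    for params in configurations:
        clf = estimator_class(**params)
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        # Format the parameter string in one place, with no trailing comma,
        # so every algorithm produces consistent output.
        parameter_string = ",".join(
            f"{k}={v}" for k, v in sorted(params.items())
        )
        results.append({
            "parameters": parameter_string,
            "accuracy": accuracy_score(y_test, pred),
            "macro_f1": f1_score(y_test, pred, average="macro"),
        })
    return results


# Per-algorithm configurations, analogous in spirit to what
# NewBernoulliNB.py and NewLogisticRegression.py might pass in.
X, y = make_classification(n_samples=200, random_state=0)
nb_results = evaluate_model(BernoulliNB, [{"alpha": 1.0}], X, y)
lr_results = evaluate_model(
    LogisticRegression, [{"C": 1.0, "random_state": 0}], X, y
)
for row in nb_results + lr_results:
    print(row["parameters"], round(row["accuracy"], 3))
```

The point of the design is that the scripts per algorithm shrink to just a configuration list plus one call, so a change to preprocessing or output formatting only has to be made once.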