it is not good to keep the processed data alive in memory like this. you should save it to disk, to an output path specified by the user.
then, in another script, you can load it from that output path (provided by the user as an argument). this also lets you inspect the processed csv and analyse/debug it.
this way, every time you run train, it won't have to reprocess the data from scratch.
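as a rough sketch of what i mean (flag names like `--input-csv`/`--output-csv` and the `process` placeholder are just illustrative, not your actual code):

```python
import argparse
import sys

import pandas as pd


def process(df: pd.DataFrame) -> pd.DataFrame:
    # placeholder for the real processing steps in data.py;
    # here it just drops rows with missing values
    return df.dropna()


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-csv", required=True, help="raw csv to process")
    parser.add_argument("--output-csv", required=True,
                        help="where the processed csv is written")
    args = parser.parse_args()
    process(pd.read_csv(args.input_csv)).to_csv(args.output_csv, index=False)


# only run the CLI when arguments are actually given
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

then train just does `pd.read_csv(processed_path)` on the path the user passes in, instead of recomputing everything.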
in the same vein, predict.py should load a trained model that has been saved to disk. so, your train script should have saved a model to disk first.
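something like this (just a minimal sketch with stdlib `pickle`; the function names are mine, not from your repo):

```python
import pickle


def save_model(model, path: str) -> None:
    # call this at the end of train.py, after fitting
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str):
    # call this at the top of predict.py, with the path given by the user
    with open(path, "rb") as f:
        return pickle.load(f)
```

if you're using sklearn, `joblib.dump`/`joblib.load` are also commonly used for this, since they handle models with large numpy arrays well.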
https://github.com/sherwin97/ML-Project----Predicting-solubility-/blob/5f50a0998def4d6dfa5ad4b0201555ede0841335/data.py#L69