nan number when evaluating

yxzwang commented 1 year ago

Hello, I tried to train your model from the Python scripts. I ran "python train_ion.py" with your demo dataset and config, but the evaluation result is nan, even though the training loss is decreasing. When I examined the evaluation process, pred_y was full of 0, -1, or -2. I think that is why the result is nan, because in that situation the result list is empty. Could you re-run the demo to see whether it happens on your side and fix the problem? Thank you.
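
To illustrate the suspected mechanism, here is a minimal sketch of how an averaged metric can come out as nan when no prediction passes a validity filter. It is not the project's evaluation code: the `evaluate` function, the "only positive predictions count" rule, and the toy score are all assumptions made for illustration.

```python
import math


def average_score(scores):
    """Mean of a list of scores; nan when the list is empty."""
    if not scores:
        return float("nan")
    return sum(scores) / len(scores)


def evaluate(pred_y, true_y):
    # Hypothetical rule: only positions with a positive prediction are scored.
    # If pred_y holds only 0, -1, or -2, nothing is scored and the list is empty.
    scores = [1.0 - abs(p - t) for p, t in zip(pred_y, true_y) if p > 0]
    return average_score(scores)


if __name__ == "__main__":
    print(math.isnan(evaluate([0, -1, -2], [0.5, 0.2, 0.1])))
```

Under these assumptions a decreasing training loss and a nan evaluation can coexist, since the metric never sees a scorable prediction.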

gureann commented 1 year ago

Hi @yxzwang ,

I just tested the following two commands and both work well:

python train_ion.py -c .\demo\ConfigDemo-IonModel-ModelTestWith_U2OS_DIA-Train.json
python pred_ion.py -c .\demo\ConfigDemo-IonModel-ModelTestWith_U2OS_DIA-Test.json

Would you please share the command you used, so that I can try it? Thanks.

Ronghui

yxzwang commented 1 year ago

I think the config I used is [ConfigTemplate-Ion_model_train.json] and the training data is ./Data/IonModel_TestData/20201010-Inten_Train-RPE1_DIA-seed0_811.json. Is this the right data? The command was just python train_ion.py, which I assumed would pick up [ConfigTemplate-Ion_model_train.json] as its config.

gureann commented 1 year ago

Hi @yxzwang ,

The problem was caused by how the config file is resolved for the script train_ion.py (the same applies to the other three similar scripts). The script first looks for a json config file passed to it explicitly, and falls back to the json config file defined in the script itself. If neither is available, it uses the settings stored in config_ion_model.py.

In your case, train_ion.py ended up using the settings in config_ion_model.py (because the json config file in the script is blank). This Python config file controls both the training and inference tasks, and it is currently in inference mode, which made the pipeline go wrong.
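
The fallback order described above can be sketched roughly as follows. This is not the actual code of train_ion.py: the function name `resolve_config`, its parameters, and the "infer" value used for TaskPurpose are assumptions made for illustration.

```python
def resolve_config(cli_json_path=None, script_json_path="", python_settings=None):
    """Pick a config source using a three-step fallback (hypothetical helper).

    1. A json config file given explicitly when running the script.
    2. A json config file path hard-coded in the script itself.
    3. The settings from the Python config module.
    """
    if cli_json_path:
        return "cli_json", cli_json_path
    if script_json_path:
        return "script_json", script_json_path
    return "python_module", python_settings


if __name__ == "__main__":
    # Running the script with no arguments and a blank in-script path
    # lands on the Python settings, whatever task they are set to.
    source, settings = resolve_config(
        cli_json_path=None,
        script_json_path="",
        python_settings={"TaskPurpose": "infer"},  # hypothetical value
    )
    print(source, settings)
```

With no explicit json file and a blank in-script path, the sketch returns the Python settings, so a training script would silently inherit an inference-mode configuration.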

You might want to run train_ion.py ConfigTemplate-Ion_model_train.json to do what you want, and run pred_ion.py ConfigTemplate-Ion_model_pred.json after the training is done (don't forget to change the path of the model weight file).

One more thing: you can pass a single json config file as above, or modify TaskPurpose in config_ion_model.py, which would also work. It is also recommended to add -c before the json config file, so that you can pass other arguments to overwrite parameters from the config file in a batch process.
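
As a rough sketch of why a named -c option helps, the snippet below shows a generic argparse pattern in which extra command-line arguments overwrite values loaded from a config file. Only the -c flag comes from this thread; the `--lr` option, the `merge_overrides` helper, and the config keys are hypothetical and are not the scripts' real arguments.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config", default=None, help="path to a json config file")
    # Hypothetical override option; the real scripts may expose different ones.
    parser.add_argument("--lr", type=float, default=None)
    return parser


def merge_overrides(config, args):
    """Return a copy of config where explicitly given arguments win."""
    merged = dict(config)
    for key, value in vars(args).items():
        if key != "config" and value is not None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    args = build_parser().parse_args(["-c", "train.json", "--lr", "0.01"])
    file_config = {"lr": 0.1, "epochs": 5}  # stand-in for the parsed json file
    print(merge_overrides(file_config, args))
```

Because the config path is tied to -c rather than to its position, further options can be added per run in a batch loop without editing the json file.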

Best, Ronghui

yxzwang commented 1 year ago

I've solved the problem. Thank you for your help! It's very useful.

weizhenFrank / DeepPhospho

nan number when evaluating #6