Information leakage - Githubissues

Nilabhra commented 2 years ago

First of all, thank you for making this repository available. I have learnt a lot through it so far. However, I found something troubling while looking at the code in this line.

I would say that computing the channel weights from all the available data causes information from the test/validation set to leak into the model. Ideally, the channel weights should be computed from the training set only, in keeping with the assumption that the model generalizes well to the test set after seeing only the training set.
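To make the suggestion concrete, here is a minimal sketch of "split first, then compute the weights". It is not the repository's code: the per-channel root-mean-square statistic, the 70/30 split, the data layout (each sample is a list of channels, each channel a flat list of floats) and the names `channel_rms` and `split_then_weight` are all assumptions made for illustration.

```python
import math
import random

def channel_rms(samples):
    """Per-channel root-mean-square over a list of samples.

    Assumed layout: each sample is a list of channels, and each channel
    is a flat list of floats.
    """
    n_channels = len(samples[0])
    sums = [0.0] * n_channels
    counts = [0] * n_channels
    for sample in samples:
        for c, values in enumerate(sample):
            sums[c] += sum(v * v for v in values)
            counts[c] += len(values)
    return [math.sqrt(s / n) for s, n in zip(sums, counts)]

def split_then_weight(samples, train_fraction=0.7, seed=0):
    """Split the data first, then derive the weights from the training part only."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = int(len(indices) * train_fraction)
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    weights = channel_rms(train)  # the test samples never touch this statistic
    return train, test, weights
```

The point is only the order of operations: any statistic that ends up in the loss is computed after the split, from `train` alone.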

mdribeiro commented 2 years ago

Hello! Thank you very much for the interest in our work and sorry for my late response. You are right. Ideally the weights should be calculated only from the train set to avoid the leakage of information from the test set. However, I would guess that the leakage effect wouldn't be very dramatic in this particular example because (1) the amount of information leaking here is very sparse and (2) the mean values in the train/test sets of this toy dataset shouldn't vary very dramatically.

Right now I'm quite busy extending this work to transient flows. Do you feel like correcting this issue and testing if the solution remains the same or at least very close? In that case, it would be great to see a pull request! :)
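One cheap way to gauge point (2) before retraining would be to compare the weights computed both ways. The sketch below assumes the two sets of per-channel weights are already available as plain lists; the helper names and the 1% tolerance are hypothetical, not something taken from the repository.

```python
def relative_gap(train_only, all_data):
    """Per-channel relative difference between two sets of channel weights."""
    return [abs(t - a) / abs(a) for t, a in zip(train_only, all_data)]

def leakage_is_small(train_only, all_data, tolerance=0.01):
    """True if every channel weight moves by less than `tolerance` (relative)."""
    return all(gap < tolerance for gap in relative_gap(train_only, all_data))
```

A small gap would only suggest that the weights barely change; whether the trained solution stays the same still has to be checked by retraining.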

debda018 commented 2 years ago

Hello @mdribeiro, thank you for your appreciation of this work. The channel weights are now computed from the training set only. With this change the validation loss improved, and the individual MSEs for Ux, Uy and p are also better than those of the DeepCFD model. The MSE values for the updated version were lower than DeepCFD's after 255 epochs; from then on the values did not change significantly, so the solution can be said to have converged. The lowest total validation MSE was recorded at epoch 996. That output and the final epoch output are given below.

Epoch #996
    Train Loss = 605023.943359375
    Train Total MSE = 0.643233682254313
    Train Ux MSE = 0.17887497226281346
    Train Uy MSE = 0.020300695048129244
    Train p MSE = 0.4440580323642614
    Validation Loss = 562058.96875
    Validation Total MSE = 1.8566802849203854
    Validation Ux MSE = 0.6651387117676816
    Validation Uy MSE = 0.19068795624425855
    Validation p MSE = 1.0008536481251151

Final epoch output

Epoch #1000
    Train Loss = 627731.3125
    Train Total MSE = 0.5853153387242086
    Train Ux MSE = 0.18202550557194924
    Train Uy MSE = 0.03909152433406507
    Train p MSE = 0.36419828766115203
    Validation Loss = 587976.474609375
    Validation Total MSE = 2.148713228258036
    Validation Ux MSE = 0.8289568755586268
    Validation Uy MSE = 0.2143103805638976
    Validation p MSE = 1.1054461477166515
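As a quick sanity check on the numbers above, the logged totals appear to be the sum of the three per-channel MSEs. This is an observation from the logs, not something stated in the thread. The sketch below copies the logged values and checks that relationship to within an assumed tolerance of 1e-6; it also computes how much the total validation MSE rises between epoch 996 and epoch 1000.

```python
# (total, Ux, Uy, p) MSE values copied from the two log excerpts above.
LOGGED = {
    ("train", 996): (0.643233682254313, 0.17887497226281346,
                     0.020300695048129244, 0.4440580323642614),
    ("val", 996): (1.8566802849203854, 0.6651387117676816,
                   0.19068795624425855, 1.0008536481251151),
    ("train", 1000): (0.5853153387242086, 0.18202550557194924,
                      0.03909152433406507, 0.36419828766115203),
    ("val", 1000): (2.148713228258036, 0.8289568755586268,
                    0.2143103805638976, 1.1054461477166515),
}

def total_gap(total, ux, uy, p):
    """Absolute difference between the logged total and Ux + Uy + p."""
    return abs(total - (ux + uy + p))

gaps = {key: total_gap(*values) for key, values in LOGGED.items()}
consistent = all(gap < 1e-6 for gap in gaps.values())

# Relative rise in total validation MSE from epoch 996 to epoch 1000.
rise = LOGGED[("val", 1000)][0] / LOGGED[("val", 996)][0] - 1.0

print(consistent, f"{rise:.1%}")
```

The final-epoch validation MSE is roughly 16% higher than the minimum at epoch 996, so the two checkpoints are not interchangeable when comparing against another model.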

mdribeiro / DeepCFD

Information leakage #6