Closed ValterFallenius closed 2 years ago
Found a bug today in my code. The results above were trained and tested with only the spatial downsampler and temporal encoder, no axial attention... one month of work in vain. I'll be back with actual results in a few days.
Output changes too little with lead time
The model is able to learn something, but the output image changes too little depending on the lead-time encoding (one-hot from the input layer). Here are some examples of the output from two different models, one with 60 lead times and one with only 8. The left-hand plots show the ground-truth precipitation in the prediction zone at different lead times; the right-hand side shows P(rain_rate > 0.2 mm/h), which means I sum the softmax probabilities of all the 127 bins corresponding to rain > 0.2 mm/h.
60 lead times (5, 10, 15... 300 min):
Only 8 lead times (15, 30, 45... 120 minutes): (I changed the cmap to make it clearer that the two plots are not plotting the same thing.)
For full 60 lead time network check out: w&b, 60 leads
For 8 lead time network check out: w&b, 8 leads
It should be noted that the 8-lead-time network has not yet started overfitting.
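For reference, the P(rain_rate > 0.2 mm/h) maps above come from summing softmax probabilities over the rain-rate bins above the threshold. A minimal sketch of that reduction (the function name and the bin-rate values are assumptions, not the actual ones in my code):

```python
import numpy as np

def prob_rain_above(logits, bin_rates, threshold=0.2):
    """logits: (bins, H, W) raw model output over 127 rain-rate bins.
    bin_rates: (bins,) rain rate each bin represents, in mm/h.
    Returns an (H, W) map of P(rain_rate > threshold)."""
    # Softmax over the bin axis, shifted for numerical stability.
    z = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    # Sum the probability mass of every bin above the rain threshold.
    mask = bin_rates > threshold
    return probs[mask].sum(axis=0)
```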
I have implemented a sampling quality pass during training that makes sure each training sample only samples a lead time when its target frame contains at least 5 rain pixels.
I am suspecting the axial attention layer is the bottleneck again. Maybe I'm not using it right. We added a positional embedding so that it would know which pixel was where in the input layer; I was wondering if we should also add an embedding for which channel it is looking at, since the model seems to be forgetting which lead time it is handling. The ConvGRU spits out a 256x28x28 tensor.
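To make the conditioning concrete, here is a minimal sketch of the one-hot lead-time encoding at the input layer as I described it: the chosen lead time is appended as constant extra channels. All shapes and names here are illustrative assumptions:

```python
import numpy as np

def add_lead_time_channels(x, lead_index, num_lead_times):
    """x: (C, H, W) input tensor. Appends num_lead_times constant planes,
    all zeros except the plane for the sampled lead time, which is all ones.
    Returns a (C + num_lead_times, H, W) tensor."""
    _, h, w = x.shape
    onehot = np.zeros((num_lead_times, h, w), dtype=x.dtype)
    onehot[lead_index] = 1.0  # one constant plane per possible lead time
    return np.concatenate([x, onehot], axis=0)
```

If the attention layers mix channels without any channel identity, this one-hot signal can get washed out downstream, which is why a channel embedding might help.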
Why is it performing so poorly?