wassname / attentive-neural-processes

implementing "recurrent attentive neural processes" to forecast power usage (w. LSTM baseline, MCDropout)
Apache License 2.0

No TensorBoard logs from smartmeters-ANP-RNN[-mcdropout].ipynb #1

Closed christabella closed 4 years ago

christabella commented 4 years ago

Thanks for the awesome repo! I ran the notebooks smartmeters-ANP-RNN.ipynb and smartmeters-ANP-RNN-mcdropout.ipynb, which instruct you to run `tensorboard --logdir ${MODEL_DIR}`, but no records were found.

I tried replacing DictLogger with the vanilla TensorBoardLogger, but this didn't change anything as far as I could see. There were no .tfevents output files in MODEL_DIR (only a model checkpoint): [screenshot]

Still, anp-rnn_1d_regression.ipynb logged to TensorBoard fine, using SummaryWriter directly: [screenshot]


Although not a bug, I was also wondering why the training looks unstable in these plots :) The ANP-RNN paper reported pretty stable convergence: [screenshot]

Thanks a lot for your time! I'll report back here if I find anything new.

christabella commented 4 years ago

Could it have something to do with not calling TensorBoardLogger.log_hyperparams()? Although there is a comment `# We will do this manually with final metrics` in dict_logger.py, I don't see any writer.add_summary() calls in the notebooks.

https://github.com/3springs/attentive-neural-processes/blob/e2e96cfe857561e5b3d53d24927c9dbaf655ae42/src/dict_logger.py#L15
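
For reference, a minimal sketch of calling it manually with a vanilla TensorBoardLogger; the save_dir/name and the hyperparameter values here are illustrative, not the repo's:

    from pytorch_lightning.loggers import TensorBoardLogger

    # illustrative paths; MODEL_DIR in the notebooks would play this role
    logger = TensorBoardLogger(save_dir="optuna_result", name="anp-rnn")
    logger.log_hyperparams({"hidden_dim": 64, "lr": 1e-3})  # example hparams only
    logger.save()  # flush what has been logged so far to disk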

Perhaps, in a separate fork, I could try using TensorBoardLogger directly and replacing the Optuna hyperparameter tuning with Guild AI, which doesn't require any code modifications or a SQL database, if you think that could be a good idea? I have had a good experience with Guild AI for experiment management and hyperparameter tuning, although PyTorch Lightning also looks excellent and I have a feeling the two might play well together.

wassname commented 4 years ago

No problem. By the way, I haven't commented, documented, or tidied this repo as much as I might like, so if you have things like that to add, go ahead with a PR. That kind of stuff is best coming from a code review anyway.

Although not a bug, I was also wondering why the training looks unstable in these plots :) The ANP-RNN paper reported pretty stable convergence:

I'm not sure. It could be the batch size, but my guess is that this is a real-world problem that's a bit too complex for the network. In the ANP-RNN paper, I think it was an easier synthetic problem. Also, they use an NLL loss for a classification problem, while I'm using a negative log likelihood for a regression problem. They may also be smoothing their curves.
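
For reference, the regression loss is roughly the negative log probability of the targets under a predicted Gaussian; a rough sketch with made-up shapes, not the exact loss in the repo:

    import torch
    from torch.distributions import Normal

    # toy predicted mean/std and observed targets; shapes and values are illustrative
    mean = torch.zeros(8, 1)
    std = 0.5 * torch.ones(8, 1)
    y_target = torch.randn(8, 1)

    nll = -Normal(mean, std).log_prob(y_target).mean()  # negative log likelihood for regression
    print(nll)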

tensorboard logs

That's weird, I have the directories there. Just to confirm: you don't have the model directory appearing? e.g.

I do have that. On my computer.

If you do have it, note that you have to start TensorBoard with --logdir pointing to a parent directory.

wassname commented 4 years ago

I haven't tried Guild AI, and this is my first time trying Optuna, so it was really just a random choice. It is a little clunky. Feel free to change it, and if it comes out nicer, PR it upstream.

christabella commented 4 years ago

Thank you for the reply! Btw was this a typo?

Otherwise, it could be

Indeed, my files are not appearing:

- optuna_result/anp-rnn2/anp-rnn/version_2/
  - chk/
- optuna_result/anp-rnn-mcdropout/anp-rnn-mcdropout/version_31/
  - chk/

But thanks a lot for verifying that it works for you! I will look into this. Will PR it upstream if I do anything that works nicely.

By the way, you showed in your MC dropout notebook that MC dropout improves the val loss (cell 40), but when I re-ran it (cell 27) the loss was worse than without dropout: [screenshot]

[screenshot]

I also noticed you were trying out test_tube's HyperOptArgumentParser and was curious why you decided to switch to Optuna, or if that was just a random choice too?

wassname commented 4 years ago

Otherwise, it could be

Sorry, I forgot to finish that sentence. Sometimes I find that TensorBoard delays writing the logs to disk and does it in chunks. In that case, calling flush sometimes helps.
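
Something like this, assuming you can get a handle on the underlying SummaryWriter (the path is illustrative):

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="optuna_result/anp-rnn2")  # illustrative path
    writer.add_scalar("val_loss", 0.5, global_step=0)
    writer.flush()  # force any buffered events out to the .tfevents file
    writer.close()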

test_tube

I was originally doing it following the PyTorch Lightning guide, which uses HyperOptArgumentParser, then I tried Optuna instead. I was mainly just trying them out to see which one I liked.

Hmm, it's very interesting that you got different results with MCDropout. I guess our results are not significant, and we would need to run 5 random seeds or something to get a significant result. I'll update the readme.
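
Roughly something like this, where train_and_eval is a hypothetical stand-in for the notebook's training/eval code, not a function in the repo:

    import numpy as np
    import torch

    def run_with_seed(seed: int) -> float:
        np.random.seed(seed)
        torch.manual_seed(seed)
        return train_and_eval()  # hypothetical: train the model and return the val loss

    losses = [run_with_seed(s) for s in (1, 10, 20, 100, 1000)]
    print(f"val loss: {np.mean(losses):.3f} +/- {np.std(losses):.3f}")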

I gave a talk on this work recently, and one of the major pieces of feedback was that it would be better if the val/test set were even more different from the train set, so we could see how the models extrapolate on seriously out-of-sample data. Perhaps this could be done by using the start of the time series as that data, since it's quite different.

I imagine the MCDropout comparison would be better and more pronounced if run on that data.

Over the weekend I'll try deleting all my logs and running again, to make extra sure they are always appearing for me (they may be old logs or something).

Thanks for the feedback

wassname commented 4 years ago

Hmm, I have the same problem if I remove the outputs and start from scratch. It looks like the TensorBoard logs are appearing in the lightning_logs directory, which is weird since SummaryWriter doesn't have that as its log dir.

wassname commented 4 years ago

I ran the MCDropout one 5 times with different seeds and over more indices, and it was consistently better. Weird:

| seed | loss | loss_mc | inds |
| --- | --- | --- | --- |
| 1 | -0.136217 | -1.320583 | [1436, 3115, 3540, 2502, 3041, 3356, 3068, 148... |
| 10 | -0.982833 | -1.346300 | [3525, 1264, 1046, 3256, 2701, 2319, 1883, 316... |
| 20 | -0.826995 | -1.417526 | [1674, 3151, 1009, 264, 1051, 822, 2621, 1645,... |
| 100 | -0.983974 | -1.364665 | [2840, 3155, 3006, 2139, 1884, 1476, 1654, 133... |

christabella commented 4 years ago

Thanks for looking into it! Well, it makes sense that the logs are appearing in lightning_logs, since that's the default directory for PyTorch Lightning modules, and you've imported LatentModelPL in your notebooks! :)
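
If you want the events to land under MODEL_DIR instead of lightning_logs/, something like this should work (paths are illustrative, and I've only sketched the fit call):

    import pytorch_lightning as pl
    from pytorch_lightning.loggers import TensorBoardLogger

    # point Lightning's logger at the same directory the checkpoints go to
    logger = TensorBoardLogger(save_dir="optuna_result/anp-rnn2", name="anp-rnn")
    trainer = pl.Trainer(logger=logger, max_epochs=10)
    # trainer.fit(model)  # where model is the repo's LatentModelPL instance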

That's cool, did you upload your talk's materials anywhere?

ConvCNP's experiment-methodology documentation is quite detailed, but they don't seem to test on time series except for the PLASTICC dataset:

Context points are randomly chosen from U(1, total_points) and the remaining points are target points. For testing, a batch size of 1 was used and statistics were computed over 1000 evaluations.

Anyway, I guess it would make sense to choose the context as the start of the time series; that's the 1D equivalent of the image-completion tasks in the NP papers where only the top half is observed. Worth a shot! 1D extrapolation doesn't seem that great for ConvCNP, at least:

[screenshot]
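
A minimal sketch of that split on a toy series (shapes and numbers are illustrative, not the smart meter loader):

    import torch

    # toy series: (batch, time, features)
    x = torch.linspace(0, 1, 200).view(1, -1, 1)
    y = torch.sin(10 * x) + 0.1 * torch.randn_like(x)

    n_context = 50  # observe only the start of the series
    context_x, context_y = x[:, :n_context], y[:, :n_context]
    target_x, target_y = x, y  # predict the whole series, including the unseen tail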

Did you choose to benchmark on the smart meter dataset because it's a more practical setting than the PLASTICC dataset? I also found it strange how in ConvCNP they compare the log-likelihoods of a GP and ConvCNP; I don't think that's a meaningful comparison across models?

[screenshot]

christabella commented 4 years ago

Your results for 5 seeds look good; my notebook run could have been a fluke as well. Just curious, why do you need to use your convert_layers() function, which recursively turns on training mode for all child modules? Is it insufficient to run model.train() instead?

    # do MCDropout estimation
    model.eval()  # put everything into eval mode first
    convert_layers(model, torch.nn.modules.dropout.Dropout2d, True)  # turn training mode back on for the dropout layers
    model.model._lstm.train()  # put the LSTM back in train mode as well (nn.LSTM only applies its dropout in train mode)
christabella commented 4 years ago

P.S. Unrelated, but I was curious if you also noticed this behaviour of the predicted standard deviation exploding: [screenshot]

- Left image: the mse_loss returned from LatentModel.forward() seems to be buggy.
- Middle: like the left image's grey run, these runs have extremely high MSE at a late epoch, which I think is caused by an exploding std. However, just saving the best model from an earlier epoch is fine, because this seems to happen after convergence.
- Right: however, sometimes this happens very early, in the first epoch, before convergence.

[screenshot]

Happy to open another issue or take this elsewhere if you'd prefer!

wassname commented 4 years ago

That's cool, did you upload your talk's materials anywhere?

Yeah, the talk slides are here, but they may not have much context without my words, which were not recorded.

Worth a shot! 1D extrapolation doesn't seem that great for ConvCNP at least:

Yeah I agree.

Did you choose to benchmark on the smart meter dataset because it's a more practical setting than the PLASTICC dataset?

I chose it because it's a practical dataset; I didn't look around much, though. There may be better ones.

I also found it strange how in ConvCNP they compare the log-likelihoods of a GP and ConvCNP; I don't think that's a meaningful comparison across models?

I would have thought it was? I think of it as how much the probability distribution overlaps with the real answer, and of course you want as much overlap as possible. It's similar to using p10 and p90, but over the whole distribution. Unless I'm missing something?

why do you need to use your convert_layers()

I think it only turns training mode back on for the dropout layers; that was my aim, anyway. I didn't want to turn it on for batchnorm or any other modules.
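
I haven't copied the repo's exact convert_layers here, but the idea is roughly this sketch: walk the module tree and put only the dropout layers back into train mode.

    import torch.nn as nn

    def enable_mc_dropout(model: nn.Module) -> None:
        """Keep dropout sampling at test time while batchnorm etc. stay in eval mode."""
        model.eval()
        for module in model.modules():
            if isinstance(module, (nn.Dropout, nn.Dropout2d)):
                module.train()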

Unrelated but I was curious if you also noticed this behaviour of predicted standard deviation exploding:

Yeah, having the std disappear or explode was the major failure mode for me. To avoid it disappearing I used a minimum std, but that has to be tuned for the problem. To avoid it exploding I usually just used gradient clipping and didn't make the learning rate too large, and that was sufficient.
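
For concreteness, a sketch of both tricks; the floor value here is illustrative and needs tuning per problem:

    import torch
    import torch.nn.functional as F

    MIN_STD = 0.01  # illustrative floor, tune per problem

    def bounded_std(raw_std: torch.Tensor) -> torch.Tensor:
        # softplus keeps the std positive; the additive floor stops it collapsing to zero
        return MIN_STD + F.softplus(raw_std)

    print(bounded_std(torch.randn(4)))

    # and to stop it exploding, clip gradients during training, e.g. with Lightning:
    # pl.Trainer(gradient_clip_val=0.5)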