openclimatefix / skillful_nowcasting

Implementation of DeepMind's Deep Generative Model of Radar (DGMR) https://arxiv.org/abs/2104.00954
MIT License
223 stars 59 forks

Gradient Checkpointing, Improved Logging, and Data Pipeline Updates #77

Closed rutkovskii closed 2 weeks ago

rutkovskii commented 2 weeks ago

Pull Request

Description

This pull request introduces several enhancements and fixes to the DGMR project, focusing on optimization, logging, and data processing. The key updates include:

  1. Memory Optimization:

    • Replaced direct calls to self.forward with torch.utils.checkpoint.checkpoint to enable gradient checkpointing, which reduces memory consumption during training at the cost of recomputing activations in the backward pass. Contributed by my colleague @xuzhe951024.
  2. Improved Logging:

    • Removed deprecated logger checks in run.py and simplified logger initialization so that multiple loggers are no longer initialized in multi-GPU environments.
  3. Data Loading Enhancements:

    • Updated TFDataset initialization to include trust_remote_code for compatibility with remote dataset loading.
    • Added configurable batch size, enabling dynamic adjustments during training.
  4. Code Cleanup:

    • Consolidated the __main__ block for better readability and modularity.
    • Added default values for batch size and streamlined DataLoader creation.
  5. Dependencies:

    • Added wandb, datasets, and tensorflow to requirements.txt to support new functionalities.
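
To illustrate the memory optimization in item 1, here is a minimal sketch of gradient checkpointing. It is not the code from this PR: TinyGenerator and checkpointed_forward are hypothetical names standing in for the DGMR model, and use_reentrant=False assumes a reasonably recent PyTorch release.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class TinyGenerator(nn.Module):
    """Hypothetical stand-in for the DGMR generator; not the repository's model."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

    def checkpointed_forward(self, x: torch.Tensor) -> torch.Tensor:
        # Intermediate activations inside forward are not kept; they are
        # recomputed during the backward pass, trading compute for memory.
        return checkpoint(self.forward, x, use_reentrant=False)


model = TinyGenerator()
frames = torch.randn(2, 1, 16, 16)  # (batch, channel, height, width)

plain = model(frames)                       # regular forward pass
saved = model.checkpointed_forward(frames)  # same output, checkpointed

# Gradients still reach the parameters through the checkpointed call.
saved.mean().backward()
```

The outputs of both calls should match, since checkpointing changes only what is stored for the backward pass and not the computation itself.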

Fixes # (Include the relevant issue ID if applicable)

How Has This Been Tested?

The changes were tested using the following methods:

Steps to reproduce:

  1. Set up the environment using the updated requirements.txt.
  2. Run the run.py script with the default configuration.
  3. Monitor Wandb logs for training metrics and validate output consistency.
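
For reviewers who want to see what "default configuration" and a single logger per run could look like, here is a stdlib-only sketch. It does not reproduce run.py: DEFAULT_BATCH_SIZE, the --batch-size flag, and build_logger are hypothetical, and the sketch assumes the launcher (for example torchrun) sets a RANK environment variable for each worker.

```python
import argparse
import logging
import os
from typing import Mapping, Optional

DEFAULT_BATCH_SIZE = 16  # hypothetical default, not taken from run.py


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a batch size that falls back to a default when not given."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
    return parser.parse_args(argv)


def is_main_process(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: RANK is set per worker; if absent, treat as a single process.
    return int(environ.get("RANK", "0")) == 0


def build_logger(environ: Mapping[str, str] = os.environ) -> Optional[logging.Logger]:
    """Create the logger on the main process only, so workers do not duplicate it."""
    if not is_main_process(environ):
        return None
    logger = logging.getLogger("dgmr_sketch")
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        logger.addHandler(logging.StreamHandler())
    logger.setLevel(logging.INFO)
    return logger


args = parse_args([])  # empty argument list: every option takes its default
logger = build_logger()
if logger is not None:
    logger.info("batch size: %d", args.batch_size)
```

Passing ["--batch-size", "4"] to parse_args overrides the default, and a worker with RANK set to a non-zero value gets no logger.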

rutkovskii commented 2 weeks ago

@jacobbieker Hi Jacob, here is the comment on this PR: https://github.com/openclimatefix/skillful_nowcasting/issues/59#issuecomment-2486632896

rutkovskii commented 2 weeks ago

@jacobbieker Glad to help! I believe only you can merge it into the main branch from here.

rutkovskii commented 1 week ago

@jacobbieker would it be possible to add me to the list of contributors?

I am also looking to cite this repository in my thesis, and adding a CITATION.cff file could be useful for others who would be citing your work in the future. https://citation-file-format.github.io/ https://citation-file-format.github.io/cff-initializer-javascript/#/

jacobbieker commented 1 week ago

> @jacobbieker would it be possible to add me to the list of contributors?
>
> I am also looking to cite this repository in my thesis, and adding a CITATION.cff file could be useful for others who would be citing your work in the future. https://citation-file-format.github.io/ https://citation-file-format.github.io/cff-initializer-javascript/#/

Yes, of course! The comment above should trigger the bot. I've also added a CITATION.cff file now too, so hopefully that helps!

jacobbieker commented 1 week ago

@all-contributors please add @rutkovskii for code

allcontributors[bot] commented 1 week ago

@jacobbieker

I've put up a pull request to add @rutkovskii! :tada:

rutkovskii commented 1 week ago

Thank you very much!