microsoft / torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
https://www.osgeo.org/projects/torchgeo/
MIT License
2.33k stars 299 forks source link

Support logging figures to multiple loggers #2138

Open robmarkcole opened 1 week ago

robmarkcole commented 1 week ago

Summary

Currently logging of figures (e.g. in Classification and Segmentation trainers) assumes Tensorboard logger which has the method add_figure. However other loggers have similar methods but with different names and are currently not supported. e.g. MLflow uses log_figure. This FR is to support these loggers and document which are supported

Rationale

Many orgs use a service for logging such as mlflow or wandb etc, and it would be nice for these to just work.

Implementation

We would probably want to abstract this for use by all trainers, but adding this method is one soluton

# on the trainer
    def log_figure(self, fig, label, step):
        logger = self.logger.experiment
        if isinstance(logger, SummaryWriter):  # Checking if TensorBoard is used
            logger.add_figure(label, fig, global_step=step)
        elif 'wandb' in str(type(self.logger)):
            wandb.log({label: wandb.Image(fig)})
        elif 'MLFlowLogger' in str(type(self.logger)):
            with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmpfile:
                temp_file_path = tmpfile.name
            fig.savefig(temp_file_path)
            mlflow.log_artifact(temp_file_path)
        plt.close(fig)

# in the step
            if fig:
                self.log_figure(fig, f'image/{batch_idx}', self.global_step)

Alternatives

None

Additional information

No response

adamjstewart commented 5 days ago

Another option would be to remove any figure logging from TorchGeo entirely. If users want it, they can subclass the trainer and add it. This feature has historically been a nightmare to maintain/test and has been riddled with bugs. The proposal here is to make it even more complicated.

More positively, can you start a discussion with the Lightning folks to see if there is an easier way to do this automatically? Or a way that can unify the interface somehow? This just seems like too much work to support out of the box.

robmarkcole commented 5 days ago

RE lightning I see a lot of work was done in https://github.com/Lightning-AI/pytorch-lightning/pull/6227 but ultimately they decided this was too complex to maintain, and closed the PR.

Perhaps as you say, logging figures should be left to users to subclass the trainers. I think basic logging of samples on the dataset is a necessity however