Ulipenitz opened this issue 6 months ago
Hey @Ulipenitz! Neptune does indeed automatically stop the run once the training loop is done. However, we do provide multiple options to log additional metadata to the run once training is over. Here is our Transformers integration guide that lists these options: https://docs.neptune.ai/integrations/transformers/#logging-additional-metadata-after-training
Please let me know if any of these work for you.
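For example, one of those options is to fetch the run back from the callback after training and log to it directly. A minimal sketch (the metric name and value are placeholders, not from this thread):

from transformers.integrations import NeptuneCallback

run = NeptuneCallback.get_run(trainer)  # fetches (and reopens if needed) the run used by the callback
run["eval/test_accuracy"] = 0.87  # placeholder metric and value
run.stop()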
Thanks for the answer @SiddhantSadangi!
This is indeed useful for logging metadata like test metrics after training.
My problem, though, is that I need to set up the Python logger again after the training function.
I am training on a remote machine in the cloud, and unfortunately capture_stderr=True, capture_stdout=True only capture Neptune-specific output; I want all logs in Neptune, including those from the Python logger.
My proposed workaround of calling setup_main_logger again after training works, but I don't think it is a nice solution.
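For context, a setup_main_logger helper of this kind might look roughly like the following (a simplified sketch, not the exact code from the pipeline):

import logging
from neptune.integrations.python_logger import NeptuneHandler

def setup_main_logger(run):
    # Attach a NeptuneHandler so that standard logging records are forwarded to the Neptune run
    logger = logging.getLogger("main")
    logger.setLevel(logging.INFO)
    logger.addHandler(NeptuneHandler(run=run))
    return logger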
Ah, understood! Yes, this is definitely inconvenient.
I think your workaround handles this pretty well in the absence of official support for this use case. I'll just suggest using neptune_callback's get_run() method to access the run used by the Transformers callback. This removes the need to store the run_id and reinitialize the run.
from neptune.integrations.python_logger import NeptuneHandler  # import path in recent Neptune versions

trainer = Trainer(
    ...
    callbacks=[neptune_callback],
)

logger.info("This will be logged to Neptune")  # assumes a Neptune handler is already attached and the run is still active

trainer.train()  # the NeptuneCallback stops the run when training finishes

logger.info("This won't be logged to Neptune")  # the run has already been stopped

run = neptune_callback.get_run(trainer)  # fetch the run used by the callback
neptune_handler = NeptuneHandler(run=run)
logger.addHandler(neptune_handler)

logger.info("This will be logged to Neptune")  # the new handler points at the reopened run
Please let me know if this workaround works better for you.
I will also pass this feedback to the product team.
Is your feature request related to a problem? Please describe.
When I use a Hugging Face Trainer with a NeptuneCallback, the Trainer seems to close the run automatically once training ends, which disconnects it from the Python logger. If I want to log anything to Neptune after training, I have to reinitialize the run, which makes the code complex in bigger training pipelines.
Describe the solution you'd like
It would be great if the run persisted after training.
Describe alternatives you've considered
My workaround looks like this:
main.py:
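The original snippet is not reproduced here; the sketch below reconstructs the described workaround (reinitializing the run from its stored ID and setting up the logger again after training; train_model is a placeholder name for the training function):

import neptune
from training_function import train_model, setup_main_logger

run_id = train_model()  # training stops the Neptune run; only its ID is kept
run = neptune.init_run(with_id=run_id)  # reinitialize the same run by ID
logger = setup_main_logger(run)  # set up the Python logger again, now pointing at the reopened run
logger.info("Logged to Neptune after training")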
training_function.py:
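A matching sketch for the training side (again a reconstruction; model, arguments, and datasets are elided as in the snippet above):

import neptune
from transformers import Trainer
from transformers.integrations import NeptuneCallback

def train_model():
    run = neptune.init_run()
    run_id = run["sys/id"].fetch()  # remember the run ID before the callback stops the run
    neptune_callback = NeptuneCallback(run=run)
    trainer = Trainer(
        ...,  # model, args, and datasets as in the real pipeline
        callbacks=[neptune_callback],
    )
    trainer.train()  # the NeptuneCallback stops the run when training finishes
    return run_id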