mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

store stdout and stderr #863

Closed · eyadsibai closed this issue 5 years ago

eyadsibai commented 5 years ago

Feature Request:

smurching commented 5 years ago

Hi @eyadsibai, it'd be great to hear more about your use case: what exactly would you want to do with the stored stderr/stdout (presumably you'd want to view / query it?), and how would you like it to be organized (per run?).

A naive approach to storing stderr/stdout is to redirect each of them to a temporary file at the start of your program and then log those files as run artifacts once the program finishes (e.g. in Python):

import mlflow
import os
import sys
import tempfile

# Create temporary files and redirect stdout/stderr to them. Note the files are
# opened in text mode ('w'), since print() and most libraries write str, not bytes.
_, stdout_path = tempfile.mkstemp()
_, stderr_path = tempfile.mkstemp()
print(stdout_path, stderr_path)
original_stdout, original_stderr = sys.stdout, sys.stderr
sys.stdout = open(stdout_path, 'w')
sys.stderr = open(stderr_path, 'w')

with mlflow.start_run():
    try:
        # Run your training code etc. here.
        print("Test stdout")
        sys.stderr.write("Test stderr\n")
    finally:
        # Close the redirected files and restore the original streams before
        # logging the captured output as run artifacts.
        sys.stdout.close()
        sys.stderr.close()
        sys.stdout, sys.stderr = original_stdout, original_stderr
        mlflow.log_artifact(stdout_path)
        mlflow.log_artifact(stderr_path)
        os.remove(stdout_path)
        os.remove(stderr_path)

However, this becomes more complicated if you also want to see stdout/stderr in your terminal window while your code runs, if you want to log stdout/stderr from a cluster of machines, or if you want to use MLflow's nested runs functionality and attribute stderr/stdout to child runs within a parent run, so it would be helpful to know what use cases you had in mind.
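
For the first of those complications (still seeing output in the terminal while it is captured), a minimal tee-style sketch could look like the following; the Tee class and the stdout.log file name are just illustrations, not anything MLflow provides:

import sys

class Tee:
    """Illustrative only: write to the real stream and to a log file at the same time."""
    def __init__(self, stream, log_file):
        self.stream = stream
        self.log_file = log_file

    def write(self, data):
        self.stream.write(data)
        self.log_file.write(data)

    def flush(self):
        self.stream.flush()
        self.log_file.flush()

# Output stays visible in the terminal and is also written to stdout.log, which
# could later be stored with mlflow.log_artifact("stdout.log").
log_file = open("stdout.log", "w")
sys.stdout = Tee(sys.stdout, log_file)
print("visible in the terminal and saved to stdout.log")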

eyadsibai commented 5 years ago

Thank you! I was also thinking of logging the output of other libraries when their verbosity is turned on, scikit-learn models being one example.
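
One workaround sketch for that kind of verbose output (not an MLflow feature; the estimator and artifact file name below are only examples, and mlflow.log_text requires a reasonably recent MLflow version) is to redirect stdout around the fit call and log the captured text:

import contextlib
import io

import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

with mlflow.start_run():
    buffer = io.StringIO()
    # verbose=1 makes scikit-learn print per-iteration progress to stdout
    with contextlib.redirect_stdout(buffer):
        GradientBoostingClassifier(verbose=1).fit(X, y)
    # Store the captured output as a text artifact of the run
    mlflow.log_text(buffer.getvalue(), "sklearn_stdout.txt")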

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has not seen any activity in the 60 days since it was marked as stale, so it has now been closed. If you are still experiencing this issue or have more information, please re-open it!

ksteverson commented 4 years ago

I would also like mlflow to capture the stdout and stderr output. Ideally it would do so while the program is running and display it live in the UI, or at least not block the stdout/stderr display in the terminal. Sacred + Omniboard has this feature and it is really helpful for monitoring programs as they run.

sotte commented 4 years ago

I would also vote for this feature. All the other tracking tools (W&B, Neptune, Comet, Sacred, ...) have it built in.

sp7412 commented 4 years ago

Is it possible to capture stdout text from training a TensorFlow model and view it in the mlflow web UI as an artifact?

dniku commented 4 years ago

Adding a vote here. It is convenient to see the entire output of a run, including its stdout & stderr, stored in one place.

pu-wei commented 3 years ago

Second this. Would be really helpful if we could see the stdout and stderr.

maffei2443 commented 3 years ago

Adding a vote here. Sometimes an experiment may fail for an arbitrary reason, and in that case it would be extremely helpful to see stdout/stderr.

ZaxR commented 3 years ago

@smurching - can we reopen this issue? There has been a lot of activity since it was closed requesting this feature (myself included). Just want to make sure it's on folks' radar.

jdowning10 commented 2 years ago

Keen to have this functionality incorporated into the start_run context manager: something akin to the tee command in Linux, with the option to delete the log if the run finished successfully.
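
A rough user-land sketch of that idea (the start_run_with_log helper below is hypothetical, not part of MLflow's API, and it only captures stdout rather than also echoing it to the terminal the way tee does):

import contextlib
import os
import tempfile

import mlflow

@contextlib.contextmanager
def start_run_with_log(keep_log_on_success=False):
    # Hypothetical helper: capture stdout to a temp file and keep it as a run
    # artifact only when the run fails (or when explicitly asked to).
    _, log_path = tempfile.mkstemp(suffix=".log")
    try:
        with mlflow.start_run() as run, open(log_path, "w") as log_file:
            try:
                with contextlib.redirect_stdout(log_file):
                    yield run
            except Exception:
                log_file.flush()
                mlflow.log_artifact(log_path)
                raise
            else:
                if keep_log_on_success:
                    log_file.flush()
                    mlflow.log_artifact(log_path)
    finally:
        os.remove(log_path)

# Usage: stdout from the body is discarded on success and kept as an artifact on failure.
with start_run_with_log():
    print("training...")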

ghwatson commented 2 years ago

Ditto. Any MLOps tool should have live monitoring capabilities. I'd go further and say it should at least have feature parity with TensorBoard's live monitoring (e.g. the mlflow autolog function should do live plotting).

amie-roten commented 2 years ago

I agree that this feature would be incredibly useful for monitoring the status of experiments and the potential sources of their failures.

stevehadd commented 2 years ago

I think this would be very helpful for debugging, especially when you only notice a problem later and want to know how long it has been happening: you could look for clues in the stdout/stderr of past runs.

One additional helpful option would be to extend the FileHandler in the standard Python logging module into an MLFlowHandler, which writes everything sent to the Python logger as an artifact of the MLflow run. This would be very useful for extending the general Python good practice of logging to ML projects.
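
A minimal sketch of that MLFlowHandler idea (not an existing MLflow or stdlib class; everything here is illustrative) could subclass logging.FileHandler and upload its file to the active run when the handler is closed:

import logging

import mlflow

class MLFlowHandler(logging.FileHandler):
    # Hypothetical handler: behaves like a normal FileHandler, then logs its
    # file to the active MLflow run when closed.
    def close(self):
        super().close()
        if mlflow.active_run() is not None:
            mlflow.log_artifact(self.baseFilename)

logger = logging.getLogger("training")
handler = MLFlowHandler("training.log")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

with mlflow.start_run():
    logger.info("starting training")
    # ... training code ...
    handler.close()  # uploads training.log as an artifact of the run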

Ilykuleshov commented 1 month ago

+1 for this, it's present in every other major ML Experiment tracker. Essential with multi-run tools (such as hydra): e.g. launching a couple dozen runs, going away for a while, coming back to find out that three of them crashed, with the errors lost in your terminal history (tmux doesn't scroll that far...).