msaroufim closed this issue 2 years ago
An error followed by training continuing sounds like dynamo successfully fell back to eager.
Hmm, should we catch those exceptions and log them as warnings then? My first instinct when I saw an error with a stack trace was to stop the training job. Output like this https://gist.github.com/msaroufim/c74daa1f11d1edf8e592c1229bfc1cdc#file-gistfile1-txt-L7868-L7897, where the error shows up in between the training progress bars, is not great UX.
I liked @wconstab's idea that we should emit single-line warnings in the customer mode. Maybe we can have a separate logging level for that. @mlazos might already be thinking about this.
Yeah, I liked Will's idea for single-line warnings. I can hide the current pages of errors behind a verbose option or filter.
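For illustration, here is a minimal sketch of the single-line-warning idea using only Python's standard `logging` module. It is not dynamo's actual implementation: `run_with_fallback`, `compiled_fn`, `eager_fn`, `broken_compiled`, `eager`, and the logger name are all hypothetical names made up for this example. The assumption is simply that a failed compiled call can be retried through an eager path.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("compile_fallback_sketch")


def run_with_fallback(compiled_fn, eager_fn, *args, **kwargs):
    """Try compiled_fn; on failure, warn in one line and run eager_fn instead."""
    try:
        return compiled_fn(*args, **kwargs)
    except Exception as exc:
        # One-line summary shown at the default (user-facing) log level.
        logger.warning(
            "compilation failed (%s: %s); falling back to eager",
            type(exc).__name__,
            exc,
        )
        # The full stack trace is emitted only when the logger is at DEBUG,
        # which plays the role of the "verbose option" mentioned above.
        logger.debug("full traceback for the failure above", exc_info=True)
        return eager_fn(*args, **kwargs)


def broken_compiled(x):
    # Stand-in for a compiled function that hits an unsupported case.
    raise TypeError("unsupported operand")


def eager(x):
    # Stand-in for the uncompiled version of the same function.
    return x * 2


logging.basicConfig(level=logging.WARNING)
result = run_with_fallback(broken_compiled, eager, 21)
```

With the level at `WARNING`, the user sees one line per failure and the job keeps running. Setting the level to `logging.DEBUG` would also print the stack trace.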
@msaroufim I don't see any errors when running it with main torchdynamo anymore, can you confirm it passes for you?
This one was strange because even though I see `TypeError` and `NotImplemented` errors in the logs, the training did not stop. Should they be warnings instead?

Composer is an interesting training library focused on performance, and I believe they have some of the fastest implementations of PyTorch algorithms https://www.mosaicml.com/blog/mlperf-2022 so if we solve this I think we can see `dynamo` mentioned in MLPerf.

Repro
pip install mosaicml
Logs
https://gist.github.com/msaroufim/c74daa1f11d1edf8e592c1229bfc1cdc