Progress tracker:

- [x] `NumpyVariables` doesn't support `isinstance` (fixed by pytorch/torchdynamo#774).
- [ ] `torch.ops.profiler._record_function_exit` returns a tensor that stores the underlying `RecordFunction`, but dynamo can't handle this. Currently we put this function into the FX graph but don't register its return value as one of the graph outputs (https://github.com/pytorch/torchdynamo/pull/867). See the first sketch after this list.
- [ ] `WithExitFunctionVariable` can't reconstruct `ProfileRecordFunctionVariable` correctly.
- [ ] `ProfileRecordFunctionVariable` misses the `with` context on the graph-break instruction. I think we should look at how `GradModeVariable` handles this case; see the second sketch after this list.
- [ ] `FakeTensor` as `TensorVariable`: https://github.com/pytorch/torchdynamo/pull/931
- [ ] Make `aot_nop` and `aot_nvfuser` work well: https://github.com/pytorch/torchdynamo/issues/953
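For the `_record_function_exit` item, here is a minimal hand-built FX sketch of the graph shape that PR aims for: the exit call stays in the graph as a node, but its returned handle tensor is deliberately not registered as a graph output. The profiler op names come from the item above; their exact signatures here are my assumption based on the eager profiler internals of that era.

```python
import torch
import torch.fx as fx

graph = fx.Graph()
x = graph.placeholder("x")
# Enter returns a tensor handle that keeps the underlying RecordFunction alive
# (assumed signature: (name, args) -> Tensor).
handle = graph.call_function(torch.ops.profiler._record_function_enter,
                             ("sketch", None))
y = graph.call_function(torch.mul, (x, 2))
# The exit call is kept in the graph for its side effect, but its return
# value is intentionally NOT made a graph output.
graph.call_function(torch.ops.profiler._record_function_exit, (handle,))
graph.output(y)

gm = fx.GraphModule(torch.nn.Module(), graph)
print(gm.graph)          # the exit node appears in the graph but not in output
out = gm(torch.ones(3))  # executes enter/exit; only x * 2 is returned
```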
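The two `ProfileRecordFunctionVariable` items show up with user code like the sketch below, where a graph break (here forced with `print`, my choice for the repro) lands inside a `record_function` block, so dynamo has to restore the `with` context in the continuation function:

```python
import torch
import torchdynamo  # standalone-repo API of the time; torch._dynamo after the merge

def fn(x):
    with torch.profiler.record_function("block"):
        y = x.sin()
        print("side effect")  # unsupported call -> graph break inside the with-block
        return y.cos()

# Dynamo must re-enter the record_function context for the code after the break.
compiled = torchdynamo.optimize("eager")(fn)
compiled(torch.randn(4))
```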
This point has been raised multiple times. Our benchmarks today only cover a single fwd-bwd iteration (and optimizers soon), but we lack a real training example where the loss goes down.
We could start with a BERT model and run it for, say, 500 iterations, with both pre-training and fine-tuning batch sizes. Nvidia folks mentioned that this could reveal many bugs that are not visible in a single iteration.
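A hedged sketch of what such a harness could look like, using the `aot_nvfuser`/`aot_nop` backends from the list above. The tiny `BertConfig`, the synthetic batches, the batch size, and the hyperparameters are illustrative assumptions, not from this issue; a real run would use bert-base/large with the actual pre-training and fine-tuning batch sizes.

```python
import torch
import torchdynamo  # standalone-repo API; torch.compile in today's PyTorch
from transformers import BertConfig, BertForMaskedLM

device = "cuda" if torch.cuda.is_available() else "cpu"
backend = "aot_nvfuser" if device == "cuda" else "aot_nop"

# Tiny config so the sketch runs anywhere; hidden_size must divide evenly
# across the attention heads.
config = BertConfig(hidden_size=128, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=512)
model = BertForMaskedLM(config).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

@torchdynamo.optimize(backend)
def train_step(input_ids, labels):
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    return loss

for step in range(500):
    # Synthetic batch standing in for a real corpus; predicting the unmasked
    # input is trivially learnable, so the loss should visibly decrease.
    input_ids = torch.randint(0, config.vocab_size, (8, 128), device=device)
    optimizer.zero_grad(set_to_none=True)
    loss = train_step(input_ids, input_ids.clone())
    optimizer.step()
    if step % 50 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```

Running the same loop once per backend would also exercise multi-iteration behavior (parameter mutation, optimizer state, guard re-use) that the single-iteration benchmarks never touch.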