Hi, @stas00
Sure! It sounds like you already have some tests written, and we just want to run them on CI, right? Let me know how you would like to proceed (discussion, PR review, etc.).
Hi @stas00, let me know if we should proceed :-)
Yes, please. Let me know how I can help.
Thank you, @ydshieh!
Hi! Basically I don't know what the task is. I didn't go through #17765 in detail, and only know that the PR was merged.
Is the (new) goal to install some libraries in docker files, write some specific test methods, or something else?
You mentioned

> I installed the environment locally, the tests work.

It would be nice if you could let me know which tests you mean here, and whether you want me to install the environment for the scheduled CI(s) to run those tests 🙏
Thank you, @stas00!
The instructions for what needs to be installed are verbatim in the OP of https://github.com/huggingface/transformers/pull/17765
To test:

```
pytest tests/trainer/test_trainer.py -k torchdynamo
```
OK, I get it better now. So basically I need to put the "To reproduce and set up the environment" section of #17765 inside some docker files and run the `torchdynamo` tests.

Do you think it's better to have a new docker image for this (just like `deepspeed`), or can we just put it in `transformers-all-latest-gpu` (the one used for model/tokenizer/generation tests)? I can try the latter first in any case.
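Once the image builds, a quick sanity check along these lines would confirm the installs the tests depend on (a minimal sketch, not an actual CI step; it assumes torchdynamo was installed from source per #17765):

```python
# Minimal sanity check for the new docker image (a sketch, not a real CI step):
# verify that the libraries the torchdynamo tests rely on import cleanly.
import torch
import torchdynamo  # assumed installed from source per the instructions in #17765

print(f"torch {torch.__version__}, cuda available: {torch.cuda.is_available()}")
```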
I'd say one docker image for all the extensions.
At some point it will become stable and will work with just the official released version.
@stas00 I have tried to build the image and run the tests.
Here is the run page
It failed with

```
from torchdynamo.optimizations.training import aot_autograd_speedup_strategy
E   ImportError: cannot import name 'aot_autograd_speedup_strategy' from 'torchdynamo.optimizations.training'
```
I can't find `aot_autograd_speedup_strategy` in the latest torchdynamo repo.
cc @frank-wei
Thank you for trying, @ydshieh. It's too bad that the API appears to be unstable :(
Let's wait for @frank-wei to reply
Thanks for setting up the testing, @stas00 and @ydshieh. Looks like @Chillee or @anijain2305 made some updates there for AOT Autograd. Could you elaborate here?
We have cleaned up some AOT Autograd-related things in the TorchDynamo repo. The above line could be replaced with

```python
return torchdynamo.optimize("aot_nvfuser")
```

Therefore, we do not need the import anymore on line 645.
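For context, the corresponding change on the transformers side would look roughly like this (a sketch; the helper name and surrounding structure are assumptions, not the exact trainer source):

```python
import torchdynamo

def get_torchdynamo_ctx(backend: str):
    """Hypothetical helper returning a torchdynamo context manager for the Trainer."""
    if backend == "eager":
        return torchdynamo.optimize("eager")
    elif backend == "nvfuser":
        # before: from torchdynamo.optimizations.training import aot_autograd_speedup_strategy
        #         return torchdynamo.optimize(aot_autograd_speedup_strategy)
        # after the cleanup, the string alias selects AOT Autograd + nvFuser directly:
        return torchdynamo.optimize("aot_nvfuser")
```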
Could you please make a PR that fixes things?

Also, could you please include the relevant transformers tests in your CI, so that if you break things in the future you'd instantly know and can then update the transformers side? Thank you.
E.g., you can see how DeepSpeed runs the transformers/deepspeed integration tests on their CI: https://github.com/microsoft/DeepSpeed/blob/master/.github/workflows/nv-transformers-v100.yml
In your case it'd be cloning the latest `transformers` repo and running:

```
pytest tests/trainer/test_trainer.py -k torchdynamo
```
I can change to `return torchdynamo.optimize("aot_nvfuser")` in my PR directly (to enable CI testing).
The `import` issue is fixed, but now I get `ResetRequired` from `../torchdynamo/torchdynamo/eval_frame.py:101: in __enter__ self.on_enter()`. See the full error below.

Maybe I could just call `torchdynamo.reset()` somewhere below `# 2. TorchDynamo nvfuser`? (See the sketch after the traceback.)
```
________________ TrainerIntegrationTest.test_torchdynamo_memory ________________

self = <tests.trainer.test_trainer.TrainerIntegrationTest testMethod=test_torchdynamo_memory>

    @require_torch_non_multi_gpu
    @require_torchdynamo
    def test_torchdynamo_memory(self):
        # torchdynamo at the moment doesn't support DP/DDP, therefore require a single gpu
        class CustomTrainer(Trainer):
            def compute_loss(self, model, inputs, return_outputs=False):
                x = inputs["x"]
                output = model(x)
                if self.args.n_gpu == 1:
                    return output.mean()
                return output

        class MyModule(torch.nn.Module):
            """Simple module that does aggressive fusion"""

            def __init__(self):
                super().__init__()

            def forward(self, x):
                for _ in range(20):
                    x = torch.nn.functional.relu(x)
                return x

        mod = MyModule()

        # 1. without TorchDynamo (eager baseline)
        a = torch.ones(1024, 1024, device="cuda", requires_grad=True)
        a.grad = None
        trainer = CustomTrainer(model=mod)

        # warmup
        for _ in range(10):
            orig_loss = trainer.training_step(mod, {"x": a})

        # resets
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()

        orig_loss = trainer.training_step(mod, {"x": a})
        orig_peak_mem = torch.cuda.max_memory_allocated()
        del trainer

        # 2. TorchDynamo nvfuser
        a = torch.ones(1024, 1024, device="cuda", requires_grad=True)
        a.grad = None
        args = TrainingArguments(output_dir="None", torchdynamo="nvfuser")
        trainer = CustomTrainer(model=mod, args=args)

        # warmup
        for _ in range(10):
>           loss = trainer.training_step(mod, {"x": a})

tests/trainer/test_trainer.py:1893:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/transformers/trainer.py:2479: in training_step
    with self.compute_loss_context_manager():
src/transformers/utils/generic.py:291: in __enter__
    self.stack.enter_context(context_manager)
/opt/conda/lib/python3.8/contextlib.py:425: in enter_context
    result = _cm_type.__enter__(cm)
../torchdynamo/torchdynamo/eval_frame.py:101: in __enter__
    self.on_enter()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def on_enter():
        global most_recent_backend
        if (
            most_recent_backend is not None
            and most_recent_backend is not compiler_fn
        ):
>           raise ResetRequired()
E           torchdynamo.exc.ResetRequired:
E           Must call `torchdynamo.reset()` before changing backends. Detected two calls to
E           `torchdynamo.optimize(...)` with a different backend compiler arguments.

../torchdynamo/torchdynamo/eval_frame.py:183: ResetRequired
```
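For illustration, here is the failure mode and the suggested fix in isolation (a minimal sketch; assumes the source-installed torchdynamo and a CUDA setup with nvFuser available):

```python
import torch
import torchdynamo

def f(x):
    for _ in range(3):
        x = torch.nn.functional.relu(x)
    return x

x = torch.ones(16, 16, device="cuda", requires_grad=True)

# The first context installs the "eager" backend.
with torchdynamo.optimize("eager"):
    f(x)

# Entering a context with a different backend right away would raise
# torchdynamo.exc.ResetRequired, as in the traceback above;
# reset() clears the previously installed backend first.
torchdynamo.reset()

with torchdynamo.optimize("aot_nvfuser"):
    f(x)
```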
That's why I asked @anijain2305 to fix it and make a new PR that actually fixes the tests.
It's not productive to keep going back and forth when we don't know what other things have changed.
Yes, @stas00, I am going to send a PR to transformers to make the appropriate changes. Apologies for the failures.
Regarding adding transformers to the CI, that's a very good idea. Let me see how much extra time it adds on the TorchDynamo side.
Thank you, @anijain2305!
You can add it as a separate job, so it'd run in parallel with your other jobs and thus not add to the total CI runtime. It should finish really fast, at least with the few basic tests we have right now for torchdynamo. Or you could even tack it onto an existing job; the main overhead will be cloning the `transformers` repo and installing its prerequisites.
Fixed by #19056
@ydshieh, let's talk about instrumenting one of the jobs to run tests for torchdynamo/tensorrt.
It's quite a handful of things to build; the instructions are in the OP:
https://github.com/huggingface/transformers/pull/17765
I installed the environment locally, the tests work. I didn't want to get in the way of the PR, so I'm doing it as a separate task.
Thank you.