huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

todo: enable CI to run torchdynamo/tensorrt tests #18127

Closed stas00 closed 1 year ago

stas00 commented 2 years ago

@ydshieh, let's talk about instrumenting one of the jobs to run tests for torchdynamo/tensorrt

There are quite a few things to build; the instructions are in the OP:

https://github.com/huggingface/transformers/pull/17765

I installed the environment locally and the tests work. I didn't want to get in the way of the PR, so I'm doing this in a separate task.

thank you.

ydshieh commented 2 years ago

Hi, @stas00

Sure! It sounds like you already have some tests written, and we just want to run them on CI, right? Let me know how you would like to proceed (discussion, PR review, etc.).

ydshieh commented 2 years ago

Hi @stas00, let me know if we should proceed :-)

stas00 commented 2 years ago

Yes, please. Let me know how I can help.

Thank you, @ydshieh!

ydshieh commented 2 years ago

Hi! Basically, I don't know what the task is. I didn't go through #17765 in detail, and only know that the PR is merged.

Is the (new) goal to install some libraries in the docker files, write some specific test methods, or something else?

You mentioned "I installed the environment locally, the tests work." It would be nice if you could let me know which tests you mean here, and whether you want me to install the environment for the scheduled CI(s) to run those tests 🙏

Thank you, @stas00 !

stas00 commented 2 years ago
  1. The instructions for what needs to be installed are given verbatim in the OP of https://github.com/huggingface/transformers/pull/17765

  2. To test:

pytest tests/trainer/test_trainer.py -k torchdynamo
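
A quick local sanity check before running them (a sketch only - the tests are skipped via @require_torchdynamo unless torchdynamo can be imported):

# Throwaway check: this import must succeed in the local environment / CI image,
# otherwise the torchdynamo tests are skipped rather than run.
import torchdynamo

print(torchdynamo.__file__)  # confirms which checkout is being picked up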
ydshieh commented 2 years ago

OK, I understand it better now. So basically I need to install the torchdynamo environment in a CI docker image and run those tests on the scheduled CI.

Do you think it's better to have a new docker image for this (just like deepspeed), or can we just put it in transformers-all-latest-gpu (the one used for model/tokenizer/generation tests)? I can try the latter first in any case.

stas00 commented 2 years ago

I'd say one docker image for all the extensions.

stas00 commented 2 years ago

At some point it will become stable and will work with just the official released version.

ydshieh commented 2 years ago

@stas00 I have tried to build the image and run the tests.

Here is the run page

It failed with

>           from torchdynamo.optimizations.training import aot_autograd_speedup_strategy
E           ImportError: cannot import name 'aot_autograd_speedup_strategy' from 'torchdynamo.optimizations.training'

I can't find aot_autograd_speedup_strategy in the latest torchdynamo repo.
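
For reference, a throwaway snippet to list what the module still exports in the checkout installed in the image (assuming the module itself still imports, which the traceback above suggests):

import torchdynamo.optimizations.training as training

# aot_autograd_speedup_strategy no longer shows up among the public names
print([name for name in dir(training) if not name.startswith("_")])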

cc @frank-wei

stas00 commented 2 years ago

Thank you for trying, @ydshieh. It's too bad that the API appears to be unstable :(

Let's wait for @frank-wei to reply

frank-wei commented 2 years ago


Thanks for setting up the testing, @stas00 and @ydshieh. Looks like @Chillee or @anijain2305 made some updates there for AOTAutograd. Could you elaborate here?

anijain2305 commented 2 years ago

We have cleaned up some AOTAutograd-related things in the TorchDynamo repo.

https://github.com/huggingface/transformers/blob/4eed2beca0fd8058a1c51684f68599522adf20c9/src/transformers/trainer.py#L652

The above line could be replaced with

return torchdynamo.optimize("aot_nvfuser")

Therefore, we no longer need the import on line 645.
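
As a sketch of the resulting backend selection (illustrative only: the helper name and surrounding structure below are placeholders, not the actual trainer.py code):

import contextlib

import torchdynamo  # installed from source, per the OP of #17765


def dynamo_ctx(backend: str):
    """Return the context manager wrapped around the loss computation (sketch)."""
    if backend == "eager":
        # graph capture only, no compiler backend
        return torchdynamo.optimize("eager")
    if backend == "nvfuser":
        # previously torchdynamo.optimize(aot_autograd_speedup_strategy), which
        # needed the now-removed import from torchdynamo.optimizations.training
        return torchdynamo.optimize("aot_nvfuser")
    # torchdynamo disabled
    return contextlib.nullcontext()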

stas00 commented 2 years ago

> We have cleaned up some AOTAutograd-related things in the TorchDynamo repo.
>
> https://github.com/huggingface/transformers/blob/4eed2beca0fd8058a1c51684f68599522adf20c9/src/transformers/trainer.py#L652
>
> The above line could be replaced with
>
> return torchdynamo.optimize("aot_nvfuser")

  1. Could you please make a PR that fixes things?

  2. Could you please include the relevant transformers tests in your CI, so that if you break things in the future you'll instantly know and can then update the transformers side? Thank you.

E.g., you can see how DeepSpeed runs the transformers/deepspeed integration tests on their CI: https://github.com/microsoft/DeepSpeed/blob/master/.github/workflows/nv-transformers-v100.yml

In your case it'd be cloning the latest transformers repo and running:

pytest tests/trainer/test_trainer.py -k torchdynamo
ydshieh commented 2 years ago

I can change it to return torchdynamo.optimize("aot_nvfuser") in my PR directly (to enable CI testing).

ydshieh commented 2 years ago

Import issue fixed. But now I get ResetRequired from ../torchdynamo/torchdynamo/eval_frame.py:101: in __enter__ self.on_enter(). See the full error below.

Maybe I could just call torchdynamo.reset() somewhere around the # 2. TorchDynamo nvfuser section? (See the sketch after the traceback.)

________________ TrainerIntegrationTest.test_torchdynamo_memory ________________

self = <tests.trainer.test_trainer.TrainerIntegrationTest testMethod=test_torchdynamo_memory>

    @require_torch_non_multi_gpu
    @require_torchdynamo
    def test_torchdynamo_memory(self):
        # torchdynamo at the moment doesn't support DP/DDP, therefore require a single gpu
        class CustomTrainer(Trainer):
            def compute_loss(self, model, inputs, return_outputs=False):
                x = inputs["x"]
                output = model(x)
                if self.args.n_gpu == 1:
                    return output.mean()
                return output

        class MyModule(torch.nn.Module):
            """Simple module that does aggressive fusion"""

            def __init__(self):
                super().__init__()

            def forward(self, x):
                for _ in range(20):
                    x = torch.nn.functional.relu(x)
                return x

        mod = MyModule()

        # 1. without TorchDynamo (eager baseline)
        a = torch.ones(1024, 1024, device="cuda", requires_grad=True)
        a.grad = None
        trainer = CustomTrainer(model=mod)
        # warmup
        for _ in range(10):
            orig_loss = trainer.training_step(mod, {"x": a})

        # resets
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()

        orig_loss = trainer.training_step(mod, {"x": a})
        orig_peak_mem = torch.cuda.max_memory_allocated()
        del trainer

        # 2. TorchDynamo nvfuser
        a = torch.ones(1024, 1024, device="cuda", requires_grad=True)
        a.grad = None
        args = TrainingArguments(output_dir="None", torchdynamo="nvfuser")
        trainer = CustomTrainer(model=mod, args=args)
        # warmup
        for _ in range(10):
>           loss = trainer.training_step(mod, {"x": a})

tests/trainer/test_trainer.py:1893: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/transformers/trainer.py:2479: in training_step
    with self.compute_loss_context_manager():
src/transformers/utils/generic.py:291: in __enter__
    self.stack.enter_context(context_manager)
/opt/conda/lib/python3.8/contextlib.py:425: in enter_context
    result = _cm_type.__enter__(cm)
../torchdynamo/torchdynamo/eval_frame.py:101: in __enter__
    self.on_enter()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def on_enter():
        global most_recent_backend
        if (
            most_recent_backend is not None
            and most_recent_backend is not compiler_fn
        ):
>           raise ResetRequired()
E           torchdynamo.exc.ResetRequired: 
E           Must call `torchdynamo.reset()` before changing backends.  Detected two calls to
E           `torchdynamo.optimize(...)` with a different backend compiler arguments.

../torchdynamo/torchdynamo/eval_frame.py:183: ResetRequired
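
For illustration, the reset mentioned above could slot in right before the second phase of the test; a rough sketch of the placement (not necessarily what the eventual fix does):

import gc

import torch
import torchdynamo  # would need to be imported (guarded) at the top of the test module

from transformers import TrainingArguments

# ... phase 1 (the eager baseline) exactly as in test_torchdynamo_memory above ...

# resets between the two phases
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()
torchdynamo.reset()  # clear the previously cached backend before switching to nvfuser

# 2. TorchDynamo nvfuser (unchanged from the test above)
a = torch.ones(1024, 1024, device="cuda", requires_grad=True)
a.grad = None
args = TrainingArguments(output_dir="None", torchdynamo="nvfuser")
trainer = CustomTrainer(model=mod, args=args)  # CustomTrainer / mod defined in the test above

Based on the error message above, the reset only seems to be required when the backend changes within the same process, so one reset per backend switch should be enough.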
stas00 commented 2 years ago

That's why I asked @anijain2305 to fix it and make a new PR that actually fixes the tests.

It's not productive to keep going back and forth when we don't know what other things have changed.

anijain2305 commented 2 years ago

Yes, @stas00, I am going to send a PR to transformers to make the appropriate changes. Apologies for the failures.

Regarding adding transformers to the CI, that's a very good idea. Let me see how much extra time it adds on the TorchDynamo side.

stas00 commented 2 years ago

Thank you, @anijain2305!

You can add it as a separate job, so it'd run in parallel with your other jobs and thus not add to the total CI runtime. It should finish really fast, at least with the few basic tests we have right now for torchdynamo. Or you could even tack it onto an existing job - the main overhead will be cloning transformers and installing its prerequisites.

ydshieh commented 1 year ago

Fixed by #19056