Use internal PyTorch vmap for vrelu3

cgravill commented 3 years ago

@toelli-msft discovered vmap can be accessed in a stable release of pytorch: https://github.com/pytorch/pytorch/pull/58589/files

This change:

imports the internal vmap APIs so they can be experimented with easily
improves the benchmark detection so it can find examples created this way, and makes the filtering more robust generally
comments out the "PyTorch Nice" example as the vmap code isn't able to convert the example yet - added a comment to AB#19587

cgravill commented 3 years ago

toelli-msft commented 3 years ago

PyTorch Nice doesn't seem to appear in the table. Does vrelu3_pytorch_nice need to be a def?

toelli-msft commented 3 years ago

I'd also suggest exporting vmap from one of our ksc modules so that we don't have to update every import site when it does become supported by torch.

cgravill commented 3 years ago

PyTorch Nice doesn't seem to appear in the table. Does vrelu3_pytorch_nice need to be a def?

https://github.com/microsoft/knossos-ksc/blob/4635910a2d22c7eeb5b183bc6f8118524916c9fb/src/bench/conftest.py#L91

Needs adjustment to support the way vrelu3_pytorch_nice is defined - but for emphasis this means the vmap isn't actually being benchmarked (or run) yet.

toelli-msft commented 3 years ago

I'm also confused how the old code vrelu3_pytorch_nice = torch.vmap(relu3_pytorch_nice) works. How come that doesn't crash with a "torch has no member vmap" error?

cgravill commented 3 years ago

I'm also confused how the old code vrelu3_pytorch_nice = torch.vmap(relu3_pytorch_nice) works. How come that doesn't crash with a "torch has no member vmap" error?

It does crash with stable versions, that's why I comment it out (when I run locally). It's not run anywhere on CI.

cgravill commented 3 years ago

Another reason we need to be cautious over this:

UserWarning: torch.vmap is an experimental prototype that is subject to change and/or deletion. Please use at your own risk. > There may be unexpected performance cliffs due to certain operators not being implemented. To see detailed performance > > warnings please use torch._C._debug_only_display_vmap_fallback_warnings(True) before the call tovmap`.

cgravill commented 3 years ago

OK I've improved how candidate benchmarks are discovered so the way vmap'd functions are created are supported.

However, that then generates:

FAILED src/bench/test_run_bench.py::test_inference[vrelu3_pytorch-PyTorch Nice-torch.Size([4])] - RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback.
FAILED src/bench/test_run_bench.py::test_inference[vrelu3_pytorch-PyTorch Nice-torch.Size([16])] - RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback.
FAILED src/bench/test_run_bench.py::test_forward[vrelu3_pytorch-PyTorch Nice-torch.Size([4])] - RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback.
FAILED src/bench/test_run_bench.py::test_forward[vrelu3_pytorch-PyTorch Nice-torch.Size([16])] - RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback.
FAILED src/bench/test_run_bench.py::test_backwards[vrelu3_pytorch-PyTorch Nice-torch.Size([4])] - RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback.
FAILED src/bench/test_run_bench.py::test_backwards[vrelu3_pytorch-PyTorch Nice-torch.Size([16])] - RuntimeError: Batching rule not implemented for aten::is_nonzero. We could not generate a fallback.

confirming the prototype status of vmap. Perhaps we need to engage with https://github.com/pytorch/pytorch/issues/42368 ?

We still might want some of what we're doing here onto master, as we can get further and benchmark more.

toelli-msft commented 3 years ago

It does crash with stable versions, that's why I comment it out (when I run locally). It's not run anywhere on CI.

Ah, that makes sense. And I have 1.9 locally so I don't have to comment it out.

dcrc2 commented 3 years ago

... confirming the prototype status of vmap.

Yes, I don't think we were expecting this example to work yet. Maybe we should just comment it out regardless.

cgravill commented 3 years ago

OK I've adjusted the description. As discussed previously plan was comment out the example using vmap anyway (when we lacked support at all), and agree with suggestion to do now. Still think it's positive to get these related changes in as it shortens the distance for someone to work on "PyTorch Nice" for relu3.

awf commented 3 years ago

Agreed with all - it doesn't currently work, but we want the changes in anyway. It would perhaps be good for the ptnice version to raise a "not implementatable yet" exception, and for that to be reflected in the benchmarks with a line saying so. Then, when we later have other benchmarks crashing for other reasons, we will know we are able to handle that cleanly.

cgravill commented 3 years ago

It would perhaps be good for the ptnice version to raise a "not implementatable yet" exception, and for that to be reflected in the benchmarks with a line saying so. Then, when we later have other benchmarks crashing for other reasons, we will know we are able to handle that cleanly.

I might be putting too much on one feature but AB#19173 needing to determine the number of rounds would be a good place to handle deterministic failures like this.

cgravill commented 3 years ago

Note the overall eventual aim is to enable "PyTorch Nice" AB#19587 which this is a few steps towards.

cgravill commented 3 years ago

@awf (or other reviewers) any more? Otherwise albeit incomplete these improvements/disables would be good to get in.

microsoft / knossos-ksc

Use internal PyTorch vmap for vrelu3 #881