llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
Other
1.32k stars 491 forks source link

Fx Importer and the existing dynamo support #3033

Open penguin-wwy opened 6 months ago

penguin-wwy commented 6 months ago

Can Fx Importer be used to replace dynamo support(e.g. import_fx_graph_as_func)?

If so, will Fx Importer retain the interface that accepts GraphModule as input, such as FxImporter.import_graph_module? I understand this to be a reasonable requirement.

penguin-wwy commented 6 months ago

@stellaraccident Please help me to answer my questions

stellaraccident commented 6 months ago

I think we may have missed landing a documentation patch as I remember someone having edited to answer some of this. I can't find it now, and much of the discussion happened on Discourse. We need to update the documentation to basically call out:

penguin-wwy commented 6 months ago
  • The TorchScript based dynamo support (basically everything under the pt1 directory) is deprecated. While there is no active plan to remove it, neither am I aware of anyone fixing it.

I have previously attempted to fix issue #2998 related to pt1 dynamo support, but it seems that there is no interest, or can someone review them? If possible, we should remove or at least mark these interfaces as deprecated as soon as possible.

  • We provide very basic APIs in the torch-mlir project itself for interacting with torch.export: https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/fx.py . These are basically just examples and meant that integrators can do with them what they want.
  • I have no active plans to remove the FxImporter.import_graph_module for importing stateless FX graphs. This will likely remain the proper way to import from torch.compile with import_program being the way to import from torch.export.

In our project, we rely heavily on the functionality of torch.compile, so I still prefer to retain the FxImporter.import_graph_module and maintain it in the long term. Additionally, it would be helpful to provide a calling method similar to export_and_import in fx.py.

Regarding the e2e testing for Fx Importer, will you reuse the existing pt1 e2e testing facilities?

FYI, I tried to replace import_fx_graph_as_func with import_graph_module in the current dynamo e2e test workflow to test it, and encountered a crash issue with HBC_basic. It might be beneficial if we could add this part of the test to Fx Importer.

stellaraccident commented 6 months ago

The documentation needs more updates for sure. Much of the discussion happened here:

https://discourse.llvm.org/t/torch-mlir-pytorch2-uplift/74000/26

https://discourse.llvm.org/t/rfc-rename-torch-mlir-compile-apis-and-introduce-fx-based-analogs/76646/3

I wanted to more aggressively remove the less supported parts but received pushback. I know there are still folks using the various pieces, but it is a distributed project. I did push to organize the codebase so that the older parts that are not aligned with the pytorch roadmap are in segregated parts of the project. We should probably be more forceful in the documentation to steer people towards the more actively maintained parts.

I inherited much of the project from the former maintainers about 9 months ago and have been doing my best to try to steer it to better align with where PyTorch itself is going while also making a path for my team at AMD to work on the parts that we directly use (mostly the FX layer, linalg lowerings and onnx interop). We maintain most of the torch.compile support that we use in a downstream that leverages the FxImporter (which originated in that downstream and I donated to torch-mlir).

Sorry there's still some messiness. Contributions very welcome.

stellaraccident commented 6 months ago

Regarding the e2e testing for Fx Importer, will you reuse the existing pt1 e2e testing facilities?

Probably not long term, but no one has had time yet to redo testing infra that is still working. I'd like to see more of the fx style tests be lit tests, and we are assembling good test suites downstream that we will want to enable in torch-mlir when ready.

There are a lot of tests were assembling here that we will soon want to hook up to the torch-mlir CI: https://github.com/nod-ai/SHARK-TestSuite

penguin-wwy commented 5 months ago

I have two issues that need to be discussed.

First, is it possible to selectively disable torchdynamo config in the current e2e tests? If there is indeed no one maintaining dynamo support at the moment, then we should not need to continue executing dynamo's e2e tests. Otherwise, every new code addition will trigger xfail, which is unnecessary.

Secondly, regarding the testing issue of the fx importer. I briefly looked at the SHARK-TestSuite and I'm not sure if my understanding is correct. Most of its tests are based on models, and it cannot quickly help me filter out unsupported scenarios. Therefore, perhaps we still need to simply integrate the fx importer into the e2e tests, which would make it easier to reproduce and locate problems, until the SHARK-TestSuite is more complete and merged into the torch-mlir CI.

stellaraccident commented 5 months ago

Some of this was also being discussed on discord last night: https://discord.com/channels/636084430946959380/742573221882364009/1227852761769447474

I think I agree with you about updating the e2e tests to be based on fx importer. That is not the same as me knowing when that will get done or who will do it. I'd love it if someone would take that on. Eventually if no one does, the cost of keeping it like it is will raise high enough that I will have to act to transition it. But I have a lot of other things on my plate at this moment.

stellaraccident commented 5 months ago

I also agree that as a pragmatic first step, we can delete e2e test suites that are adding no value. Again, it's be great if someone could help with this. It is an inconvenience for my team but hasn't gotten bad enough that we've prioritized fixing it (and have been fixing the model level test gap vs trying to improve the op level tests).

My main requirement is that lowerings in tree get tested with real execution of some kind, ideally at the op level. I ultimately don't care how that is done. The e2e test suite is currently meeting that requirement but needs an uplift.

penguin-wwy commented 5 months ago

I have modified a config for FxImporter( #3151 ) based on the original torchdynamo for e2e testing. It's still quite rough at the moment, but I hope to refine it and replace the torchdynamo tests in CI with it. If this aligns with your thoughts?

sjain-stanford commented 5 months ago

Thanks @penguin-wwy for driving this work. Very happy to see we now have a way of testing e2e flows using the TorchDynamo frontend.

penguin-wwy commented 5 months ago

I've found a issue, it seems difficult for FxImporter to execute dynamic shape cases on the e2e-test? We can pass the dynamic_shapes to the export to generate FxGraph with dynamic dims, but you need to use the same export.Dim for the same dim, otherwise, it will throw an error.

The values of s2 = L['rhs'].size()[0] and s1 = L['lhs'].size()[1] must always be equal.