tensorflow / mlir

"Multi-Level Intermediate Representation" Compiler Infrastructure

Extending Python api #152

Open mfojtak opened 5 years ago

mfojtak commented 5 years ago

Is there a plan to extend the Python API? Specifically, adding builders for the loop dialect ops loop.for and loop.if. Or, even better, is there a way to extend the API with a generic operation builder? Currently, it is possible to create simple ops using the "op" method, but I could not find a way to create operations that deal with regions, such as loop.for or loop.if. I am writing a Python-to-MLIR lowering and I am currently building the if statement out of blocks. It would be better if I could use higher-level dialects like loop, or ideally any dialect via a generic API.

joker-eph commented 5 years ago

I'd be in favor of building proper, layered bindings. The current ones were assembled to support a specific experiment and, in my opinion, don't seem suitable to build upon.

As a basic layer, I would start with the part of MLIR that is the least likely to evolve at this point: building a generic Operation, Region, and Block. This would immediately enable you to use any dialect via a generic API, as you mention.
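To make the layering concrete, here is a minimal sketch of what a generic Operation/Region/Block model could look like. It is plain Python with hypothetical class names, not the existing bindings and not MLIR's C++ classes; types and operands are kept as strings purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """An ordered list of operations plus typed block arguments."""
    arg_types: List[str] = field(default_factory=list)
    operations: List["Operation"] = field(default_factory=list)


@dataclass
class Region:
    """An ordered list of blocks owned by a single operation."""
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Operation:
    """A generic operation identified only by its name string."""
    name: str
    operands: List[str] = field(default_factory=list)
    result_types: List[str] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)
    regions: List[Region] = field(default_factory=list)


def walk(op):
    """Yield op and every operation nested in its regions, pre-order."""
    yield op
    for region in op.regions:
        for block in region.blocks:
            for nested in block.operations:
                yield from walk(nested)


# A loop with a one-block body; no dialect-specific builder is involved.
body = Block(arg_types=["index"],
             operations=[Operation("loop.terminator")])
loop = Operation("loop.for", operands=["%lb", "%ub", "%step"],
                 regions=[Region(blocks=[body])])
names = [op.name for op in walk(loop)]
```

Because the operation is identified by a name string and carries its regions as data, the same three classes would cover loop.for, loop.if, or an op from any other dialect.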

mfojtak commented 5 years ago

The Python binding itself is not the problem. The problem is that there is not even a high-level C++ API. The EDSC API looks promising, but it is still incomplete. Is there any plan to develop a convenient high-level API for Region and Operation that could be used from Python or Swift?

joker-eph commented 5 years ago

> Is there any plan on developing convenient high level api for Region and Operation that could be used from Python or Swift?

Can you expand on how this differs from what I suggested above (starting with the APIs for "building a generic Operation, Region, and Block")? I may be misunderstanding what you're looking for.

mfojtak commented 5 years ago

What you're suggesting makes sense. I am just asking whether there is a plan to do this. It is not clear from any document or video whether MLIR will have a frontend, and I need to choose which way to go now. An alternative for me is to build MLIR IR using string operations (a custom Python IR printer) and then use a simple binding that can parse, compile, and invoke this IR. I tried to reverse engineer the MLIR code so that I could write the API you're suggesting myself, but I am lost. On the other hand, the current Python bindings are working fine for me. I can construct simple ops and the block API looks complete; the only missing piece is a generic API to create "structural" ops like loops or ifs.
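For reference, the string-based alternative could be sketched roughly as below. Everything here is an assumption for illustration: TextEmitter and emit_for are hypothetical helpers, "my.use" is a made-up op name, and the emitted text only approximates MLIR's generic operation syntax, so it would need to be checked against the actual parser.

```python
import itertools


class TextEmitter:
    """Accumulates MLIR-like text; hypothetical helper, not an MLIR API."""

    def __init__(self):
        self.lines = []
        self.indent = 0
        self._ids = itertools.count()

    def fresh(self):
        """Return a new SSA-style value name such as %0, %1, ..."""
        return f"%{next(self._ids)}"

    def emit(self, text):
        """Append one line at the current indentation level."""
        self.lines.append("  " * self.indent + text)

    def emit_for(self, lb, ub, step, build_body):
        """Emit a loop.for-like op in generic form with a one-block region."""
        iv = self.fresh()  # induction variable, passed as a block argument
        self.emit(f'"loop.for"({lb}, {ub}, {step}) ({{')
        self.emit(f"^bb0({iv}: index):")
        self.indent += 1
        build_body(iv)  # the callback emits the loop body
        self.emit('"loop.terminator"() : () -> ()')
        self.indent -= 1
        self.emit("}) : (index, index, index) -> ()")

    def text(self):
        return "\n".join(self.lines)


e = TextEmitter()
e.emit_for("%lb", "%ub", "%step",
           lambda iv: e.emit(f'"my.use"({iv}) : (index) -> ()'))
ir_text = e.text()
print(ir_text)
```

The trade-off is that nothing is verified until the text is parsed, so mistakes in types or nesting only surface at that point.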

ftynse commented 5 years ago

Having a frontend is a bit different from having language bindings to construct an IR, although the current Python bindings together with EDSC are vaguely heading in the direction of language usability.

The bindings currently in the repo predate many of the MLIR generalizations, such as regions and functions-as-ops, and definitely need an update. A more principled approach along the lines @joker-eph mentions should be the goal. It's debatable whether it should be built on top of the EDSC or the Builder APIs. I personally find that EDSC maps more directly to Python abstractions, but it requires additional C++ code since it also predates the same generalizations.

The bindings actually do have support for inserting custom ops into the IR (https://github.com/tensorflow/mlir/blob/master/bindings/python/test/test_py2and3.py#L226) and for creating Blocks (https://github.com/tensorflow/mlir/blob/master/bindings/python/test/test_py2and3.py#L54), but not Regions.

You can look into how functions (https://github.com/tensorflow/mlir/blob/master/bindings/python/pybind.cpp#L244) and loop bodies (https://github.com/tensorflow/mlir/blob/master/bindings/python/pybind.cpp#L300) are exposed through context managers and generalize that for arbitrary regions. I will be happy to review.
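As a rough illustration of that generalization, the sketch below uses a context manager to redirect an insertion point into any region of any op. It is pure Python with a hypothetical Builder class, not the code in pybind.cpp; regions are simplified to a flat list of ops (a single implicit block), and the "my.*" op names are made up.

```python
from contextlib import contextmanager


class Builder:
    """Hypothetical builder with an insertion-point stack; not the real bindings."""

    def __init__(self):
        self.top = []             # top-level operations
        self._stack = [self.top]  # the innermost list receives new ops

    def op(self, name, operands=(), num_regions=0):
        """Create a generic op record and append it at the insertion point."""
        record = {"name": name, "operands": list(operands),
                  "regions": [[] for _ in range(num_regions)]}
        self._stack[-1].append(record)
        return record

    @contextmanager
    def region(self, op, index=0):
        """Redirect insertion into region `index` of op for the with-block."""
        self._stack.append(op["regions"][index])
        try:
            yield
        finally:
            self._stack.pop()  # restore the previous insertion point


b = Builder()
cond = b.op("loop.if", operands=["%cond"], num_regions=2)
with b.region(cond, 0):   # "then" region
    b.op("my.then_work")
with b.region(cond, 1):   # "else" region
    b.op("my.else_work")
b.op("my.after")
```

The same `region` context manager serves both the loop and the conditional case, which is the point of generalizing beyond dedicated function and loop-body managers.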

Maybe @nicolasvasilache has some plans about this code.

nicolasvasilache commented 5 years ago

Hi @mfojtak,

As @ftynse mentions, you can use context managers for some of that, but the bindings are generally limited in their current state. Note that, on top of this, it is relatively simple to build a small, limited embedded DSL in Python that emits MLIR and runs on top of numpy/jax/tf/your favorite framework.

Historically, EDSCs used to be a set of C++ abstractions (an AST and syntactic sugar) with delayed typing, simple type inference, and a C API for the explicit purpose of making MLIR metaprogrammable from any language. However, this was done at a time when the Builder/Operation API was still at an early stage, and there was strong pushback against a separate abstraction for this purpose. Consequently, we stripped EDSCs of the AST and most of the C API.

I still believe it would be very useful for MLIR to be metaprogrammable from C, with a standard staged abstraction that could be reused from any language, but I am not putting more skin in the game at this time.

This post seems relevant and I will be following that work as it progresses.

In the meantime, if there is anything specific that would be useful with the current EDSCs, please let us know.

Thanks!

N

mfojtak commented 5 years ago

Hi @nicolasvasilache,

Thanks for the answer.

> In the meantime, if there is anything specific that would be useful using the current EDSCs please let us know.

Sure, as I explained in my previous post. Currently, EDSC has a method that creates an operation producing a single value:

  /// Generic create for a named operation producing a single value.
  static ValueHandle create(StringRef name, ArrayRef<ValueHandle> operands,
                            ArrayRef<Type> resultTypes,
                            ArrayRef<NamedAttribute> attributes = {});

However, this method does not let me create an op that produces no value and accepts regions, such as loop.if or loop.for. It would be great to have a more generic create method that can emit any op. It could then be reused from any language.

Cheers.