Looking into adding this right now.
Is the SPIR-V dialect only intended to live on the edge, like the LLVM dialect, or are there optimizations that are expected to run on this form?
Can you rely on the symbol table for accesses to global variables defined outside the function? (just like std.call also references a global FuncOp outside)
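For reference, a minimal sketch of the std.call pattern being alluded to (the function bodies here are made up):
func @callee(%arg0: f32) -> f32 {
  return %arg0 : f32
}
func @caller(%arg0: f32) -> f32 {
  // @callee is resolved through the enclosing module's symbol table rather
  // than through an SSA value.
  %0 = call @callee(%arg0) : (f32) -> f32
  return %0 : f32
}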
@River707 : Yes, SPIR-V is intended to be like the LLVM dialect, so I don't see a conversion from SPIR-V to any other dialect. Indeed, the SPIR-V dialect has been designed so that it can be serialized into the SPIR-V binary format easily (most of it is auto-generated from ODS).
@joker-eph : I don't know if that would work. That would have to be named symbols. A function in a SPIR-V module should be able to refer to any value generated from a spv.Constant or a spv.Variable, as follows:
%0 = spv.Variable : !spv.ptr<f32, StorageBuffer>
func @foo() {
  %1 = spv.Load "StorageBuffer" %0 : f32
}
A lot of the SPIR-V spec relies on this. The same mechanism is used to access global invocation IDs, workgroup sizes, etc. It seems the easiest solution is to allow referring to global variables.
The SPIR-V dialect is designed to be a "proxy" to the SPIR-V binary format. It should stay at the same semantic level and try to be a mechanical 1:1 mapping to SPIR-V binary counterparts if possible. It should be easy to (de)serialize with the binary format.
I can imagine conversions both from and to the SPIR-V dialect. Lowering towards SPIR-V and running on Vulkan needs many optimizations; I'd prefer those to happen in a high-level dialect if possible, but I don't see fundamental reasons why we cannot do SPIR-V specific optimizations in the SPIR-V dialect (FWIW, there are lots of optimizations based on the SPIR-V binary format in spirv-opt of the SPIRV-Tools project). OTOH, the purpose of deserialization is to import SPIR-V binaries into the MLIR system so MLIR can be used to optimize SPIR-V blobs. That means we need to convert SPIR-V to some other dialect to use the optimizations there. So conversions both from and to the SPIR-V dialect can happen, as well as optimizations on the SPIR-V dialect itself.
Vulkan adopts a binding model that involves descriptors, divided into sets. At runtime, resources can be bound to these descriptors, designated by a (set number, binding number) pair. SPIR-V, which was co-designed with Vulkan, is influenced on this front: to access these resources, global variables are used as handles to them, and they are decorated with special annotations (DescriptorSet and Binding). This also applies to special system-generated values like the current local invocation ID (thread ID in CUDA's terms), which are represented as global variables with the Builtin decoration. So the idea of using global variables for communicating with the host/runtime is baked into the design of SPIR-V. Also, in SPIR-V, constants are placed ahead of any function as a separate section and used implicitly in functions. It would be quite awkward trying to model it otherwise (requiring all functions to explicitly capture) in the SPIR-V dialect (which tries to be a "proxy" to the SPIR-V binary format), and it would require a non-trivial (de)serializer, which I'd prefer to avoid. Transformations in the MLIR system are much easier to write and maintain than handling the complexity in the (de)serializer.
(Sorry, I didn't notice that we require FuncOp to explicitly capture when initially choosing FuncOp as the representation for SPIR-V functions.)
An example of adding two resources and writing the result out to another resource (mimicking SPIR-V natively):
spv.module "Logical" "VulkanKHR" {
  %c0 = spv.constant 0 : i32
  %id = spv.Variable builtin("LocalInvocationId") :
        !spv.ptr<vector<3xi32>, Input>
  %arg0 = spv.Variable bind(0, 0) :
        !spv.ptr<!spv.struct<!spv.rtarray<f32, 4>, [0]>, Uniform>
  %arg1 = spv.Variable bind(0, 1) :
        !spv.ptr<!spv.struct<!spv.rtarray<f32, 4>, [0]>, Uniform>
  %ret0 = spv.Variable bind(1, 0) :
        !spv.ptr<!spv.struct<!spv.rtarray<f32, 4>, [0]>, Uniform>
  func @kernel() {
    %0 = spv.Load %id : vector<3xi32>
    %1 = spv.CompositeExtract %0, 0 : i32
    %2 = spv.AccessChain %arg0, %c0, %1 : !spv.ptr<f32, Uniform>
    %3 = spv.Load %2 : f32
    %4 = spv.AccessChain %arg1, %c0, %1 : !spv.ptr<f32, Uniform>
    %5 = spv.Load %4 : f32
    %6 = spv.FAdd %3, %5 : f32
    %7 = spv.AccessChain %ret0, %c0, %1 : !spv.ptr<f32, Uniform>
    spv.Store %7, %6
    spv.Return
  }
  spv.EntryPoint "GLCompute" @kernel %arg0, %arg1, %ret0
  spv.ExecutionMode @kernel "LocalSize" 32, 1, 1
} attributes {
  capabilities: ["Shader"]
}
Thanks Lei for the more detailed explanation. Just wanted to pick your brain about conversion from the SPIR-V dialect to other dialects. In my view the overall conversion flow goes something like this:
1) Start with a dialect that has operations on tensors (which is closer to TensorFlow).
2) Lower it to a dialect that operates on memrefs/buffers (converting from SSA-style tensor values to loads/stores of tensor values with a more explicit view of memory).
3) Lower to a dialect that operates on scalars, using loops to iterate over some space and compute the results in an "element-by-element" fashion. The loops could be parallel and could be partitioned across workgroups (like the GPU dialect does).
If this overall structure makes sense, then the SPIR-V dialect is at the bottom of the stack. I find it hard to visualize a conversion that moves from the SPIR-V dialect to something higher in the stack. We could maybe have conversions that translate one "scalar" dialect to another (SPIR-V to NVVM, say).
In terms of common optimizations, maybe some (like constant folding) can span this stack, but I think we might have some optimization passes that are SPIR-V specific, or that work on such "scalar" dialects.
This is just my understanding, but it would be great if you could provide more insight (or let me know if I am off somewhere).
Agreed that starting from a high-level dialect with tensor semantics and lowering it progressively towards SPIR-V is the use case we want to enable right now for accelerating TF models with Vulkan. And on this path, SPIR-V is indeed the last step, given that it will be consumed directly by a Vulkan driver. But I also want to leave the door open if someone finds the SPIR-V dialect useful for their use case (be it ML or graphics) and decides to explore a different path. MLIR should be multi-level anyway. :)
Going from SPIR-V to some high-level dialect like TF is hard. You are correct that what I meant above by "going from SPIR-V dialect" to others is actually more about converting SPIR-V to a similar or even lower level. For example, we can convert SPIR-V into LLVM and use the optimizations there. In reality I know Vulkan drivers are doing this internally (e.g., AMDVLK), so if we have this path it might be useful to leverage MLIR in driver compilers so we have a unified compiler story. (Note that LLVM is awesome, but its passes are not designed specifically for GPUs, and I'd expect we will have more GPU-centric passes in MLIR at a similar level to SPIR-V/CUDA, in the GPU dialect or whatever utility dialects.)
As for SPIR-V specific optimizations, examples include turning specialization constants into normal ones, removing duplicated/implied capabilities, removing unused resource variables, etc. These all work on SPIR-V specific concepts, so they should naturally be SPIR-V specific passes.
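For the first of those, a minimal before/after sketch (the spec-constant syntax here is hypothetical; only spv.constant appears elsewhere in this thread):
// Before: a specialization constant whose value the runtime may override at
// pipeline-creation time (hypothetical syntax).
%w = spv.specConstant spec_id(1), 64 : i32
// After a pass pins it to its default value, it becomes a normal constant
// that generic folding passes understand.
%w = spv.constant 64 : i32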
@joker-eph : I don't know if that would work. That would have to be named symbols
Yes, it would in MLIR. But I don't see why that would be a problem?
A function in a SPIR-V module should be able to refer to any value generated from a spv.Constant or a spv.Variable, as follows:
%0 = spv.Variable : !spv.ptr<f32, StorageBuffer>
func @foo() {
  %1 = spv.Load "StorageBuffer" %0 : f32
}
You are making an implementation choice in MLIR. It seems equally isomorphic to the input SPIR-V binary to represent it in MLIR like this:
spv.Variable @spv.variable.0 : !spv.ptr<f32, StorageBuffer>
func @foo() {
  %1 = spv.Load "StorageBuffer" @spv.variable.0 : f32
}
It would be quite awkward trying to model it otherwise (requiring all functions to explicitly capture) in the SPIR-V dialect (which tries to be a "proxy" to the SPIR-V binary format)
I'd like to tame a bit the "proxy to SPIR-V binary format" part: the IR is meant to be convenient to manipulate, modify, etc. first. We obviously want it to be "easy" to round-trip to the binary format as export/import, but that does not mean you have to be 1:1 on every single aspect: IMO we shouldn't lose the focus on a compiler IR.
Yep, certainly agree here. Actually I listed similar thinking in the design principles: https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/SPIR-V.md#design-principles. We've decided to use a single spv.constant op to represent multiple SPIR-V constant instructions, folding OpDecorate into the op it is decorating using attributes, etc. Those are all deviations from the SPIR-V binary format trying to make transformation easier. :) So I guess it's a case-by-case situation here. :)
@joker-eph : Fair point. I am basing my reasoning on the current implementation of the SPIR-V dialect. The alternative you suggest would also work if spv.Variables were implemented that way. I don't have a preference one way or the other. The current implementation, though, has the advantage of making the serialization very straightforward. As Lei mentioned, it's a trade-off between easy serialization and representation.
Could I step back and ask what issues you foresee with having a separate function-like operation in the SPIR-V dialect which allows capturing variables? That might help decide if this is the right choice.
You wouldn't be able to have function passes, because functions wouldn't be isolated constructs anymore. The pass manager relies on operating on non-capturing operations to be able to multi-thread the compiler.
Another point of comparison is the representation of the TensorFlow dialect. There is the same design principle about having a representation isomorphic to TensorFlow graphs with a "trivial" conversion (import/export). Yet we are not 1:1; for instance, TensorFlow has different operations for "If" and "StatelessIf" (similarly for "While"/"StatelessWhile"), but we map these to a single operation in MLIR with an attribute: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/tensorflow/translate/import_graphdef.cc#L1185-L1200
You wouldn't be able to have function passes, because functions wouldn't be isolated constructs anymore. The pass manager relies on operating on non-capturing operations to be able to multi-thread the compiler.
Ah, these are good points. FunctionPasses are now tightly integrated with FuncOp, but I am hoping that we can eventually relax this, given that both ModuleOp and FuncOp are now general ops. It suggests we should have a general OpPass or something to go with the scheme; that would be quite handy for dialects wanting their own version of a module op and function op.
For multi-threading compilation at the function level, that's true. But I wonder whether we can still partially get the benefits of multi-threading (assuming we have a more general OpPass or something there). Each function could declare whether it implicitly captures, via maybe an attribute, and the pass could then rely on that to decide whether to run in parallel. Another module pass could then compute such attributes for all the functions in a module, if one is willing to pay the price for the benefits of multi-threaded compilation.
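A hedged sketch of that idea (the attribute name here is made up):
// A module pass computes, for each function, whether it implicitly captures
// values from the enclosing module, and records the result as an attribute.
// A generic function pass could then parallelize only over the isolated ones.
func @isolated_fn() attributes { spv.implicit_capture = false } {
  spv.Return
}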
For SPIR-V, the typical flow I can see right now, if lowering from std, is basically picking a std.func as the entry point and then turning that function into a module. (Similarly for kernel launching in the GPU dialect.) And this can happen for multiple entry points. So I think parallelism at the module level is more important than at the SPIR-V function level. If the entry function references multiple other functions and those get converted into SPIR-V functions, then because those functions are explicitly capturing (in both the std and GPU dialects), the generated SPIR-V functions should also be explicitly capturing. So the implicit capturing can be scoped inside the entry function only. Now, if we import some random SPIR-V binary from outside, it might not have this benefit. But still, I'm not sure at the moment how many function passes we will have in the SPIR-V dialect specifically, and whether we can have a generic function pass that can do selective parallelism.
Back to the approach of using symbol refs instead of SSA values: it requires massive changes to the dialect representation. Turning spv.Load to accept a symbol instead of an SSA value has an infectious effect; now all pointer-generating ops (both variables and function parameters) need to have symbols, and all ops taking in pointers need to do it via symbol refs. It's a small portion of the ops compared to the whole op vocabulary in SPIR-V, but they are all fairly commonly used ops (OpVariable, OpFunctionParameter, OpLoad, OpStore, OpAccessChain, etc.). The majority of a SPIR-V module consists of these ops. So instead of going through SSA values and def-use chains, we would turn the module into something mostly symbol-ref based, going through hash tables. I'm afraid we would be paying a high cost here for the potential gain of multi-threaded compilation.
But I might be missing other downsides of implicit capturing. Please feel free to raise any others you know of! :)
So instead of going through SSA values and def-use chains, we would turn the module into something mostly symbol-ref based, going through hash tables. I'm afraid we would be paying a high cost here for the potential gain of multi-threaded compilation.
It is true you don't have use-lists anymore, but how often do you need to get to the source of these ops in transformations? (For this to be high-cost, such lookups would need to be very common.)
I have experience on the graphics side. For graphics it's common to have lots of resources (represented as global variables in SPIR-V) and to do all kinds of loads/stores on them with access chains. Load/store elimination and mem2reg are important passes to perform, and they rely on knowing which resources (global variables) they are operating on. ML use cases might be different, but I'd assume still in the same ballpark.
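To make the lookup cost concrete, with SSA values the trace is a chain of defining-op lookups, i.e. plain pointer dereferences (a sketch reusing the syntax from earlier in this thread):
%ptr = spv.AccessChain %Input, %c0, %i : !spv.ptr<f32, Uniform>
%val = spv.Load %ptr : f32
// Tracing %val back to its root resource: %val -> %ptr (its defining
// spv.AccessChain op) -> %Input. No symbol table queries are involved.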
OTOH, I think we can actually support both FuncOp and spv.function. When lowering from a high-level dialect, use FuncOp if possible; with this approach, we only need spv.function for the entry function. When importing some SPIR-V binary, we can use spv.function initially and then have a module pass turn the non-implicitly-capturing ones into FuncOps.
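A hedged sketch of the distinction (the spv.function syntax here is hypothetical):
%0 = spv.Variable : !spv.ptr<f32, StorageBuffer>
// Reads %0 from the enclosing module, so it must stay a spv.function.
spv.function @entry() {
  %1 = spv.Load "StorageBuffer" %0 : f32
  spv.Return
}
// References nothing outside its own body; a module pass can safely turn it
// into an isolated FuncOp.
spv.function @helper(%arg0: f32) {
  %1 = spv.FAdd %arg0, %arg0 : f32
  spv.Return
}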
Knowing if two loads come from the same global should also be a single comparison of the attribute; there shouldn't be much pessimization here.
I much prefer that we start with an isolated spirv.func until we get better evidence that it cannot work.
Thanks for the questions, Mehdi! They are quite useful for shaping the design. :)
Knowing if two loads come from the same global should also be a single comparison of the attribute; there shouldn't be much pessimization here.
I'm afraid not necessarily. To further elaborate, I think a concrete example may help.
Say we have the following HLSL source code:
struct S {
  float4 data[16];
};
StructuredBuffer<S> Input;
RWStructuredBuffer<float> Output;
void main(uint3 id : SV_DispatchThreadID) {
  uint x = id.x;
  S st = Input[x];
  float4 vec = Input[x].data[5];
  float val = Input[x].data[5][1];
  Output[x] = st.data[5][1] + vec[1] + val;
}
(Mostly C++ syntax; StructuredBuffer is a data structure for declaring a GPU resource in HLSL.)
If translated naively, main corresponds to
%x_val = spv.Load %x
%ptr0 = spv.AccessChain %Input, %cst_0, %x_val
%st_val0 = spv.Load %ptr0
spv.Store %st, %st_val0
%ptr1 = spv.AccessChain %Input, %cst_0, %x_val, %cst_0, %cst_5
%vec_val0 = spv.Load %ptr1
spv.Store %vec, %vec_val0
%ptr2 = spv.AccessChain %Input, %cst_0, %x_val, %cst_0, %cst_5, %cst_1
%val_val0 = spv.Load %ptr2
spv.Store %val, %val_val0
%ptr3 = spv.AccessChain %st, %cst_0, %cst_5, %cst_1
%flt0 = spv.Load %ptr3
%ptr4 = spv.AccessChain %vec, %cst_1
%flt1 = spv.Load %ptr4
%add0 = spv.FAdd %flt0, %flt1
%flt2 = spv.Load %val
%add1 = spv.FAdd %add0, %flt2
%ptr5 = spv.AccessChain %Output, %cst_0, %x_val
spv.Store %ptr5, %add1
(The naive translation is actually more complicated because of copying data across storage classes, but that's not important here. You can check the complete version here by disabling optimizations.)
The above is actually just reading the same float three times. In order to understand that, we have multiple hops to trace back. E.g., for %flt0: %ptr3 -> access chain -> %st -> store -> load -> %ptr0 -> access chain -> %Input. If we model the above using symbol refs, then we have
%x_val = spv.Load @x
spv.AccessChain @Input, %cst_0, %x_val {name: "ptr0"}
%st_val0 = spv.Load @ptr0
spv.Store @st, %st_val0
spv.AccessChain @Input, %cst_0, %x_val, %cst_0, %cst_5 {name: "ptr1"}
%vec_val0 = spv.Load @ptr1
spv.Store @vec, %vec_val0
spv.AccessChain @Input, %cst_0, %x_val, %cst_0, %cst_5, %cst_1 {name: "ptr2"}
%val_val0 = spv.Load @ptr2
spv.Store @val, %val_val0
spv.AccessChain @st, %cst_0, %cst_5, %cst_1 {name: "ptr3"}
%flt0 = spv.Load @ptr3
spv.AccessChain @vec, %cst_1 {name: "ptr4"}
%flt1 = spv.Load @ptr4
%add0 = spv.FAdd %flt0, %flt1
%flt2 = spv.Load @val
%add1 = spv.FAdd %add0, %flt2
spv.AccessChain @Output, %cst_0, %x_val {name: "ptr5"}
spv.Store @ptr5, %add1
Each of these hops goes through a symbol table query, instead of just a C++ pointer dereference as with SSA value modeling. So it's not just a single comparison for the load/store/access-chain on %Input alone, but one on every step along the chain.
(Note that the above is not an entirely contrived example; it's actually a simplified version of a common pattern in HLSL programming for graphics. In HLSL, it's quite common to group resources into multi-level, deeply-nested structs, refer to mid-level sub-components with a local copy, pass them into other functions, and then further get sub-components out of them. One of the reasons for doing this is that each such StructuredBuffer takes a descriptor; grouping related data into one descriptor makes sense. Optimizations are heavily relied upon to eliminate these local copies and directly use the original data. In the ML world things are nicer, but I'm still expecting to see similar patterns. Besides, one of the goals of the SPIR-V dialect is to leave the door open for other use cases.)
OTOH, if we go with the symbol-ref based approach, we lose the goodies innate to using SSA values. It becomes much easier to generate invalid code: we would need to manually verify, within a basic block, things like each load/store op referencing a symbol that is defined before it, which just comes for free when using normal SSA values. This is different from having a few functions at the module level, where def-use order does not matter; there can be lots of resources and various pointers derived from them. For each op taking in a symbol ref, we need to make sure the object pointed to by the symbol has a consistent type. Plus, we cannot easily get all the uses of a global variable, so, for example, eliminating unused global variables becomes non-trivial. Etc.
I personally analogize the symbol-ref based approach to TensorFlow's GraphDef format, which uses symbols to link all the ops. With enough ops involving pointers, we'd be in a situation as loose as that. I'd say that's not very friendly to compiler transformations.
So with these considerations, I still think modeling it as implicit capturing is the natural and preferable way for now. We can convert spv.function to FuncOp by doing straightforward analysis, and gain the benefits of multi-threaded function compilation if that turns out to be really important. Nothing is cast in stone forever; we can develop and see. :)
WDYT?
I much prefer that we start with an isolated spirv.func until we get better evidence that it cannot work.
That's actually the approach we've been taking: we have been using FuncOp from the beginning until now, where we see a problem. :)
There is a related thing that I haven't quite wrapped my head around: the difference between LLVM's handling of globals and what I know of MLIR (so it might be down to my ignorance).
Take this simple example
int value = 0;
int main() {
  value = 1;
}
In LLVM this becomes
@value = dso_local local_unnamed_addr global i32 0, align 4
define dso_local i32 @main() local_unnamed_addr #0 !dbg !7 {
  store i32 1, i32* @value, align 4, !dbg !10, !tbaa !11
  ret i32 0, !dbg !15
}
The store instruction itself just expects a pointer; here it happens to be a symbol reference. The store only cares about the type of the argument.
In the MLIR case, take the definition of the store op in SPIR-V as an example. The argument list is as follows:
let arguments = (ins
  SPV_AnyPtr:$ptr,
  SPV_Type:$value,
  OptionalAttr<SPV_MemoryAccessAttr>:$memory_access,
  OptionalAttr<I32Attr>:$alignment
);
The first argument is the pointer to store to. If I need to use a symbol reference, I need to change the definition of the argument list (AFAIK) to
let arguments = (ins
  SymRefAttr:$ptr,
  SPV_Type:$value,
  OptionalAttr<SPV_MemoryAccessAttr>:$memory_access,
  OptionalAttr<I32Attr>:$alignment
);
Given this, if we want to support both storing through pointers and storing through symbol references (as was suggested by @joker-eph), I need two store instruction definitions (or some other approach that allows for using both). That seems to be different from LLVM. Is this intended, or am I mistaken? Granted, each dialect could invent a way to handle this, but I'm wondering if there is a standard way of addressing it. This is one of the reasons changing from the SSA-value based approach to the symbol-reference based approach for globals would require quite a lot of change to the SPIR-V dialect.
I've been looking into globals in the LLVM dialect. One possible approach would look like
module {
  llvm.global @foo(42.0 : f32) : !llvm.float
  func @bar() {
    %0 = llvm.addressof @foo : !llvm<"float*">
    %1 = llvm.load %0 : !llvm<"float*">
  }
}
That would go through a special MLIR-specific op that converts the symbol reference attribute to an SSA value usable locally (assuming LLVM's globals are actually pointers). This only reads the symbol table, so it shouldn't interfere with parallel execution of a function pass. It also avoids the need to support parsing both %-identifiers and @-identifiers for multiple ops, at the cost of an extra indirection through a new operation. Maybe something similar could work for the SPIR-V case?
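For the record, a SPIR-V analogue along these lines might look like the following sketch (the op names are assumptions here; spv._address_of matches what the summary later in this thread settles on):
spv.module "Logical" "VulkanKHR" {
  spv.globalVariable @foo : !spv.ptr<f32, StorageBuffer>
  func @bar() {
    // Converts the symbol into a locally usable SSA pointer value, so ops
    // like spv.Load keep their pointer-typed operand unchanged.
    %0 = spv._address_of @foo : !spv.ptr<f32, StorageBuffer>
    %1 = spv.Load "StorageBuffer" %0 : f32
  }
}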
Thanks Alex. That's a good idea!
Trying to summarize the discussion here:
1) Having a spv.FunctionOp that can implicitly capture makes the SPIR-V dialect more in tune with the SPIR-V spec.
2) The downside of this is that we cannot multithread function passes that operate on spv.FunctionOps. Also, the current FunctionPass infrastructure needs FuncOp, so either the pass infrastructure needs to change to work with any FunctionLike op, or a separate spv.FunctionPass is needed for spv.FunctionOps.
3) The alternative is to refer to globals using symbol references, and add an "address_of" operation that gets a pointer to the global variable. This way other instructions that expect a pointer value don't need to change.
My vote would be to go with implicit capture for now. Knowing that we can course-correct is useful, and if it becomes an issue, the change would be fairly contained. Further, I think the ability to process several spv.modules in parallel is probably more important for SPIR-V than processing functions within a module in parallel. If this is fine, I can go ahead with making this change. This is blocking some further work on being able to target the SPIR-V dialect from other dialects.
Thanks Alex! That's indeed a good suggestion. A similar spv.addressof would be a nice trade-off: we retain the SSA value properties for the majority of the ops and only limit the verification burden to the spv.addressof op itself. I think it helps to fix the last bit of my previous proposal on converting between std.func and spv.function: now the entry function does not need to be special anymore.
So I think now in the SPIR-V dialect we can have both std.func and spv.function, with spv.function as the only function op supported by (de)serialization. Going from a high-level dialect to SPIR-V lowers to std.func first and runs whatever optimizations in parallel, and as the last step runs a pass to hoist constants and global variables into spv.module and convert std.func into spv.function. Deserializing a SPIR-V blob uses spv.function for all functions initially, and as the first step runs a pass to convert them into std.func by localizing constants and inserting spv.addressof of global variables.
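A hedged sketch of that last lowering step (the spv.function syntax is hypothetical):
// Before: an isolated std.func; the constant lives inside the function.
func @kernel() {
  %c0 = spv.constant 0 : i32
  %0 = spv.IAdd %c0, %c0 : i32
  spv.Return
}
// After: the constant is hoisted to module scope and the function becomes an
// implicitly capturing spv.function.
spv.module "Logical" "VulkanKHR" {
  %c0 = spv.constant 0 : i32
  spv.function @kernel() {
    %0 = spv.IAdd %c0, %c0 : i32
    spv.Return
  }
}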
Theoretically we could do all the logic (the conversion between std.func and spv.function) inside the (de)serializer, but I don't think that's the proper place for the task, and it's not a proper separation of concerns. We should use the common MLIR infrastructure for IR transformations. The conversions between std.func and spv.function naturally fall into that category, so using MLIR infra gives us better usability, performance, testing, etc.
@joker-eph: what do you think? :)
Theoretically we could do all the logic (the conversion between std.func and spv.function) inside the (de)serializer, but I don't think that's the proper place for the task, and it's not a proper separation of concerns. We should use the common MLIR infrastructure for IR transformations.
This is the cornerstone of your argument, and I think you need to expand it a bit: to justify not doing it in the converter, I'd like to get a bit more information about how heavy it is to do it there; it isn't clear to me that this can't be done reasonably there (another prior art is the flatbuffer importer for TFLite: we don't try to just import the flatbuffer format naively). Of course translations in and out of MLIR to external formats should be "simple", but "triviality" is not above all: we design an IR first. My concern is maintaining a parallel IR and infrastructure for SPIR-V in MLIR, and the confusion about the role of the two. This is not free, and the effects can be longer-lasting and run deeper in the compiler compared to some one-shot complexity in the deserializer/serializer.
By the way, I hope you didn't miss: https://groups.google.com/a/tensorflow.org/d/topic/mlir/gwuZvfzh1u0/discussion
Writing function passes that operate on spirv.func will be possible (if they are isolated).
By the way, I hope you didn't miss: https://groups.google.com/a/tensorflow.org/d/topic/mlir/gwuZvfzh1u0/discussion Writing function passes that operate on spirv.func will be possible (if they are isolated).
Yep, saw that awesome proposal previously. Thanks for the heads up! :)
After some back and forth, we have converged on not going down the path of implicit capture. This requires some changes to the existing SPIR-V dialect:
1) Global variables (variables in the scope of spv.module) have to not return an SSA value, but rather have a symbol name. The symbol name can be used within functions to refer to these global variables.
2) A new instruction (spv._address_of) is needed to get an SSA value from the symbol. This SSA value can be used in other SPIR-V operations (like spv.Load and spv.Store); see the sketch after this list.
3) The spv.EntryPoint instruction would need to have interface variables specified using symbols rather than SSA values.
4) Function-level variables can be defined using the existing spv.Variable operation.
5) Since global variables can be initialized using constants or other global variables, the initialization will also be done through the use of symbol references. As a result, constants at module scope will also need to be changed to have a symbol name. An associated mechanism is needed to materialize the constants within function scope, to get an SSA value for the constant for use with other instructions.
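Putting items 1-3 together, the resulting IR would look roughly like this (a sketch; the global-variable op's name and syntax are assumptions and may differ from what lands):
spv.module "Logical" "VulkanKHR" {
  spv.globalVariable @arg0 bind(0, 0) :
      !spv.ptr<!spv.struct<!spv.rtarray<f32, 4>, [0]>, Uniform>
  func @kernel() {
    // spv._address_of turns the symbol into an ordinary SSA pointer that
    // spv.AccessChain/spv.Load/spv.Store consume unchanged.
    %0 = spv._address_of @arg0 :
        !spv.ptr<!spv.struct<!spv.rtarray<f32, 4>, [0]>, Uniform>
    spv.Return
  }
  spv.EntryPoint "GLCompute" @kernel, @arg0
}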
The CL for steps 1-4 is in flight and should land soon. Will file a separate bug for item 5.
Closing (fixed by commit https://github.com/tensorflow/mlir/commit/ec61a757a00abbf762a53a32a0b8955d215b679a)
Currently the SPIR-V dialect is using the FuncOp from the StandardOps dialect. FuncOp does not allow implicit capture, whereas the SPIR-V specification requires that functions in the SPIR-V dialect have access to global variables defined outside of the function. The SPIR-V dialect needs its own FuncOp to allow such functionality.