vivekkhandelwal1 opened 2 years ago
@jpienaar We would like your help triaging this. This is for big-model support.
The attribute inside the parentheses is the initial value; I don't think you are associating that with anything. After the pass to generate accessors, those should be used to set the initial value and create uninitialized globals.
Hi @silvasean, what are your views on this?
What is the exact question?
@jpienaar I am assigning to you for now, please feel free to reassign.
@jpienaar, if you can provide a small sample of how to use ml_program with IREE for separate weights, that would help unblock us.
Hi @silvasean @jpienaar, I have a few questions regarding this issue, in the context of https://github.com/llvm/torch-mlir/pull/1402.
1.) Are the weights saved in a file during the execution of this pass? If yes, where is that file located?
2.) Bufferization is currently not supported for ml_program ops; running on the refbackend fails during bufferization. Do we have to add that pass?
3.) We need to load the weights from the file at some point; where should this be done? For example, right after conversion to linalg IR, or even a bit lower in the pipeline?
4.) In IREE, at runtime (i.e., with iree-run-module) we can pass the input files; can we do the same for the weights?
`@foo.bar` would be `my_nn_model.foo.bar`. You can walk this with `model.named_parameters()` or similar introspection.

btw, @jpienaar one idea: instead of accessors, have the module initializer somehow once-initialize the globals (not sure how to plumb that through). That should let IREE optimize the globals more.
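A minimal sketch of that introspection on the PyTorch side (the model class and the mapping to dotted MLIR symbol names are illustrative assumptions, not anything the compiler emits verbatim):

```python
import torch

class MyNNModel(torch.nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.foo = torch.nn.Linear(4, 4)

model = MyNNModel()

# Walk every parameter; dotted names like "foo.weight" are what would
# correspond to global symbols such as @foo.weight on the MLIR side.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```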
Agreed, I was chatting with Ben on that too. The global initializers will result in util.initializer, which should do that. I unfortunately have two deadlines today, which makes a full response difficult, but hopefully the day allows; I can type enough below on mobile to sketch it out.
Basically, let's start with not marking these as having initializers and instead having explicit set calls (using what Sean suggested to get at the variables in memory on the PyTorch side). This gives you full control; no additional plumbing or conventions needed. See the sketch below.
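A minimal sketch of this first option, assuming the upstream ml_program dialect (symbol names and shapes are illustrative):

```mlir
// Mutable global with no initial value; the host sets it once before running.
ml_program.global private mutable @layer0.weight : tensor<4xf32>

// Explicit setter the application calls with the weight tensor.
func.func @set_layer0.weight(%arg0: tensor<4xf32>) {
  ml_program.global_store @layer0.weight = %arg0 : tensor<4xf32>
  return
}

func.func @forward(%arg0: tensor<4xf32>) -> tensor<4xf32> {
  %w = ml_program.global_load @layer0.weight : tensor<4xf32>
  %0 = arith.addf %arg0, %w : tensor<4xf32>
  return %0 : tensor<4xf32>
}
```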
Second easiest is to use MLIR bytecode and DenseElementsAttr initializers holding the weights for the globals (bytecode is a lot faster for printing and parsing here). Cost is primarily around uniquing. This should require very little work; it's good to measure overheads, but it can be improved upon (exact amount TBD, though I think it'll be a measurable improvement). Again, no additional plumbing needed.
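For example (a sketch with illustrative values; serializing with something like `mlir-opt --emit-bytecode` produces the faster-to-parse .mlirbc form):

```mlir
// Global with an inline DenseElementsAttr initializer. The attribute is
// uniqued in the MLIRContext, which is where the ingestion cost comes from.
ml_program.global private @layer0.weight(dense<[1.0, 2.0, 3.0, 4.0]> : tensor<4xf32>) : tensor<4xf32>
```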
Third would be to convert those initializers to use the blob resource (stored outside the context, and serializable to .mlirbc as an mmap-able weight). On the IREE side we may need to inject a serialization interface for these, but that should be easy. That way you save the cost of ingesting into the context. Beyond one external interface, no work should be needed.
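A sketch of that resource-blob form, assuming the builtin dense_resource mechanism (the hex payload here is four f32 ones preceded by a 4-byte alignment word; the key name is illustrative):

```mlir
ml_program.global private @layer0.weight(dense_resource<layer0_weight> : tensor<4xf32>) : tensor<4xf32>

// The blob lives outside the attribute uniquer and can be mmap'd when the
// module is stored as bytecode.
{-#
  dialect_resources: {
    builtin: {
      layer0_weight: "0x040000000000803F0000803F0000803F0000803F"
    }
  }
#-}
```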
Fourth would be referencing purely by some URI. That'll require more work: we'll need a weight file format and to establish a couple of conventions. Well... there is the option of adding a resource type which supports URIs without creating additional conventions. With the correct interfaces that should work (and I believe all those interfaces already exist). This separates the constants (in a foldable format, as Ben wants) from the loading/regular canonicalization. From 3 to 4, the biggest difference is in pre-deployment/python-in-the-loop scenarios. (I'd even consider always flattening from 4 to 3 for deployment; resources are mutable, so the only cost is when serializing again, and that's at file-writing speed, while partial updates are easy to add support for [and I've started playing around with that].)
+1 to what Jacques said!
When actually deploying/measuring performance I strongly encourage keeping the constants in the model so that we can optimize them. In cases where that's not possible (constants not yet available during compilation, wanting to switch between multiple parameter sets, constants are 10GB+, etc.) the setters will in the fullness of time be pretty efficient, as we can hoist work across global store/load into the setters - but nothing will be as efficient as if we can access/mutate the constants during compilation. Somewhat paradoxically, the larger the constants the more beneficial it is to keep them in the program: at runtime we can directly stream from disk to GPU, or leave the memory mapped and pageable - otherwise it's almost guaranteed you need 2x the total memory to load things, or retain host-accessible wired copies, unless the hosting application is really careful. Not too much of a concern if you have 20MB, but really bad if you have 20GB - and if you have 20MB they should definitely be kept in the program during compilation :)
Today, in order to run stateful sequences on the command line, iree-run-trace is required - that lets you call setters along with the other methods just like an application would. We could have some special handling in iree-run-module & co for calling setters, but I'd like to avoid that if possible (there are lots of sticky details that impact performance/reproducibility, like: are the values read-only, are they read-only only to us, are they mappable from disk, etc.).
As Jacques mentions, there's an optional path we could enable that allows programs to load files at runtime via a custom VM module, but it'd only be opt-in: for security and portability purposes we want to avoid putting any file IO in the default deployment paths. It's best if we can leave such a path for ultra-large multi-node sharded things while nearly all other deployments use in-program constants. Medium-term I'd really like to have something modeling the MTLIOCommandQueue/DirectStorage/cuFile APIs in the HAL such that we can seamlessly interleave IO and execution. Since that's a bit of a ways off, a simpler version could be added to start, with the caveat that it'll eventually get replaced. Can brainstorm more about this if interested!
Yes, a brainstorm would be great. If we need the constants for global optimization, maybe there is less incentive to go down this path, and we should instead try to speed up compilation while having the entire model on disk to mmap?
Measurements are very welcome here to direct the need. (I always like flame graphs showing per-function time taken.)
@jpienaar This one has gone stale; moving to the backlog. Please bring it back up if needed.
What happened?
I'm getting the following error while compiling the linalg IR through `iree-compile`:

Steps to reproduce your issue
To reproduce the issue, run the following command:
The IR file is available at: https://gist.github.com/vivekkhandelwal1/a84143ac41095a8d5e3ba1eefdd1564c
What component(s) does this issue relate to?
MLIR, Compiler
Version information
Additional context
No response