vivekkhandelwal1 opened 2 years ago
@jpienaar We would like your help triaging this. This is for big-model support.
The attribute inside the parentheses is the initial value; I don't think you are associating that with anything. After the pass to generate accessors, those should be used to set the initial value and create uninitialized globals.
Hi @silvasean, what are your views on this?
What is the exact question?
@jpienaar I am assigning to you for now, please feel free to reassign.
@jpienaar, if you can provide a small sample of how to use ml_program with IREE for separate weights, that would help unblock us.
Hi @silvasean @jpienaar, I have a few questions regarding this issue, in the context of https://github.com/llvm/torch-mlir/pull/1402.
1.) Are the weights saved in a file during the execution of this pass? If yes, where is that file located?
2.) Bufferization is currently not supported for ml_program ops; running on the refbackend fails during bufferization. Do we have to add that pass?
3.) We need to load the weights from the file at some point; where should this be done? For example, right after conversion to linalg IR, or even a bit lower in the pipeline?
4.) In IREE, at runtime (i.e., with iree-run-module) we can pass the input files; can we do the same for the weights?
`@foo.bar` would be `my_nn_model.foo.bar`. You can walk this with `model.named_parameters()` or similar introspection.

btw, @jpienaar one idea: instead of accessors, have the module initializer somehow once-initialize the globals (not sure how to plumb that through). That should let IREE optimize the globals more.
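A minimal sketch of that introspection on the PyTorch side (the model class and the mapping to dotted MLIR symbol names are illustrative assumptions, not anything the compiler emits verbatim):

```python
import torch

class MyNNModel(torch.nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.foo = torch.nn.Linear(4, 4)

model = MyNNModel()

# Walk every parameter; dotted names like "foo.weight" are what would
# correspond to global symbols such as @foo.weight on the MLIR side.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```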
Agreed, I was chatting with Ben on that too. The global initializers will result in util.initializer, which should do that. I unfortunately have two deadlines today, which makes a full response difficult, but hopefully the day allows; I can type enough below on mobile to sketch it out.
Basically, let's start with not marking these as having initializers and instead having explicit set calls (using what Sean suggested to get at the variables in memory on the PyTorch side). This gives you full control; no additional plumbing or conventions needed. See the sketch below.
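A minimal sketch of this first option, assuming the upstream ml_program dialect (symbol names and shapes are illustrative):

```mlir
// Mutable global with no initial value; the host sets it once before running.
ml_program.global private mutable @layer0.weight : tensor<4xf32>

// Explicit setter the application calls with the weight tensor.
func.func @set_layer0.weight(%arg0: tensor<4xf32>) {
  ml_program.global_store @layer0.weight = %arg0 : tensor<4xf32>
  return
}

func.func @forward(%arg0: tensor<4xf32>) -> tensor<4xf32> {
  %w = ml_program.global_load @layer0.weight : tensor<4xf32>
  %0 = arith.addf %arg0, %w : tensor<4xf32>
  return %0 : tensor<4xf32>
}
```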
Second easiest is to use MLIR bytecode and DenseElementsAttr initializers holding the weights for the globals (bytecode is a lot faster for printing and parsing here). Cost is primarily around uniquing. This should require very little work; it's good to measure overheads, but it can be improved upon (exact amount TBD, though I think it'll be a measurable improvement). Again, no additional plumbing needed.
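For example (a sketch with illustrative values; serializing with something like `mlir-opt --emit-bytecode` produces the faster-to-parse .mlirbc form):

```mlir
// Global with an inline DenseElementsAttr initializer. The attribute is
// uniqued in the MLIRContext, which is where the ingestion cost comes from.
ml_program.global private @layer0.weight(dense<[1.0, 2.0, 3.0, 4.0]> : tensor<4xf32>) : tensor<4xf32>
```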
Third would be to convert those initializers to use the blob resource (stored outside the context, and serializable to .mlirbc as an mmap-able weight). On the IREE side we may need to inject a serialization interface for these, but that should be easy. That way you save the cost of ingesting into the context. Beyond one external interface, no work should be needed.
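A sketch of that resource-blob form, assuming the builtin dense_resource mechanism (the hex payload here is four f32 ones preceded by a 4-byte alignment word; the key name is illustrative):

```mlir
ml_program.global private @layer0.weight(dense_resource<layer0_weight> : tensor<4xf32>) : tensor<4xf32>

// The blob lives outside the attribute uniquer and can be mmap'd when the
// module is stored as bytecode.
{-#
  dialect_resources: {
    builtin: {
      layer0_weight: "0x040000000000803F0000803F0000803F0000803F"
    }
  }
#-}
```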
Fourth would be referencing purely by some URI. That'll require more work: we'll need a weight file format and to establish a couple of conventions. Well... there is the option of adding a resource type which supports URIs without creating additional conventions. With the correct interfaces that should work (and I believe all those interfaces already exist). This separates the constants (in a foldable format, as Ben wants) from the loading/regular canonicalization. From 3 to 4, the biggest difference is in pre-deployment/python-in-the-loop scenarios. (I'd even consider always flattening from 4 to 3 for deployment; resources are mutable, so the only cost is when serializing again, and that's at file-writing speed, while partial updates are easy to add support for [and I've started playing around with that].)
+1 to what Jacques said!
When actually deploying/measuring performance I strongly encourage keeping the constants in the model so that we can optimize them. In cases where that's not possible (constants not yet available during compilation, wanting to switch between multiple parameter sets, constants are 10GB+, etc.) the setters will in the fullness of time be pretty efficient, as we can hoist work across global store/load into the setters - but nothing will be as efficient as if we can access/mutate the constants during compilation. Somewhat paradoxically, the larger the constants the more beneficial it is to keep them in the program: at runtime we can directly stream from disk to GPU, or leave the memory mapped and pageable - otherwise it's almost guaranteed you need 2x the total memory to load things, or retain host-accessible wired copies, unless the hosting application is really careful. Not too much of a concern if you have 20MB, but really bad if you have 20GB - and if you have 20MB they should definitely be kept in the program during compilation :)
Today, in order to run stateful sequences on the command line, iree-run-trace is required - that lets you call setters along with the other methods just like an application would. We could have some special handling in iree-run-module & co for calling setters, but I'd like to avoid that if possible (there are lots of sticky details that impact performance/reproducibility, like: are the values read-only, are they read-only only to us, are they mappable from disk, etc.).
As Jacques mentions, there's an optional path we could enable that allows programs to load files at runtime via a custom VM module, but it'd only be opt-in: for security and portability purposes we want to avoid putting any file IO in the default deployment paths. It's best if we can leave such a path for ultra-large multi-node sharded things while nearly all other deployments use in-program constants. Medium-term I'd really like to have something modeling the MTLIOCommandQueue/DirectStorage/cuFile APIs in the HAL such that we can seamlessly interleave IO and execution. Since that's a bit of a ways off, a simpler version could be added to start, with the caveat that it'll eventually get replaced. Can brainstorm more about this if interested!
Yes, a brainstorm would be great. If we need the constants for global optimization, maybe there is less incentive to go down this path, and we should instead try to speed up compilation while having the entire model on disk to mmap?
Measurements are very welcome here to direct the need. (I always like flame graphs showing per-function time taken.)
@jpienaar This one has gone stale; moving to the backlog. Please bring it back up if needed.
What happened?
I'm getting the following error while compiling the linalg IR through `iree-compile`:

Steps to reproduce your issue
To reproduce the issue, run the following command:
The IR file is available at: https://gist.github.com/vivekkhandelwal1/a84143ac41095a8d5e3ba1eefdd1564c
What component(s) does this issue relate to?
MLIR, Compiler
Version information
Additional context
No response