Closed: MachineGunLin closed this 2 hours ago
By default, nnscaler assumes that computation is launched on the GPU. I think you should move `prefix` and `inputs` to the GPU before passing them to the parallelized module; otherwise the module's CUDA parameters get mixed with CPU inputs, which is the usual cause of this kind of device-mismatch RuntimeError:
```python
dev = torch.cuda.current_device()
# Move the inputs to the current CUDA device before the parallelized forward.
new_result = parallelized_module(
    seqs=prefix.to(dev),
    keys=inputs.to(dev),
    values=inputs.to(dev),
)
```
Hi, I am trying to use nnscaler to parallelize an Attention module's forward (based on fairseq2's implementation). I managed to use the `parallelize` method to parallelize my module and got the gencode. However, when I try to run the module I get a RuntimeError and don't know how to fix it.
Here is my code (parallelize_attn.py):
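(The script body did not survive here. The following is a minimal sketch of what such a script might look like, assuming nnscaler's `parallelize` and `ComputeConfig` API; the stand-in Attention module, tensor shapes, GPU counts, and the `'tp'` policy are illustrative assumptions, not the actual parallelize_attn.py.)

```python
# Illustrative reconstruction only -- the real parallelize_attn.py was not
# included above; the module, shapes, and policy here are assumptions.
import torch
from torch import nn
from nnscaler.parallel import parallelize, ComputeConfig


class Attention(nn.Module):
    """Stand-in for the fairseq2-based attention module."""

    def __init__(self, model_dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)

    def forward(self, seqs, keys, values):
        out, _ = self.mha(seqs, keys, values)
        return out


if __name__ == '__main__':
    # Dummy forward arguments used for tracing / code generation (gencode).
    dummy_args = {
        'seqs': torch.randn(2, 16, 512),
        'keys': torch.randn(2, 16, 512),
        'values': torch.randn(2, 16, 512),
    }
    # 'tp' is one of nnscaler's partition policies; 'autodist' is another.
    parallelized_module = parallelize(
        Attention(),
        dummy_args,
        'tp',
        ComputeConfig(plan_ngpus=2, runtime_ngpus=2),
    )
    # NOTE: running the parallelized module requires a distributed launch
    # (e.g. torchrun); that setup is omitted here for brevity.
    prefix = torch.randn(2, 16, 512)
    inputs = torch.randn(2, 16, 512)
    # Calling with CPU tensors like this reproduces the RuntimeError below.
    new_result = parallelized_module(seqs=prefix, keys=inputs, values=inputs)
```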
This is the terminal output:
Before calling `parallelized_module`'s forward, everything was fine.
`gencode0.py` looks like this:
If you know how to fix this, or if you need more information, please let me know.
Thank you for your help.