KimmiShi opened this issue 1 year ago
You were still having NUM_CHUNKS=1. How about setting it to 2?
Thanks, I've set it to 2 (though that is not shown in the code above).
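For context, a minimal sketch of what the interleaved-schedule settings might look like in a ColossalAI-style config. The key names here (NUM_MICRO_BATCHES, parallel.pipeline, model.num_chunks) are modeled on the pipeline-parallel tutorial and should be treated as assumptions, not the exact API:

```python
# Hypothetical config sketch modeled on the ColossalAI pipeline tutorial;
# verify the exact key names against your installed version.
NUM_CHUNKS = 2  # the interleaved schedule needs more than one chunk per stage

CONFIG = dict(
    NUM_MICRO_BATCHES=4,                # micro-batches per pipeline iteration
    parallel=dict(pipeline=2),          # two pipeline stages
    model=dict(num_chunks=NUM_CHUNKS),  # chunks per stage for interleaving
)
```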
File "/mnt/.../.local/lib/python3.8/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 490, in pre_processing
elif isinstance(engine.model[0], NaiveAMPModel):
TypeError: 'PipelinableModel' object is not subscriptable
So I think this might be a bug, as it tries to access engine.model[0].
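The TypeError itself is plain Python behavior: indexing an object whose class defines no __getitem__ raises exactly this message. A toy reproduction (the class name is made up, standing in for a wrapper that holds its modules privately):

```python
class PipelinableModelDemo:
    """Stand-in for a wrapper that keeps its module list private."""
    def __init__(self, modules):
        self._module_list = modules  # internal list, not exposed via indexing

wrapper = PipelinableModelDemo(["conv", "fc"])
try:
    wrapper[0]  # mirrors engine.model[0] in the traceback
    msg = None
except TypeError as e:
    msg = str(e)
print(msg)  # 'PipelinableModelDemo' object is not subscriptable
```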
It turns out that PipelinableModel wraps a list of modules without exposing them. Can you try temporarily patching this function as below?
    def pre_processing(self, engine):
        # torch, inspect and NaiveAMPModel are already imported at the top of
        # _pipeline_schedule.py, so only ShardedModelV2 needs a local import.
        from colossalai.zero.sharded_model.sharded_model_v2 import ShardedModelV2
        if isinstance(engine.model, ShardedModelV2):
            self.dtype = torch.half
        # Read the wrapper's internal list instead of indexing engine.model.
        modules = engine.model._module_list
        if isinstance(modules[0], NaiveAMPModel):
            self.dtype = torch.half
        for model in modules:
            if isinstance(model, NaiveAMPModel):
                model = model.model
            sig = inspect.signature(model.forward)
            for p in sig.parameters.values():
                assert p.kind != inspect.Parameter.VAR_POSITIONAL, '*args is not supported'
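If editing the installed package file is inconvenient, the same workaround can be applied at runtime by monkey-patching the method on the schedule class before building the engine. Below is a generic, self-contained illustration of that pattern; the real ColossalAI class and attribute names from the traceback would be substituted in practice:

```python
# Generic illustration of monkey-patching a method at runtime; the actual
# ColossalAI schedule class would replace `Schedule` in practice.
class Schedule:
    def pre_processing(self, engine):
        return engine[0]           # fails when the engine is not subscriptable

class Wrapper:
    """Stand-in for a model wrapper holding a private module list."""
    def __init__(self, modules):
        self._module_list = modules

def patched_pre_processing(self, engine):
    modules = engine._module_list  # read the private list instead of indexing
    return modules[0]

Schedule.pre_processing = patched_pre_processing  # swap in the fix

print(Schedule().pre_processing(Wrapper(["stage0", "stage1"])))  # stage0
```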
We will look for a better fix soon!
@ver217, could you take a look and confirm whether this is indeed a bug? Thanks!
Thanks, I tried it and got another error message:
File "/mnt/xxx/.local/lib/python3.8/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 586, in forward_backward_step
input_objs = [[] for _ in range(len(model))]
TypeError: object of type 'PipelinableModel' has no len()
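The follow-up len() failure has the same root cause: the wrapper exposes neither __len__ nor __getitem__. A wrapper that forwards both would satisfy what the schedule expects of a sequence of chunks (a generic sketch, not the actual ColossalAI fix):

```python
class SequenceLikeWrapper:
    """Wraps a private module list but behaves like a sequence."""
    def __init__(self, modules):
        self._module_list = modules

    def __len__(self):
        return len(self._module_list)   # enables len(model)

    def __getitem__(self, idx):
        return self._module_list[idx]   # enables model[0]

model = SequenceLikeWrapper(["chunk_a", "chunk_b"])
print(len(model))   # 2
print(model[0])     # chunk_a
```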
Yes, this error is expected. Let's wait for a reply!
@YuliangLiu0306 Do pipelinable models support the interleaved 1F1B schedule?
🐛 Describe the bug
I am trying to reproduce the resnet50 pipeline-parallel demo on this page: https://colossalai.org/docs/features/pipeline_parallel
The code on that page works fine for me. However, I'd like to try the interleaved scheduler, so I tried the following:
1. Setting NUM_CHUNKS=2 in the above example. I got an error message like:
RuntimeError: Given groups=1, weight of size [512, 2048, 1, 1], expected input[64, 1024, 14, 14] to have 2048 channels, but got 1024 channels instead
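The RuntimeError above is characteristic of a mis-partitioned pipeline: with two chunks per stage, the model's layers must still be fed in an order where each layer's input channels match the previous layer's output channels, and here a layer expecting 2048 channels received a 1024-channel tensor. The adjacency requirement can be checked abstractly, without ColossalAI or torch (a pure-Python sketch; the channel numbers are taken from the error message):

```python
# Each layer is (in_channels, out_channels); a valid pipeline order must chain.
layers = [(3, 64), (64, 1024), (1024, 2048), (2048, 512)]

def check_chain(order):
    """Return the first mismatch as (expected, got), or None if the order chains."""
    for (_, out_prev), (in_next, _) in zip(order, order[1:]):
        if out_prev != in_next:
            return (in_next, out_prev)
    return None

print(check_chain(layers))               # None: the contiguous order chains fine
bad = [layers[0], layers[1], layers[3]]  # a chunking that skips the 1024->2048 block
print(check_chain(bad))                  # (2048, 1024): same shape as the RuntimeError
```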
2. Setting model.num_chunks in CONFIG to create an InterleavedPipelineSchedule object. I did so and got another error message. Code script:
Environment
No response