Status: Open. GxjGit opened this issue 1 year ago.
Thanks for your issue. We will fix the bug as soon as we can.
If cond_stage_trainable = True, it also reports an error:
│ /opt/conda/lib/python3.7/site-packages/colossalai/gemini/chunk/manager.py:159 in get_chunk
│
│ 156 Args:
│ 157 tensor (torch.Tensor): a torch tensor object
│ 158 """
│ ❱ 159 return self.tensor_chunk_map[tensor]
│ 160
│ 161 def get_cuda_movable_chunks(self) -> List[Chunk]:
│ 162 """
╰───────────────────────────────────────────────────────
KeyError: ColoParameter: ColoTensor:
Parameter containing:
Parameter(ColoParameter([[ 4.2009e-04, -3.7899e-03, 3.8624e-03, ..., -8.2350e-04,
1.2369e-03, 5.8413e-04],
[ 3.8624e-04, -1.3628e-03, 2.3880e-03, ..., -7.9250e-04,
2.1076e-03, 1.0943e-04],
[ 1.2493e-03, 9.7466e-04, 1.9093e-03, ..., 1.4000e-03,
1.1845e-03, -9.9087e-04],
...,
[-1.3588e-02, -1.8244e-03, 8.0872e-03, ..., 5.8174e-03,
-1.0162e-02, -3.7980e-04],
[-1.0368e-02, 6.7711e-03, 1.0557e-03, ..., 1.1563e-05,
-9.3384e-03, -1.8854e-03],
[-1.7729e-03, -1.2070e-02, -1.2665e-02, ..., 9.3079e-03,
6.6338e-03, -6.0425e-03]], device='cuda:1',
dtype=torch.float16))
DistSpec:
placement: DistPlacementPattern.REPLICATE
ProcessGroup:
Rank: 0, World size: 1, DP degree: 1, TP degree: 1
Ranks in group: [0]
None
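The KeyError means the parameter passed to get_chunk is not a key of tensor_chunk_map, so Gemini's chunk manager never placed it in a chunk. The toy sketch below reproduces that failure mode in plain Python and adds a small check that lists parameters missing from such a map. ToyChunkManager, Param and find_unregistered are hypothetical stand-ins, not ColossalAI's API. The parameter names are illustrative only. Which parameters actually go unregistered when cond_stage_trainable or use_ema is enabled is an assumption here; the log above does not confirm it.

```python
from typing import Dict, Iterable, List


class Param:
    """Hypothetical stand-in for a model parameter; hashed by identity, like torch.Tensor."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyChunkManager:
    """Hypothetical, heavily simplified chunk manager (NOT ColossalAI's ChunkManager)."""

    def __init__(self) -> None:
        # Maps each registered parameter object to the id of the chunk holding it.
        self.tensor_chunk_map: Dict[Param, int] = {}

    def register(self, params: Iterable[Param], chunk_id: int) -> None:
        for p in params:
            self.tensor_chunk_map[p] = chunk_id

    def get_chunk(self, tensor: Param) -> int:
        # Same lookup style as in the traceback: a plain dict access, so an
        # unregistered parameter raises KeyError instead of returning None.
        return self.tensor_chunk_map[tensor]


def find_unregistered(manager: ToyChunkManager, params: Iterable[Param]) -> List[str]:
    """Return the names of parameters the manager has never registered."""
    return [p.name for p in params if p not in manager.tensor_chunk_map]


# Illustrative names: the UNet weight is registered, the conditioning-stage weight is not.
unet_w = Param("model.diffusion_model.weight")
cond_w = Param("cond_stage_model.weight")

manager = ToyChunkManager()
manager.register([unet_w], chunk_id=0)

print(manager.get_chunk(unet_w))                     # 0
print(find_unregistered(manager, [unet_w, cond_w]))  # ['cond_stage_model.weight']
try:
    manager.get_chunk(cond_w)
except KeyError as err:
    print("KeyError for", err.args[0].name)         # KeyError for cond_stage_model.weight
```

Running a check like find_unregistered over model.parameters() before training is one way to see which weights the wrapper missed, assuming you can reach the real map; in the actual library the supported fix would still have to come from the maintainers.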
@Fazziekey Hi, have you fixed this problem?
Thanks for your issue. We don't support cond-stage training yet; we will support it in the future.
Is it supported now?
Not yet.
🐛 Describe the bug
I can successfully run the examples with the default settings, following https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion.
When I change use_ema from False to True, an error occurs:
What could be causing this problem? Thanks.
Log info:
Environment