Closed by dashanji 2 months ago.
Describe your problem

Thanks to @TrafalgarZZZ for reporting. There are two issues when testing the vineyard torch module.

1. The original torch module passed to `put` is modified by the call. After `client.put(state_dict)` returns, the caller's `state_dict` shows `None` in place of its tensors.

Reproduce

```python
import safetensors
import safetensors.torch
import vineyard
import vineyard.contrib.ml.torch as vineyard_torch

# Load the Stable Diffusion v1.5 weights as a dict of tensor name -> tensor.
with open("/mnt/stable-diffusion-models/Stable-diffusion/v1-5-pruned-emaonly-1.safetensors", 'rb') as f:
    state_dict = safetensors.torch.load(f.read())

print(state_dict)  # before the put: all values are tensors

client = vineyard.connect("/tmp/vineyard_test.sock")
with vineyard_torch.torch_context(client):
    client.put(state_dict)

print(state_dict)  # after the put: values have become None
```

Original state_dict

```
...
'cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.weight': tensor([[ 0.0402, 0.0049, 0.0031, ..., 0.0076, -0.0040, -0.0004],
        [ 0.0320, -0.0247, 0.0270, ..., 0.0014, -0.0266, -0.0196],
        [-0.0072, 0.0229, 0.0050, ..., -0.0068, -0.0446, -0.0313],
        ...,
        [ 0.0280, -0.0149, 0.0136, ..., 0.0182, -0.0120, -0.0161],
        [ 0.0343, -0.0128, -0.0234, ..., 0.0229, -0.0218, 0.0272],
        [ 0.0184, 0.0124, 0.0135, ..., -0.0094, 0.0302, -0.0117]]),
...}
```

state_dict after put

```
...
'cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.weight': None,
'model.diffusion_model.middle_block.1.proj_out.bias': None,
'model.diffusion_model.output_blocks.9.0.in_layers.2.weight': None,
'first_stage_model.encoder.mid.block_1.conv2.weight': None,
'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.bias': None,
'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.weight': None,
'model.diffusion_model.input_blocks.7.0.out_layers.3.weight': None,
'first_stage_model.decoder.up.2.block.1.norm2.weight': None,
'first_stage_model.encoder.down.1.block.0.conv1.weight': None,
'cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.bias': None}
```

2. When vineyard does not have enough memory, the torch module is put into vineyard only partially. The tensors that were stored before the failure keep occupying memory unnecessarily.

Start vineyardd with 1Gi of memory, which cannot hold all the tensors (around 4.5Gi), then run the following code.

Reproduce

```python
import safetensors
import safetensors.torch
import vineyard
import vineyard.contrib.ml.torch as vineyard_torch

with open("/mnt/stable-diffusion-models/Stable-diffusion/v1-5-pruned-emaonly-1.safetensors", 'rb') as f:
    state_dict = safetensors.torch.load(f.read())

client = vineyard.connect("/tmp/vineyard_test.sock")
try:
    with vineyard_torch.torch_context(client):
        client.put(state_dict)  # fails: ~4.5Gi of tensors vs. a 1Gi limit
except Exception:
    # Show how much memory the failed put left behind.
    print(client.status)
```

```
InstanceStatus:
    instance_id: 11
    deployment: local
    memory_usage: 1072015920
    memory_limit: 1073741824
    deferred_requests: 0
    ipc_connections: 1
    rpc_connections: 0
```

memory_usage is 1072015920 bytes against a memory_limit of 1073741824 bytes (1Gi). The tensors put before the failure therefore remain in vineyard, even though an incomplete state_dict should not be put into vineyard at all.
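Until both issues are fixed inside vineyard, a caller-side guard might look like the minimal sketch below. It runs without vineyard or torch. `FakeStore`, `StoreFullError`, `consuming_put` and `safe_put_all` are hypothetical stand-ins invented for illustration and are not vineyard APIs. `consuming_put` only mimics the two reported symptoms: values are replaced by `None`, and earlier entries are left behind when a later one does not fit. The wrapper handles each symptom separately. It passes the put a shallow copy so the caller's dict is never modified, and it deletes whatever was stored if a later put fails, so no partial dict keeps occupying memory.

```python
from typing import Dict, Optional


class StoreFullError(Exception):
    """Raised by the stand-in store when an object does not fit."""


class FakeStore:
    """Hypothetical memory-limited object store; NOT a vineyard API."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # capacity in bytes
        self.usage = 0  # bytes currently held
        self.objects: Dict[int, bytes] = {}
        self._next_id = 0

    def put(self, value: bytes) -> int:
        if self.usage + len(value) > self.limit:
            free = self.limit - self.usage
            raise StoreFullError(f"need {len(value)} bytes, only {free} free")
        object_id = self._next_id
        self._next_id += 1
        self.objects[object_id] = value
        self.usage += len(value)
        return object_id

    def delete(self, object_id: int) -> None:
        self.usage -= len(self.objects.pop(object_id))


def consuming_put(
    store: FakeStore,
    entries: Dict[str, Optional[bytes]],
    stored_ids: Dict[str, int],
) -> None:
    """Hypothetical stand-in for the reported behaviour: stores entries one
    at a time, sets each stored value to None in the dict it receives, and
    leaves earlier entries in the store if a later one does not fit."""
    for key in list(entries):
        value = entries[key]
        if value is None:
            continue  # already consumed
        stored_ids[key] = store.put(value)
        entries[key] = None


def safe_put_all(store: FakeStore, state_dict: Dict[str, bytes]) -> Dict[str, int]:
    """All-or-nothing put that never modifies the caller's dict."""
    stored_ids: Dict[str, int] = {}
    work: Dict[str, Optional[bytes]] = dict(state_dict)  # shallow copy
    try:
        # consuming_put may overwrite values, but only in the copy.
        consuming_put(store, work, stored_ids)
    except StoreFullError:
        # Roll back the partial put so no incomplete dict occupies memory.
        for object_id in stored_ids.values():
            store.delete(object_id)
        raise
    return stored_ids


if __name__ == "__main__":
    weights = {"a": b"x" * 600, "b": b"y" * 600}
    store = FakeStore(limit=1000)  # too small for both entries
    try:
        safe_put_all(store, weights)
    except StoreFullError as err:
        print("put failed:", err)
    print(store.usage)  # 0: nothing left behind
    print(weights["a"] == b"x" * 600)  # True: caller's dict untouched
```

With a real client, the copy would be passed to `client.put` inside `torch_context`. Whether a shallow copy is enough depends on how vineyard actually modifies the dict. The rollback part realistically belongs inside vineyard's own put, because the caller may not receive IDs for the tensors that were stored before the failure. A pre-check against the `memory_usage` and `memory_limit` values that `client.status` reports could also fail fast, but it is only an estimate.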