v6d-io / v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
https://v6d.io
Apache License 2.0

Modifications and memory occupation in vineyard torch module #1859

Closed (dashanji closed this 2 months ago)

dashanji commented 3 months ago

Describe your problem

Thanks to @TrafalgarZZZ for reporting. There are two issues when testing the vineyard torch module.

The original torch module passed to put is modified in place by the put call.

Reproduce

import safetensors
import safetensors.torch

import vineyard
import vineyard.contrib.ml.torch as vineyard_torch

# Load the checkpoint into an ordinary torch state_dict.
with open("/mnt/stable-diffusion-models/Stable-diffusion/v1-5-pruned-emaonly-1.safetensors", 'rb') as f:
    state_dict = safetensors.torch.load(f.read())

print(state_dict)

# Put the state_dict into vineyard under the torch context.
client = vineyard.connect("/tmp/vineyard_test.sock")
with vineyard_torch.torch_context(client):
    client.put(state_dict)

# Print the same dict again after the put.
print(state_dict)

Original state_dict

...
'cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.weight': tensor([[ 0.0402,  0.0049,  0.0031,  ...,  0.0076, -0.0040, -0.0004],
         [ 0.0320, -0.0247,  0.0270,  ...,  0.0014, -0.0266, -0.0196],
         [-0.0072,  0.0229,  0.0050,  ..., -0.0068, -0.0446, -0.0313],
         ...,
         [ 0.0280, -0.0149,  0.0136,  ...,  0.0182, -0.0120, -0.0161],
         [ 0.0343, -0.0128, -0.0234,  ...,  0.0229, -0.0218,  0.0272],
         [ 0.0184,  0.0124,  0.0135,  ..., -0.0094,  0.0302, -0.0117]]),
 ...}

state_dict after put

...
 'cond_stage_model.transformer.text_model.encoder.layers.10.layer_norm2.weight': None, 'model.diffusion_model.middle_block.1.proj_out.bias': None, 'model.diffusion_model.output_blocks.9.0.in_layers.2.weight': None, 'first_stage_model.encoder.mid.block_1.conv2.weight': None, 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm3.bias': None, 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.ff.net.2.weight': None, 'model.diffusion_model.input_blocks.7.0.out_layers.3.weight': None, 'first_stage_model.decoder.up.2.block.1.norm2.weight': None, 'first_stage_model.encoder.down.1.block.0.conv1.weight': None, 'cond_stage_model.transformer.text_model.encoder.layers.5.mlp.fc2.bias': None}
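As a temporary client-side workaround (only a sketch based on the observed behavior above, where the dict's values end up replaced with None; not the intended fix), the caller can pass a shallow copy of the state_dict to put so the original dict keeps its tensor references:

# Hypothetical workaround: put a shallow copy so the caller's state_dict keeps
# its tensor references even if the torch context rewrites the dict in place.
state_dict_to_put = dict(state_dict)

with vineyard_torch.torch_context(client):
    client.put(state_dict_to_put)

# The original dict is untouched; only the copy may have been rewritten.
assert all(v is not None for v in state_dict.values())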

The torch module is only partially put into vineyard when vineyard does not have enough memory, which leaves unnecessary memory occupation behind.

Start vineyardd with 1Gi of memory, which cannot hold all the tensors (around 4.5Gi in total), then run the following code.

Reproduce

import safetensors
import safetensors.torch

import vineyard
import vineyard.contrib.ml.torch as vineyard_torch

with open("/mnt/stable-diffusion-models/Stable-diffusion/v1-5-pruned-emaonly-1.safetensors", 'rb') as f:
    state_dict = safetensors.torch.load(f.read())

client = vineyard.connect("/tmp/vineyard_test.sock")
try:
    with vineyard_torch.torch_context(client):
        client.put(state_dict)
except Exception:
    # The put fails once vineyardd runs out of memory, but the tensors that
    # were already put stay in vineyardd and keep it nearly full.
    print(client.status)

Output
InstanceStatus:
    instance_id: 11
    deployment: local
    memory_usage: 1072015920
    memory_limit: 1073741824
    deferred_requests: 0
    ipc_connections: 1
    rpc_connections: 0

Ideally, we should not put an incomplete set of tensors into vineyard at all.
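Until the partial put can be rolled back automatically, a rough client-side guard along the following lines could avoid filling vineyardd with an incomplete state_dict. This is only a sketch: the memory_limit and memory_usage attribute names on the status object are assumed from the fields printed above.

# Estimate the payload size of all tensors in the state_dict.
required = sum(t.numel() * t.element_size() for t in state_dict.values())

# Assumed attribute names, matching the printed InstanceStatus fields.
status = client.status
available = status.memory_limit - status.memory_usage

# Refuse to start the put when the tensors clearly cannot fit, instead of
# leaving a partially-written state_dict behind in vineyardd.
if required > available:
    raise MemoryError(
        f"state_dict needs ~{required / 2**30:.2f} GiB, "
        f"only {available / 2**30:.2f} GiB free in vineyardd"
    )

with vineyard_torch.torch_context(client):
    client.put(state_dict)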