secretflow / spu

SPU (Secure Processing Unit) aims to be a provable, measurable secure computation device, which provides computation ability while keeping your private data protected.
https://www.secretflow.org.cn/docs/spu/en/
Apache License 2.0
230 stars, 99 forks

[Bug]: Dear teachers, when using SPU with Torch, the connection configured in 3pc.json fails #739

Closed zhangwaer closed 3 months ago

zhangwaer commented 3 months ago

Issue Type

Build/Install

Modules Involved

Others

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

spu 0.5.0

OS Platform and Distribution

Ubuntu 18.04

Python Version

3.10.13

Compiler Version

GCC 11.2.1

Current Behavior?

When I run `nc -zv 127.0.0.1 39445` to check that the port is free, Ubuntu reports "Connection to 127.0.0.1 39445 port [tcp/*] succeeded!". But when I run the following code, it reports "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:39445: connection attempt timed out before receiving SETTINGS frame".

gpt2.py:

```python
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel, GPT2Config
import spu.utils.distributed as ppd
import json
import urllib
from collections import OrderedDict
from jax.tree_util import tree_map

# Read the cluster layout (nodes and devices) and initialize the SPU driver.
with open("3pc.json", "r") as file:
    conf = json.load(file)
ppd.init(conf["nodes"], conf["devices"], framework=ppd.Framework.EXP_TORCH)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")

# P1 holds the model weights and P2 holds the input tokens, both as NumPy arrays.
params = ppd.device("P1")(lambda input: tree_map(lambda x: x.detach().numpy(), input))(gpt2.state_dict())
inputs = ppd.device("P2")(lambda x: x.detach().numpy())(inputs)

# Run the model on the SPU device. The original snippet passed an undefined
# `model` here; the loaded GPT-2 module is `gpt2`.
res = ppd.device("SPU")(gpt2)(params, inputs)
logits = ppd.get(res).logits
print(logits)
```

Standalone code to reproduce the issue

{
    "id": "outsourci.3pc",
    "nodes": {
        "node:0": "127.0.0.1:39445",
        "node:1": "127.0.0.1:65511",
        "node:2": "127.0.0.1:65512",
        "node:3": "127.0.0.1:65513",
        "node:4": "127.0.0.1:65514"
    },
    "devices": {
        "SPU": {
            "kind": "SPU",
            "config": {
                "node_ids": [
                    "node:0",
                    "node:1",
                    "node:2"
                ],
                "spu_internal_addrs": [
                    "127.0.0.1:65515",
                    "127.0.0.1:65516",
                    "127.0.0.1:65517"
                ],
                "experimental_data_folder": [
                    "/tmp/spu_data_0/",
                    "/tmp/spu_data_1/",
                    "/tmp/spu_data_2/"
                ],
                "runtime_config": {
                    "protocol": "ABY3",
                    "field": "FM64",
                    "enable_pphlo_profile": true,
                    "enable_hal_profile": true
                }
            }
        },
        "P1": {
            "kind": "PYU",
            "config": {
                "node_id": "node:3"
            }
        },
        "P2": {
            "kind": "PYU",
            "config": {
                "node_id": "node:4"
            }
        }
    }
}
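Before chasing network errors, it can help to rule out a malformed 3pc.json. The sketch below is a hypothetical helper (`check_conf` is not part of SPU) that inspects only the fields shown above. Its rules are assumptions drawn from the structure of this file, not from a documented SPU schema: every address must be unique, every node id a device names must exist under `nodes`, and the SPU device needs one internal address per node id.

```python
from collections import Counter


def check_conf(conf: dict) -> list[str]:
    """Hypothetical sanity check for an SPU distributed config.

    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    nodes = conf["nodes"]
    addrs = list(nodes.values())
    for dev_name, dev in conf["devices"].items():
        cfg = dev["config"]
        # SPU devices list several node ids; PYU devices name a single one.
        ids = cfg.get("node_ids", [cfg.get("node_id")])
        for nid in ids:
            if nid not in nodes:
                problems.append(f"{dev_name}: unknown node id {nid!r}")
        internal = cfg.get("spu_internal_addrs", [])
        if "node_ids" in cfg and len(internal) != len(ids):
            problems.append(f"{dev_name}: {len(ids)} node ids but {len(internal)} internal addrs")
        addrs += internal
    # No two endpoints may share a host:port.
    for addr, n in Counter(addrs).items():
        if n > 1:
            problems.append(f"address {addr} used {n} times")
    return problems


# Address fields copied from the 3pc.json above; other fields are omitted.
conf = {
    "nodes": {
        "node:0": "127.0.0.1:39445",
        "node:1": "127.0.0.1:65511",
        "node:2": "127.0.0.1:65512",
        "node:3": "127.0.0.1:65513",
        "node:4": "127.0.0.1:65514",
    },
    "devices": {
        "SPU": {"kind": "SPU", "config": {
            "node_ids": ["node:0", "node:1", "node:2"],
            "spu_internal_addrs": ["127.0.0.1:65515", "127.0.0.1:65516", "127.0.0.1:65517"],
        }},
        "P1": {"kind": "PYU", "config": {"node_id": "node:3"}},
        "P2": {"kind": "PYU", "config": {"node_id": "node:4"}},
    },
}

print(check_conf(conf) or "no problems found")
```

For this file the check finds nothing, which is consistent with the thread's outcome: the configuration itself was fine.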

Relevant log output

No response

anakinxc commented 3 months ago

> when i use {nc -zv 127.0.0.1 39445} command, the port is free, the ubuntu report "Connection to 127.0.0.1 39445 port [tcp/*] succeeded!"

This means the port is open, i.e. something is already listening on it. SPU needs a free port here.
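To make the distinction concrete: `nc -zv` succeeds only when a process is already accepting connections on the port, which is the opposite of "free". The Python sketch below performs both checks with the standard `socket` module. The helper names are illustrative, not SPU APIs. A connect probe mirrors what `nc -z` does, and a bind probe asks whether a new server could claim the port.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connect succeeds, i.e. some process is already
    listening on host:port (what `nc -zv host port` reports as success)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


def can_bind(host: str, port: int) -> bool:
    """True if this process could bind host:port, i.e. the port is free
    for a new server to use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    port = 39445  # the port from the report
    print("something listening:", is_listening("127.0.0.1", port))
    print("free to bind:", can_bind("127.0.0.1", port))
```

The bind probe does not set `SO_REUSEADDR`, so on Linux a port that still has a connection in TIME_WAIT may be reported as busy even though a server that sets that option could still use it.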

zhangwaer commented 3 months ago

> > when i use {nc -zv 127.0.0.1 39445} command, the port is free, the ubuntu report "Connection to 127.0.0.1 39445 port [tcp/*] succeeded!"
>
> This means the port is open and occupied by something. SPU needs a free port here

Thank you very much for your kind reminder. But after changing the port to 39439 (for which `ss -tuln | grep 39439` prints nothing), the code outputs the same error: `debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:39439`. I would like to know whether I need to start something for SPU before I run gpt2.py. The script is the same gpt2.py as in my original report.

anakinxc commented 3 months ago

Have you launched the backend runtime as described here?

zhangwaer commented 3 months ago

> Have you launched backend runtime according to here

Thank you very much for your reply. I missed the step of starting the nodes.
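The resolution is that the SPU nodes listed under `nodes` must already be running before gpt2.py starts. Judging from the gRPC "failed to connect" error above, the driver script connects to those addresses as a client rather than serving on them. A hypothetical pre-flight check could catch the missing step early using only the standard library. `unreachable_nodes` and `preflight` are illustrative names, not part of SPU.

```python
import json
import socket


def unreachable_nodes(conf: dict, timeout: float = 1.0) -> list[str]:
    """Return the ids of nodes in conf["nodes"] that have no process
    listening on their host:port."""
    down = []
    for node_id, addr in conf["nodes"].items():
        host, port = addr.rsplit(":", 1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, int(port))) != 0:
                down.append(node_id)
    return down


def preflight(path: str = "3pc.json") -> dict:
    """Load the config and fail fast if any node is not up yet."""
    with open(path) as f:
        conf = json.load(f)
    down = unreachable_nodes(conf)
    if down:
        raise RuntimeError(f"start the SPU nodes first; not listening: {down}")
    return conf
```

Calling `conf = preflight("3pc.json")` right before `ppd.init(conf["nodes"], conf["devices"], ...)` would turn the gRPC timeout into an explicit message. A TCP connect only proves that *some* process is listening, so this check would not catch the first situation in this thread, where an unrelated process already held port 39445.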