zhangwaer closed this issue 3 months ago.
When I run `nc -zv 127.0.0.1 39445`, Ubuntu reports "Connection to 127.0.0.1 39445 port [tcp/*] succeeded!", which I took to mean the port is free.
This means the port is open and already occupied by some process. SPU needs a free port here.
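`nc -zv` succeeds when a TCP connection to the port can be established, i.e. when some process is already listening there. A more direct test of "is this port free for a new server" is to try binding it yourself. A minimal sketch; the helper name `port_is_free` is hypothetical and not part of SPU:

```python
import errno
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a new TCP socket can bind (host, port), False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False  # another socket already owns this address
            raise  # other errors (e.g. permissions) are real problems, not "busy"
    return True
```

For example, `port_is_free("127.0.0.1", 39445)` returning `False` agrees with the `nc -zv` success above. On Linux it may also return `False` for a port held only by connections in TIME_WAIT, since the sketch does not set `SO_REUSEADDR`.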
Thank you very much for the reminder. However, after changing the port to 39439 (for which `ss -tuln | grep 39439` prints nothing), the code still fails with the same error: `debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:39439`. I would like to know whether I need to start some SPU service before running gpt2.py:

import torch
from transformers import AutoTokenizer, GPT2LMHeadModel, GPT2Config
import spu.utils.distributed as ppd
import json
import urllib
from collections import OrderedDict
from jax.tree_util import tree_map

# Connect the driver to the nodes/devices described in the 3-party config
with open("3pc.json", 'r') as file:
    conf = json.load(file)
ppd.init(conf["nodes"], conf["devices"], framework=ppd.Framework.EXP_TORCH)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")

# Place the model weights on P1 and the token ids on P2 as NumPy arrays
params = ppd.device("P1")(lambda input: tree_map(lambda x: x.detach().numpy(), input))(gpt2.state_dict())
inputs = ppd.device("P2")(lambda x: x.detach().numpy())(inputs)

# Run GPT-2 on the SPU device and fetch the logits back to the driver
res = ppd.device("SPU")(gpt2)(params, inputs)
logits = ppd.get(res).logits
print(logits)
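gRPC's "failed to connect to all addresses" generally means the client could not open a working connection to any address it was given. Before calling `ppd.init`, it can help to confirm that some process is actually listening on every node address in 3pc.json. A minimal sketch, assuming `conf["nodes"]` maps node ids to `"host:port"` strings (check this against your own config); the helper `unreachable_nodes` is hypothetical, only probes TCP, and does not start any SPU service:

```python
import json
import socket


def unreachable_nodes(config_path: str, timeout: float = 1.0) -> list[tuple[str, str]]:
    """Return (node_id, address) pairs from config_path where no TCP listener answers.

    Assumption: conf["nodes"] is a dict of node id -> "host:port" string.
    """
    with open(config_path, "r") as file:
        conf = json.load(file)
    missing = []
    for node_id, addr in conf["nodes"].items():
        host, port = addr.rsplit(":", 1)  # plain "host:port"; bracketed IPv6 not handled
        try:
            # A completed TCP handshake means some process is listening there
            with socket.create_connection((host, int(port)), timeout=timeout):
                pass
        except OSError:  # refused, unreachable, or timed out
            missing.append((node_id, addr))
    return missing
```

For example, if `print(unreachable_nodes("3pc.json"))` shows a non-empty list, nothing is listening at those node addresses yet, so whatever node processes they describe would need to be running before the driver can connect.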
Issue Type
Build/Install
Modules Involved
Others
Have you reproduced the bug with SPU HEAD?
Yes
Have you searched existing issues?
Yes
SPU Version
spu 0.5.0
OS Platform and Distribution
Ubuntu 18.04
Python Version
3.10.13
Compiler Version
GCC 11.2.1
Current Behavior?
When I run `nc -zv 127.0.0.1 39445`, Ubuntu reports "Connection to 127.0.0.1 39445 port [tcp/*] succeeded!", which I took to mean the port is free. But when I run the following code, it reports "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:39445: connection attempt timed out before receiving SETTINGS frame".

gpt2.py:

import torch
from transformers import AutoTokenizer, GPT2LMHeadModel, GPT2Config
import spu.utils.distributed as ppd
import json
import urllib
from collections import OrderedDict
from jax.tree_util import tree_map

# Connect the driver to the nodes/devices described in the 3-party config
with open("3pc.json", 'r') as file:
    conf = json.load(file)
ppd.init(conf["nodes"], conf["devices"], framework=ppd.Framework.EXP_TORCH)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")

# Place the model weights on P1 and the token ids on P2 as NumPy arrays
params = ppd.device("P1")(lambda input: tree_map(lambda x: x.detach().numpy(), input))(gpt2.state_dict())
inputs = ppd.device("P2")(lambda x: x.detach().numpy())(inputs)

# Run GPT-2 on the SPU device and fetch the logits back to the driver
res = ppd.device("SPU")(gpt2)(params, inputs)
logits = ppd.get(res).logits
print(logits)
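gRPC runs over HTTP/2, and an HTTP/2 server must send a (possibly empty) SETTINGS frame as the first frame of every connection (the server connection preface in RFC 9113). "Connection attempt timed out before receiving SETTINGS frame" therefore suggests the TCP handshake on 39445 succeeded, as `nc -zv` also showed, but whatever is listening there never answered as an HTTP/2 peer. A minimal sketch to tell the two cases apart, assuming a cleartext (non-TLS) endpoint; the helper `speaks_http2` is hypothetical:

```python
import socket

# HTTP/2 client connection preface, followed by an empty SETTINGS frame
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
EMPTY_SETTINGS = b"\x00\x00\x00\x04\x00\x00\x00\x00\x00"  # len=0, type=0x4, flags=0, stream=0
FRAME_HEADER_LEN = 9
SETTINGS_TYPE = 0x4


def speaks_http2(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the listener answers the HTTP/2 preface with a SETTINGS frame."""
    try:
        # create_connection applies the timeout to connect and to later recv calls
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(CLIENT_PREFACE + EMPTY_SETTINGS)
            header = b""
            while len(header) < FRAME_HEADER_LEN:
                chunk = sock.recv(FRAME_HEADER_LEN - len(header))
                if not chunk:
                    return False  # peer closed without sending a frame
                header += chunk
    except OSError:  # refused, reset, or timed out (socket.timeout subclasses OSError)
        return False
    # Byte 3 of the 9-byte frame header is the frame type
    return header[3] == SETTINGS_TYPE
```

A `False` result for an address that `nc -zv` reports as open points to a listener that is not an HTTP/2 server (or a TLS-only one, which this sketch does not handle).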
Standalone code to reproduce the issue
Relevant log output
No response