Closed (yinochaos closed this issue 1 year ago)
I run the code on “volcengine”, TikTok's cloud platform, which is similar to AWS.
The GPUs are A800 80GB. `nvidia-smi` output:
```
+-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800-SXM...  On   | 00000000:69:01.0 Off |                    0 |
| N/A   31C    P0   107W / 400W |      3MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800-SXM...  On   | 00000000:69:02.0 Off |                    0 |
| N/A   31C    P0    61W / 400W |      3MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800-SXM...  On   | 00000000:6B:01.0 Off |                    0 |
| N/A   31C    P0    65W / 400W |      3MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800-SXM...  On   | 00000000:6B:02.0 Off |                    0 |
| N/A   30C    P0    62W / 400W |      3MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
```
+1. I have the same problem, even when using the flag.
Same issue here.
+1, me too.
+1. Are there any solutions?
As https://github.com/pytorch/pytorch/issues/74824#issuecomment-1500144250 says, you can try adding your IP and `__internal_head__` to `/etc/hosts`; this works for me. For example:
127.0.0.1 __internal_head__
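A minimal sketch for checking that workaround, assuming Python 3 on Linux. `lookup_in_hosts_text` is a hypothetical helper (not part of vLLM or PyTorch) that parses hosts-file text in the same shape as the example line (an IP followed by one or more names), and `resolves` asks the system resolver, which normally consults `/etc/hosts`, whether a name resolves at all.

```python
import socket
from typing import Optional


def lookup_in_hosts_text(text: str, hostname: str) -> Optional[str]:
    """Return the IP mapped to `hostname` in hosts-file formatted text, or None.

    Hypothetical helper: each non-comment line is "IP name [alias ...]".
    """
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        if hostname in names:
            return ip
    return None


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can turn `hostname` into an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    try:
        with open("/etc/hosts", encoding="utf-8") as f:
            print("/etc/hosts entry:", lookup_in_hosts_text(f.read(), "__internal_head__"))
    except FileNotFoundError:
        print("/etc/hosts not found")
    print("resolves:", resolves("__internal_head__"))
```

If the entry is present but `resolves` still returns `False`, the resolver configuration (for example, whether `files` is listed for `hosts` in `/etc/nsswitch.conf`) may be worth checking.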
Thanks, this works for me too! :)
`export NCCL_IGNORE_DISABLED_P2P=1` didn't work for me.
Besides, I can't add `127.0.0.1 __internal_head__` to `/etc/hosts` because, unfortunately, I don't have sudo access to that file.
Is there any way to solve the problem?
Thank you very much in advance!
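An `export` only affects processes started from that same shell, so one thing worth ruling out is that the variable never reached the process that initializes NCCL. A minimal sketch, assuming the inference script is a Python program: `apply_env` and `NCCL_SETTINGS` are hypothetical names that set the variables in `os.environ` before vLLM is imported, so child processes started afterwards normally inherit them. `NCCL_DEBUG=INFO` is a standard NCCL variable that makes NCCL log its initialization, which may show where setup fails; the thread does not confirm that `NCCL_IGNORE_DISABLED_P2P` fixes this error.

```python
import os
from typing import Dict, Optional

# Hypothetical settings for illustration. NCCL reads its environment when it
# initializes, so these must be set before vLLM / torch.distributed start.
NCCL_SETTINGS = {
    "NCCL_IGNORE_DISABLED_P2P": "1",  # the flag tried in this thread
    "NCCL_DEBUG": "INFO",             # verbose NCCL logging for diagnosis
}


def apply_env(settings: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Set environment variables for this process and its future children.

    Returns the previous values (None if a variable was unset) so they can be restored.
    """
    previous: Dict[str, Optional[str]] = {}
    for key, value in settings.items():
        previous[key] = os.environ.get(key)
        os.environ[key] = value
    return previous


if __name__ == "__main__":
    apply_env(NCCL_SETTINGS)
    for key in NCCL_SETTINGS:
        print(f"{key}={os.environ[key]}")
    # Import and start vLLM only after this point, e.g.:
    # from vllm import LLM
```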
@SuperBruceJia I have the same question. Did you find a solution?
I'm sorry to say that I haven't found a solution yet. For now, I'm utilizing only one GPU for inference.
You could try upgrading vllm to see if it resolves the issue, e.g., vllm-0.1.6 or vllm-0.2.0.
Good luck!
Best regards,
Shuyue
Dec. 30th, 2023
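Before and after upgrading, it helps to confirm which vLLM version the Python environment actually has installed. A minimal sketch using the standard-library `importlib.metadata`. `installed_version` and `release_tuple` are hypothetical helpers, and `release_tuple` is deliberately simplified: it keeps only the leading numeric release segments. For full PEP 440 comparisons, `packaging.version.Version` is the usual choice.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Simplified: '0.2.7' -> (0, 2, 7); stops at the first non-numeric part."""
    parts = []
    for piece in ver.split("+", 1)[0].split("."):  # drop a local tag like '+cu118'
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. '0rc1': keep 0, ignore the pre-release suffix
    return tuple(parts)


if __name__ == "__main__":
    current = installed_version("vllm")
    print("installed vllm:", current)
    if current is not None:
        # Versions mentioned in this thread, for comparison.
        for candidate in ("0.1.6", "0.2.0", "0.2.7"):
            newer = release_tuple(candidate) > release_tuple(current)
            print(candidate, "is newer than installed:", newer)
```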
0.2.7 doesn't work either.
I'm sorry to say that I haven't found a solution yet. I will let you know if I can figure it out.
Best regards,
Shuyue
Jan. 15th, 2024
@SuperBruceJia @allendred You should be able to use `~/.hosts` instead if you don't have sudo access to `/etc/hosts`:
127.0.0.1 __internal_head__
I ran this code and got these errors:
Following issue #570, I ran `export NCCL_IGNORE_DISABLED_P2P=1`, waited 8 minutes, and ran the code again, but the error above happened again. Any help or guidance on how to resolve this issue would be greatly appreciated.
Thank you!
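Because the flag in question concerns disabled peer-to-peer (P2P) access between GPUs, one possible diagnostic is to ask PyTorch whether each GPU pair reports P2P access. A minimal sketch, assuming PyTorch with CUDA support is installed on the machine. `p2p_matrix` and `format_matrix` are hypothetical helpers, and each pair is queried with `torch.cuda.can_device_access_peer`. The thread does not establish that disabled P2P is actually the cause of this error.

```python
from typing import Callable, List


def p2p_matrix(n: int, can_access: Callable[[int, int], bool]) -> List[List[bool]]:
    """Build an n x n table where entry [i][j] says whether GPU i can access GPU j.

    The diagonal is reported as True without querying (a device can access itself).
    """
    return [[i == j or bool(can_access(i, j)) for j in range(n)] for i in range(n)]


def format_matrix(matrix: List[List[bool]]) -> str:
    """Render the table with 'Y'/'N' cells, one row per GPU."""
    header = "     " + " ".join(f"{j:>3}" for j in range(len(matrix)))
    rows = [f"{i:>3}: " + " ".join(f"{'Y' if ok else 'N':>3}" for ok in row)
            for i, row in enumerate(matrix)]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    import torch  # assumption: PyTorch built with CUDA support

    count = torch.cuda.device_count()
    matrix = p2p_matrix(count, torch.cuda.can_device_access_peer)
    print(format_matrix(matrix))
```

Any `N` off the diagonal means PyTorch reports no direct P2P path for that pair, which is the situation `NCCL_IGNORE_DISABLED_P2P` is related to.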