quiver-team / torch-quiver

PyTorch Library for Low-Latency, High-Throughput Graph Learning on GPUs.
https://torch-quiver.readthedocs.io/en/latest/
Apache License 2.0

os.sched_setaffinity(0, [2*(cpu_offset+rank)]) OSError: [Errno 22] Invalid argument #162

Closed: LukeLIN-web closed this issue 1 year ago

LukeLIN-web commented 1 year ago

When I run torch-quiver/srcs/python/quiver/serving.py, it fails with:

Process Process-29:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/share/torch-quiver/srcs/python/quiver/serving.py", line 120, in cpu_sampler_worker_loop
    os.sched_setaffinity(0, [2*(cpu_offset+rank)])
OSError: [Errno 22] Invalid argument

The source code at that line is:

    def cpu_sampler_worker_loop(self, rank, sample_task_queue_list, result_queue_list, device_num, sizes, csr_topo, cpu_offset):
        os.sched_setaffinity(0, [2*(cpu_offset+rank)])

My machine has only 16 CPUs:

lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
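
On Linux, os.sched_setaffinity raises EINVAL (Errno 22) when none of the requested CPUs is available to the process. With CPUs 0-15 online, the expression 2*(cpu_offset+rank) goes out of range once cpu_offset+rank reaches 8. The sketch below lists which worker ranks would hit the error. The helper names (requested_cpu, out_of_range_ranks) and the num_workers argument are hypothetical, introduced only for this illustration, and are not part of quiver.

```python
import os


def requested_cpu(rank, cpu_offset):
    # Same arithmetic as the failing line in serving.py.
    return 2 * (cpu_offset + rank)


def out_of_range_ranks(num_workers, cpu_offset, allowed=None):
    """Return the ranks whose requested CPU is not usable by this process.

    `allowed` defaults to the CPUs the current process may run on
    (Linux only); pass an explicit iterable to check another machine.
    """
    if allowed is None:
        allowed = os.sched_getaffinity(0)
    allowed = set(allowed)
    return [
        rank
        for rank in range(num_workers)
        if requested_cpu(rank, cpu_offset) not in allowed
    ]
```

For example, assuming cpu_offset is 0 and ten workers on a 16-CPU machine, out_of_range_ranks(10, 0, range(16)) reports ranks 8 and 9, because they ask for CPUs 16 and 18.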

CongjieHe commented 1 year ago

This bug has been fixed in #164. We corrected the calculation method for the CPU range in reddit_serving.py and provided a cpu_range parameter for easy customization of the process-to-CPU binding mapping.
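
As a rough illustration of what an explicit process-to-CPU mapping can look like, here is a minimal sketch. It is not the code from #164: the function names assign_cpus and bind_current_process, and the wrap-around policy when there are more workers than CPUs, are assumptions made for this example. Only os.sched_getaffinity and os.sched_setaffinity are real (Linux-only) calls.

```python
import os


def assign_cpus(num_workers, cpu_range):
    """Map each worker rank to one CPU taken from cpu_range.

    Hypothetical policy: ranks wrap around when there are more
    workers than CPUs, so no rank ever gets an index outside cpu_range.
    """
    cpus = list(cpu_range)
    if not cpus:
        raise ValueError("cpu_range must contain at least one CPU")
    return {rank: cpus[rank % len(cpus)] for rank in range(num_workers)}


def bind_current_process(cpu):
    """Pin the calling process to `cpu`, checking availability first."""
    allowed = os.sched_getaffinity(0)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not available; allowed: {sorted(allowed)}")
    os.sched_setaffinity(0, [cpu])
```

Each worker would call bind_current_process(mapping[rank]) at startup, where mapping = assign_cpus(num_workers, range(0, 16)) on the 16-CPU machine from this report. Checking against os.sched_getaffinity first turns the opaque Errno 22 into a message that names the offending CPU.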

LukeLIN-web commented 1 year ago

It's great work! Thank you for your time!