pytorch / cpuinfo

CPU INFOrmation library (x86/x86-64/ARM/ARM64, Linux/Windows/Android/macOS/iOS)
BSD 2-Clause "Simplified" License

add FreeBSD support #172

Closed cyyever closed 4 months ago

yurivict commented 1 year ago

The testcase below fails with a "Floating point exception" with this patch on FreeBSD 13.2.

import torch
import math

dtype = torch.float
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.set_default_device(device)

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), dtype=dtype, requires_grad=True)
b = torch.randn((), dtype=dtype, requires_grad=True)
c = torch.randn((), dtype=dtype, requires_grad=True)
d = torch.randn((), dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a Tensor of shape (1,)
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad. c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
cyyever commented 1 year ago

@yurivict My result is

python3 a.py
99 1283.42333984375
199 853.2401123046875
299 568.3171997070312
399 379.5828552246094
499 254.54953002929688
599 171.7064666748047
699 116.80989074707031
799 80.42716217041016
899 56.31061553955078
999 40.32223129272461
1099 29.720619201660156
1199 22.68964195251465
1299 18.02572250366211
1399 14.931407928466797
1499 12.87797737121582
1599 11.514902114868164
1699 10.609926223754883
1799 10.008904457092285
1899 9.609664916992188
1999 9.344354629516602
Result: y = 0.0073340958915650845 + 0.8354617357254028 x + -0.0012652541045099497 x^2 + -0.09030362218618393 x^3

Can you try to reproduce on the main branch of PyTorch with cpuinfo replaced by my version? Some modifications of main are required, so just try my fork: https://github.com/cyyever/pytorch/tree/freebsd. There are no floating-point computations in this PR, just system calls. The error may be due to a corrupted PyTorch build; you can recompile everything from source and retry.

yurivict commented 1 year ago

I tried with the recent release 2.0.1

cyyever commented 1 year ago

I tried with the recent release 2.0.1

Recompiling with PyTorch 2.0.1 should work. Debug your code with valgrind and paste the result here. Replace cpuinfo under torch/third_party, make sure PyTorch links against my version, and retry. It is also possible that some bug in release 2.0.1 was fixed recently.

yurivict commented 1 year ago

Here is what cgdb screen looks like at the moment of failure:

 79│                 packages[i].processor_start = i * threads_per_package;
 80│                 packages[i].processor_count = threads_per_package;
 81│                 packages[i].core_start = i * cores_per_package;
 82│                 packages[i].core_count = cores_per_package;
 83│                 packages[i].cluster_start = i;
 84│                 packages[i].cluster_count = 1;
 85│                 cpuinfo_x86_format_package_name(x86_processor.vendor, brand_string, packages[i].name);
 86│         }
 87│         for (uint32_t i = 0; i < freebsd_topology.cores; i++) {
 88│                 cores[i] = (struct cpuinfo_core) {
 89│                         .processor_start = i * threads_per_core,
 90│                         .processor_count = threads_per_core,
 91│                         .core_id = i % cores_per_package,
 92├───────────────────────> .cluster = clusters + i / cores_per_package,
 93│                         .package = packages + i / cores_per_package,
 94│                         .vendor = x86_processor.vendor,
 95│                         .uarch = x86_processor.uarch,
 96│                         .cpuid = x86_processor.cpuid,
 97│                 };
 98│         }
 99│         for (uint32_t i = 0; i < freebsd_topology.threads; i++) {
100│                 const uint32_t smt_id = i % threads_per_core;
101│                 const uint32_t core_id = i / threads_per_core;
102│                 const uint32_t package_id = i / threads_per_package;
103│
104│                 /* Reconstruct APIC IDs from topology components */
/disk-samsung/pytorch-work/pytorch-v2.0.1/third_party/cpuinfo/src/x86/freebsd/init.c                                                                                                           
New UI allocated
(gdb) r x2.py 
Starting program: /usr/local/bin/python3.9 x2.py
warning: File "/usr/local/lib/libpython3.9.so.1.0-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/local/lib/libpython3.9.so.1.0-gdb.py
line to your configuration file "/home/yuri/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/yuri/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"

Program received signal SIGFPE, Arithmetic exception.
Integer divide by zero.
0x0000000808f87fa4 in cpuinfo_x86_freebsd_init () at /disk-samsung/pytorch-work/pytorch-v2.0.1/third_party/cpuinfo/src/x86/freebsd/init.c:92
92                              .cluster = clusters + i / cores_per_package,
(gdb) p cores_per_package 
$1 = 0
(gdb) p &cores_per_package
Address requested for identifier "cores_per_package" which is in register $r12
(gdb) p clusters
$2 = (struct cpuinfo_cluster *) 0x853e416c0
(gdb) p i
$4 = 0
(gdb) p cores_per_package 
$5 = 0
(gdb) 
yurivict commented 1 year ago

cores_per_package is zero, which is wrong.

yurivict commented 1 year ago

Unfortunately, freebsd_topology isn't printed by the debugger because it is somehow "optimized out".

cyyever commented 1 year ago

@yurivict I was able to reproduce the error on another host, and it is now fixed.

yurivict commented 1 year ago

Same Floating point exception with patches cb647773be54f308a5836ebd65a1c5ec2bea46c2 and 7948b28fd62277db51758fc1151efcc562851ea6.

cyyever commented 1 year ago

Same Floating point exception with patches cb647773be54f308a5836ebd65a1c5ec2bea46c2 and 7948b28fd62277db51758fc1151efcc562851ea6.

Please print the output of sysctl kern.sched.topology_spec.

yurivict commented 1 year ago

$ sysctl kern.sched.topology_spec
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="3">
  <cpu count="8" mask="ff,0,0,0">0, 1, 2, 3, 4, 5, 6, 7</cpu>
  <children>
   <group level="2" cache-level="2">
    <cpu count="2" mask="3,0,0,0">0, 1</cpu>
    <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
   </group>
   <group level="2" cache-level="2">
    <cpu count="2" mask="c,0,0,0">2, 3</cpu>
    <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
   </group>
   <group level="2" cache-level="2">
    <cpu count="2" mask="30,0,0,0">4, 5</cpu>
    <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
   </group>
   <group level="2" cache-level="2">
    <cpu count="2" mask="c0,0,0,0">6, 7</cpu>
    <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
   </group>
  </children>
 </group>
</groups>
cyyever commented 1 year ago

@yurivict The package number should be recognized as 1. Can you make a debug build and give me the backtrace from valgrind or lldb?

yurivict commented 1 year ago

"packages" comes from the sysctl value "kern.smp.cpus" which has the value 8. "cores" comes from the sysctl value "kern.smp.cores" which has the value 4. cores_per_package=freebsd_topology.cores / freebsd_topology.packages which makes it to be zero.

cyyever commented 1 year ago

"packages" comes from the sysctl value "kern.smp.cpus" which has the value 8. "cores" comes from the sysctl value "kern.smp.cores" which has the value 4. cores_per_package=freebsd_topology.cores / freebsd_topology.packages which makes it to be zero.

The latest code counts "packages" from "kern.sched.topology_spec", which should be 1 on your host.
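
For reference, a minimal sketch of one way to derive the package count from kern.sched.topology_spec on FreeBSD, by counting the top-level groups in the XML the sysctl returns. This is only an illustration of the idea, not the code in the PR:

/* Count level-1 <group> elements in kern.sched.topology_spec as packages.
 * On the topology pasted above this prints 1. */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
        size_t len = 0;
        if (sysctlbyname("kern.sched.topology_spec", NULL, &len, NULL, 0) != 0) {
                perror("sysctlbyname");
                return 1;
        }
        char *spec = malloc(len + 1);
        if (spec == NULL || sysctlbyname("kern.sched.topology_spec", spec, &len, NULL, 0) != 0) {
                perror("sysctlbyname");
                free(spec);
                return 1;
        }
        spec[len] = '\0';

        int packages = 0;
        const char *needle = "<group level=\"1\"";
        for (const char *p = strstr(spec, needle); p != NULL; p = strstr(p + 1, needle)) {
                packages++;
        }
        printf("packages = %d\n", packages);
        free(spec);
        return 0;
}

On the host above, kern.smp.cpus is 8 and kern.smp.cores is 4, but the topology_spec output contains a single level-1 group, so the package count comes out as 1.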

yurivict commented 1 year ago

It's not what I have after 5 patches:

struct cpuinfo_freebsd_topology cpuinfo_freebsd_detect_topology(void) {
        int packages = cpuinfo_from_freebsd_sysctl("kern.smp.cpus");
        int cores = cpuinfo_from_freebsd_sysctl("kern.smp.cores");
        int threads_per_core = cpuinfo_from_freebsd_sysctl("kern.smp.threads_per_core");
        cpuinfo_log_debug("freebsd topology: packages = %d, cores = %d, threads_per_core = %d", packages, cores, threads_per_core);
        struct cpuinfo_freebsd_topology topology = {
                .packages = (uint32_t) packages,
                .cores = (uint32_t) cores,
                .threads_per_core = (uint32_t) threads_per_core,
                .threads = (uint32_t) (threads_per_core * cores)
        };

        return topology;
}

Maybe you need to squash the commits.

cyyever commented 1 year ago

@yurivict The history is somewhat messy. Can you remove the local repository and re-clone? I will squash the commits once it is stable.

yurivict commented 1 year ago

Sorry, I didn't have all the patches applied before. The current set of patches works fine on my machine.

cyyever commented 1 year ago

@yurivict Glad to see that. What is your CPU utilization? Can all cores reach 100%?

yurivict commented 1 year ago

This example prints that torch.get_num_threads() is 4 for some reason.

cyyever commented 1 year ago

@yurivict

num_threads /= 2;   

in PyTorch's c10/core/thread_pool.h (line 42), so you get half of the cores. There is no comment explaining the divide-by-two logic, and cpuinfo is not used there.

yurivict commented 6 months ago

Hi @cyyever

I updated the same PR so that it merges with the latest cpuinfo revision: https://github.com/pytorch/cpuinfo/pull/230

I verified that it works with the OpenAI Whisper project and many simple test cases.

Could you please merge it?

Thank you, Yuri

cyyever commented 6 months ago

@Maratyszcza Most of the listed issues have been fixed, except the core count, which is whatever FreeBSD returns. Can you help review again?

Maratyszcza commented 6 months ago

I no longer maintain this project; maybe @malfet or @fbarchard could review?

cyyever commented 4 months ago

@fbarchard Can you help merge it?

cyyever commented 4 months ago

@malfet Can you help merge it?

amitdo commented 3 months ago

The README.md file should mention FreeBSD support.

cyyever commented 2 months ago

The README.md file should mention FreeBSD support.

The support is still experimental.

amitdo commented 2 months ago

The support is still experimental.

I still think you should publish it. Just mention that it is experimental.