pytorch / cpuinfo

CPU INFOrmation library (x86/x86-64/ARM/ARM64, Linux/Windows/Android/macOS/iOS)
BSD 2-Clause "Simplified" License
962 stars 306 forks source link

Fix RISC-V Linux build #212

Closed markdryan closed 6 months ago

markdryan commented 6 months ago

Cpuinfo was failing to build on RISC-V Linux distributions, e.g., Ubuntu 23.10, as it includes a header file sys/hwprobe.h that is not yet provided by glibc (although it is provided by bionic). We fix the issue by only including sys/hwprobe.h when building for Android, and invoking the hwprobe syscall directly on other Linux distributions. The Android specific check can be removed in the future once sys/hwprobe.h becomes available in glibc.

markdryan commented 6 months ago

@prashanthswami Could I ask you to verify that this patch works correctly on Android and that it doesn't change the hwprobe behaviour on Android builds in any way. It shouldn't do but I don't have a way to test this.

prashanthswami commented 6 months ago

Confirming - I can build this for Android RISC-V and verified that cpuinfo and isainfo both return properly. I also built against a toolchain that uses an older kernel header and it successfully built.

markdryan commented 6 months ago

Confirming - I can build this for Android RISC-V and verified that cpuinfo and isainfo both return properly. I also built against a toolchain that uses an older kernel header and it successfully built.

Great. Thanks. Could you also possibly confirm that we're still taking the bionic path on Android, e.g., that we're executing __riscv_hwprobe here?

prashanthswami commented 6 months ago

Yup - in the previous run, I had added some logs to ensure that Android was running through the hwprobe path to make sure we weren't falling back to the direct-syscall path.

markdryan commented 6 months ago

Yup - in the previous run, I had added some logs to ensure that Android was running through the hwprobe path to make sure we weren't falling back to the direct-syscall path.

Thanks!

markdryan commented 6 months ago

@malfet, @prashanthswami what is the process for getting this PR reviewed and merged? I'm a little concerned that if we leave it much longer pytorch itself might stop building on standard Linux distributions of RISC-V. This could happen if pytorch were to update its submodule to include #190 but not this patch. Pytorch seems to have updated its cpuinfo submodule recently, but luckily it was updated to point to the patch before #4e5be9e. The next update will break the RISC-V builds unless this patch is merged.

prashanthswami commented 6 months ago

@markdryan - there's some context in #210 but TL;DR this project doesn't have an official maintainer at the moment and malfet doesn't have the bandwidth or hardware setup to review patches. I wasn't able to get into the Slack so I've started a conversation on the PyTorch discussion forums here, mentioning this patch as something that's eventually going to break a future roll: https://discuss.pytorch.org/t/maintenance-of-pytorch-cpuinfo/194722/1

I'll defer to Nikita to comment on what or if we can do anything in the meanwhile (though I'm happy to lend help to validate any patches) - if possible I'd like to see if we can land this just to avoid the breakage and then pause any future work until we figure out how this project gets resourced with the PyTorch group.

malfet commented 6 months ago

@prashanthswami Can we start an email thread instead? Also, what error are you seeing when trying to join slack? @markdryan i'll try to review/validate PR today, but yes, we need to find a maintainer for the project (perhaps I should volunteer to do it for the first half of 2024, but I want to have a fallback)

prashanthswami commented 6 months ago

Ah sorry - "I wasn't able to get in" was a poor choice of words, I meant "I have not yet gotten an invitation link". I have the Google form confirmation of the request from the beginning of December, but haven't gotten feedback on a yes/no. Of course, joining the Slack is only for the purpose of the discussion thread.

I was redirected to post in https://dev-discuss.pytorch.org/t/maintenance-of-pytorch-cpuinfo/1767 - I'm happy to start an email thread if that's more appropriate though. What alias should be included from the PyTorch team?

malfet commented 6 months ago

@prashanthswami feel free to just email me (nshulga at meta) and I'll add the respective folks\ @markdryan my problem with this PR is that it:

markdryan commented 6 months ago
  • Essentially embeds headers/constants into the code rather than skip if they are not available

Unfortunately, skipping if the headers are not available is not really an option, as they are likely to be unavailable on build machines for the foreseeable future.  The sys/hwprobe.h header file does not yet exist on any glibc based Linux distributions as the glibc patches for hwprobe have not yet been merged.  This is why cpuinfo does not currently build on RISC-V Ubuntu and Fedora distributions.  On these distributions we could try including the kernel header file asm/hwprobe.h instead (which is the source of the constants in this PR), but there's another problem.  The asm/hwprobe.h file will only exist if the build machine is running a 6.4 kernel or greater.  As none of the RISC-V Ubuntu LTS releases have a 6.4 kernel (Ubuntu 22.04 has a 5.19 kernel) and cpuinfo binaries are likely to be built on the oldest distribution possible (x86_64 pytorch builds seem to be done on manylinux_2014 which is based on Centos 7 (which as far as I can tell has a 3.10 kernel)) it's unlikely that asm/hwprobe.h is going to be available at build time for a while, which would mean cpuinfo binaries would not be able to take advantage of hwprobe even when running on devices with the latest kernel, that support hwprobe.

There's also another complication, which admittedly is more of a problem for a future patch that uses hwprobe to detect extensions, but is worth mentioning in the context of this discussion.   The hwprobe API and headers files are being updated with each new kernel release.  Support for Zba, Zbb, Zbs an V was added in the 6.5 kernel.  Support for Zicboz was added in 6.7 and patches are pending to add support for many other extensions.  Each new kernel version adds new constants to the asm/hwprobe.h.  So let's say you build cpuinfo on a distribution with a 6.4 kernel.  The asm/hwprobe.h will be available but it won't contain the constants for Zba, Zbb, Zbs, V or Zicboz.  So if you want to use hwprobe to detect these extensions and you want your code to be buildable on a wide range of distributions or just older distributions, you have no option but to include the definitions for the constants in your code.

And it's worth pointing out here that if you want to detect things like Zba, Zbb, Zbs and Zicboz and other multi-letter extensions, of which there are many, you have no option but to use hwprobe.  You cannot simply fall back to hwcap if the hwprobe header files are not available, as hwcap cannot detect multi-letter extensions.

For all these reasons other projects that use hwprobe tend to duplicate the kernel constants in their own code.

Also note that there is a precedent for including constants from kernel header files in the cpuinfo code.  See

In summary then

  • This also somewhat non-cmake friendly, as more conventional way of doing something like that would be to detect if header is present and use it during the build if it is

Right, I wasn't sure about this bit. Using __has_include seemed to be the simplest solution. I'll update the patch to use CMake to check for the hwprobe headers.