anttimc opened 11 months ago
I managed to track down the issue with `line_profiler`. The time goes to `CPUID.get_raw_hz`, which sleeps for one second to get a number of ticks!
https://github.com/workhorsy/py-cpuinfo/blob/4824ec0746be0dcee9bf9528dc8cdd0c1640cd9d/cpuinfo/cpuinfo.py#L1508C1-L1521C1
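To illustrate why this costs a full second: measuring a tick rate by sampling a counter before and after a fixed sleep blocks for the entire sleep interval. The sketch below is hypothetical and is not py-cpuinfo's code. `estimate_hz` and `read_ticks` are illustrative names, and `time.perf_counter_ns` stands in for the CPU's time-stamp counter.

```python
import time
from typing import Callable


def estimate_hz(read_ticks: Callable[[], int], interval: float = 1.0) -> float:
    """Estimate how many ticks per second a counter advances.

    Hypothetical sketch: the call blocks for at least `interval` seconds,
    so with the default of 1.0 every call costs about one second.
    """
    start_ticks = read_ticks()
    start_time = time.perf_counter()
    time.sleep(interval)  # the unavoidable wait in this approach
    elapsed = time.perf_counter() - start_time
    return (read_ticks() - start_ticks) / elapsed


if __name__ == "__main__":
    # perf_counter_ns counts nanoseconds, so the estimate should be close to 1e9.
    print(f"{estimate_hz(time.perf_counter_ns, interval=0.1):.3e}")
```

Shortening `interval` reduces the wait but makes the estimate noisier, which is why a sleep-based measurement cannot be made both fast and accurate.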
I think this kind of waiting time is totally unacceptable. I wonder why I don't see this issue on Fedora. Probably something in the try block fails before the sleep.
On Fedora, it probably stops at the condition where SELinux is checked.
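Following the guess above, one plausible shape for the difference is a guard that gives up on the CPUID path before the timing sleep is ever reached. This is a hypothetical sketch, not py-cpuinfo's actual check: `selinux_enforcing` and `raw_hz_or_none` are illustrative names. The only assumption about the system is the standard SELinux sysfs node `/sys/fs/selinux/enforce`, which contains `1` in enforcing mode.

```python
from typing import Callable, Optional

# Standard SELinux sysfs node: contains "1" when enforcing, "0" when permissive.
SELINUX_ENFORCE = "/sys/fs/selinux/enforce"


def selinux_enforcing(path: str = SELINUX_ENFORCE) -> bool:
    """Return True if SELinux reports enforcing mode."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # No SELinux on this system, or the node is not readable.
        return False


def raw_hz_or_none(
    is_blocked: Callable[[], bool], measure: Callable[[], float]
) -> Optional[float]:
    """Hypothetical guard: give up before reaching the slow measurement."""
    if is_blocked():
        return None  # early exit, so the sleep inside `measure` never runs
    return measure()
```

On a system without SELinux, `selinux_enforcing()` returns `False`, so `raw_hz_or_none(selinux_enforcing, measure)` always runs the slow `measure` callable. That would match seeing the delay on Ubuntu but not on Fedora.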
As `cpuinfo` is a dependency of PyTables, which is a dependency of pandas for HDF5 file I/O, this issue can affect quite a lot of users.
To remedy the issue, I think `hz_actual` should be made optional in `get_cpu_info`, or there could be separate methods that get only the info the user needs. What do you think, @workhorsy?
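Making `hz_actual` optional could look roughly like the sketch below. The function name `get_cpu_info_sketch`, the `include_hz_actual` flag, and the `machine` field are hypothetical and are not part of py-cpuinfo's API. The slow measurement is injected as a callable so the sketch stays self-contained.

```python
import platform
from typing import Any, Callable, Dict


def get_cpu_info_sketch(
    measure_hz_actual: Callable[[], Any],
    include_hz_actual: bool = True,
) -> Dict[str, Any]:
    """Hypothetical get_cpu_info variant with an opt-out for hz_actual.

    Only the hz_actual measurement is slow, so skipping it makes the call cheap.
    """
    # Placeholder for the cheap fields that would be collected anyway.
    info: Dict[str, Any] = {"machine": platform.machine()}
    if include_hz_actual:
        info["hz_actual"] = measure_hz_actual()  # the step that sleeps
    return info
```

Defaulting the flag to `True` keeps the existing output unchanged. A caller such as PyTables would pass `include_hz_actual=False`.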
A suggestion:
Refactor the `info` dict (https://github.com/workhorsy/py-cpuinfo/blob/4824ec0746be0dcee9bf9528dc8cdd0c1640cd9d/cpuinfo/cpuinfo.py#L1566C1-L1585C4) into functions that return groups of info dictionaries, and call only the functions that are desired.
The groups could be, for example, `raw_info`, `hz_info`, `cache_info`, and `basic_info`, as they are now grouped in the `info` dict in the linked snippet. Or they could be even more fine-grained.
The desired groups could be passed as flags to the subprocesses and finally as a list of arguments to `_get_cpu_info_from_cpuid_actual`. The default behaviour would be to return an `info` dict with all the groups.
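A minimal sketch of the grouped design, assuming each group is produced by its own provider function. `collect_info` and `InfoGroup` are hypothetical names. The group names follow the suggestion above, and the provider bodies are placeholders rather than real py-cpuinfo logic.

```python
from typing import Any, Callable, Dict, Iterable, Optional

InfoGroup = Callable[[], Dict[str, Any]]


def collect_info(
    providers: Dict[str, InfoGroup], groups: Optional[Iterable[str]] = None
) -> Dict[str, Any]:
    """Merge the dicts returned by the requested group providers.

    groups=None keeps today's behaviour: every group is collected.
    """
    selected = list(providers) if groups is None else list(groups)
    unknown = [name for name in selected if name not in providers]
    if unknown:
        raise ValueError(f"unknown info groups: {unknown}")
    info: Dict[str, Any] = {}
    for name in selected:
        info.update(providers[name]())  # only requested providers run
    return info


if __name__ == "__main__":
    # Placeholder providers; real ones would query CPUID, /proc/cpuinfo, etc.
    providers: Dict[str, InfoGroup] = {
        "basic_info": lambda: {"vendor": "placeholder"},
        "cache_info": lambda: {"l2_cache_size": 0},
        "hz_info": lambda: {"hz_actual": 0},  # the slow group in practice
        "raw_info": lambda: {"raw": b""},
    }
    print(collect_info(providers, groups=["cache_info"]))
```

Because `groups` is just a list of strings, it could be serialized and handed to the subprocess, as proposed above, without changing the default output.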
Motivation: in PyTables, for example, only the cache info is needed to optimize file I/O.
Unsurprisingly, I also see this on Linux Mint 21. The 1-second sleep in `CPUID.get_raw_hz`, which I don't need at all, is really annoying. I hope this will get changed!
Getting the CPU info on Ubuntu takes over one second. I noticed this degradation in PyTables, where the function is called at import time.
Simple benchmark on Ubuntu (in a Docker container)
On Fedora (the same machine and CPU)
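The benchmark code and numbers from the original comparison are not reproduced here. A minimal timing harness of the same kind might look like the sketch below. `time_call` is a hypothetical helper that measures wall-clock time with `time.perf_counter`.

```python
import time
from typing import Callable, List


def time_call(fn: Callable[[], object], repeat: int = 3) -> List[float]:
    """Return the wall-clock duration in seconds of each of `repeat` calls to `fn`."""
    durations: List[float] = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations
```

With py-cpuinfo installed, running `import cpuinfo` and then `time_call(cpuinfo.get_cpu_info)` on both systems would show whether the one-second sleep is hit on each call.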