lemire / FastPFor

The FastPFOR C++ library: Fast integer compression
Apache License 2.0

ARM64 build: it would be good at least to be buildable #80

Open klirichek opened 3 years ago

klirichek commented 3 years ago


I tried a plain build on a Raspberry Pi 4 Model B running Ubuntu 20.04:

$ uname -a
Linux ubuntu 5.4.0-1032-raspi #35-Ubuntu SMP PREEMPT Fri Mar 19 20:52:40 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
$ cat /proc/cpuinfo
processor   : 0
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

Hardware    : BCM2835
Revision    : c03112
Serial      : 100000007c08a3a6
Model       : Raspberry Pi 4 Model B Rev 1.2

using the usual mkdir build && cd build && cmake .. followed by make.

That naive attempt failed immediately because 'immintrin.h' is missing.

I dug a little and replaced the immintrin.h include line in 'common.h' with:

//#include <immintrin.h>
#  if defined(__ARM_NEON)
#    include <arm_neon.h>
#  elif defined(__WINDOWS__) || defined(__WINRT__)
/* Visual Studio doesn't define __ARM_ARCH, but _M_ARM (if set, always 7), and _M_ARM64 (if set, always 1). */
#    if defined(_M_ARM)
#      include <armintr.h>
#      include <arm_neon.h>
#      define __ARM_NEON 1 /* Set __ARM_NEON so that it can be used elsewhere, at compile time */
#    endif
#    if defined (_M_ARM64)
#      include <arm64intr.h>
#      include <arm64_neon.h>
#      define __ARM_NEON 1 /* Set __ARM_NEON so that it can be used elsewhere, at compile time */
#    endif
#  endif

It still fails, now with a large number of errors about undeclared '_mm_storeu_si128', '_mm_loadu_si128', '__m128i', and many others.

So this raises some obvious questions:
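These errors are expected: arm_neon.h declares NEON types and functions (uint32x4_t, vld1q_u32, and so on), not the SSE names, so every '_mm_*' call still needs a counterpart. A minimal sketch of one possible approach follows, assuming a thin wrapper layer with an SSE2 backend, a NEON backend, and a slow scalar stub. The names vec128, load128, store128, add32 and add_arrays are hypothetical and are not part of FastPFor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
// x64: forward directly to the SSE2 intrinsics.
#include <emmintrin.h>
using vec128 = __m128i;
static inline vec128 load128(const uint32_t* p) {
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
}
static inline void store128(uint32_t* p, vec128 v) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v);
}
static inline vec128 add32(vec128 a, vec128 b) { return _mm_add_epi32(a, b); }

#elif defined(__ARM_NEON)
// arm64: the same three operations expressed with NEON intrinsics.
#include <arm_neon.h>
using vec128 = uint32x4_t;
static inline vec128 load128(const uint32_t* p) { return vld1q_u32(p); }
static inline void store128(uint32_t* p, vec128 v) { vst1q_u32(p, v); }
static inline vec128 add32(vec128 a, vec128 b) { return vaddq_u32(a, b); }

#else
// Anything else: a plain scalar stub, slow but buildable.
struct vec128 { uint32_t lane[4]; };
static inline vec128 load128(const uint32_t* p) {
    vec128 v;
    for (int i = 0; i < 4; ++i) v.lane[i] = p[i];
    return v;
}
static inline void store128(uint32_t* p, vec128 v) {
    for (int i = 0; i < 4; ++i) p[i] = v.lane[i];
}
static inline vec128 add32(vec128 a, vec128 b) {
    vec128 r;
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
#endif

// Element-wise out[i] = a[i] + b[i], four lanes at a time, scalar tail.
void add_arrays(const uint32_t* a, const uint32_t* b, uint32_t* out,
                std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        store128(out + i, add32(load128(a + i), load128(b + i)));
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Calling add_arrays(a, b, out, n) gives the same result on every backend; only the speed differs. A real port would need such a mapping for every intrinsic the library uses, which is far more than the three shown here.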

  1. Is it possible to build on that platform? (AFAIK this affects not only the RPi but also the new M1 Macs, which is a more serious case!)
  2. If it is possible, are any instructions available?
  3. If it is not possible with hardware intrinsics, is there some fallback path? It need not be fast at all, a pure 'stub' would do, as long as the library builds so that projects depending on it stay usable too.

lemire commented 3 years ago

Under "Hardware Requirements", we have 'We require an x64 platform.'

A port to ARM would be great. Pull request invited.

klirichek commented 2 years ago

I've successfully built it using the SIMDe ('SIMD Everywhere') library. It wraps or, where that is impossible, reimplements most SIMD intrinsics using native CPU capabilities (that is, it maps 1:1 on x64, maps to NEON on arm64, and so on). Unfortunately, this is not a full port, since we use only a subset of FastPFor in our project (so I can only make a 'partial' PR extracting just those patches). But I don't think the rest will be particularly difficult.
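For illustration, a minimal sketch of how SIMDe is typically dropped in, assuming its headers are on the include path. This is not necessarily how our clone does it. Defining SIMDE_ENABLE_NATIVE_ALIASES before including simde/x86/sse2.h makes SIMDe provide the usual '_mm_*' names, so existing SSE2 code compiles unchanged. HAVE_SIMDE and or_reduce are hypothetical names used only in this example; the fallback to emmintrin.h is there so the sketch also builds on x64 without SIMDe installed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Detect SIMDe if the compiler supports __has_include.
#if defined(__has_include)
#  if __has_include(<simde/x86/sse2.h>)
#    define HAVE_SIMDE 1  // hypothetical macro, local to this example
#  endif
#endif

#if defined(HAVE_SIMDE)
// SIMDe maps _mm_* to native SSE2 on x64 and to NEON (or scalar) elsewhere.
#  define SIMDE_ENABLE_NATIVE_ALIASES
#  include <simde/x86/sse2.h>
#elif defined(__SSE2__) || defined(_M_X64)
#  include <emmintrin.h>
#else
#  error "Neither SSE2 nor SIMDe headers are available."
#endif

// Ordinary SSE2-style code: OR together all values of an array, a common
// first step when picking a bit width for a block of integers.
uint32_t or_reduce(const uint32_t* in, std::size_t n) {
    assert(n == 0 || in != nullptr);
    __m128i acc = _mm_setzero_si128();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc = _mm_or_si128(
            acc, _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i)));
    }
    // Fold the four 32-bit lanes into one value.
    uint32_t lanes[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    uint32_t result = lanes[0] | lanes[1] | lanes[2] | lanes[3];
    // Handle the remaining 0..3 elements one by one.
    for (; i < n; ++i) {
        result |= in[i];
    }
    return result;
}
```

The body of or_reduce never mentions SIMDe; only the include section changes between platforms.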

FYI, our clone is manticoresoftware/fastpfor.