I've fixed the problem that _mm512_extract_epi64 is not a supported intrinsic.
The 512-bit vector is now split into its lower and upper 256-bit halves with _mm512_extracti64x4_epi64, and the popcount is computed on those 256-bit halves.
If AVX512VPOPCNTDQ is supported, _mm512_popcnt_epi64 is applied to the whole 512-bit vector without any extraction.
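For illustration, here is a minimal sketch of the two code paths. It is not the code in this PR. It assumes GCC or Clang (for __builtin_popcountll and the __AVX512F__ / __AVX512VPOPCNTDQ__ predefined macros), and popcount512 is a hypothetical helper that counts the set bits of eight 64-bit words. The split path here stores each 256-bit half to memory and counts its four lanes; the actual change may read the lanes out differently.

```cpp
#include <cstdint>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: count the set bits in one 512-bit block given as
// eight 64-bit words. The code path is chosen at compile time.
inline uint64_t popcount512(const uint64_t words[8]) {
#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
    // AVX512VPOPCNTDQ path: per-lane popcount on the full 512-bit vector,
    // then a horizontal sum of the eight 64-bit lane counts. No extraction.
    __m512i v = _mm512_loadu_si512(words);
    __m512i counts = _mm512_popcnt_epi64(v);
    return static_cast<uint64_t>(_mm512_reduce_add_epi64(counts));
#elif defined(__AVX512F__)
    // AVX-512F only: split into the lower (imm8 = 0) and upper (imm8 = 1)
    // 256-bit halves, then count the four 64-bit lanes of each half.
    __m512i v = _mm512_loadu_si512(words);
    const __m256i halves[2] = {_mm512_extracti64x4_epi64(v, 0),
                               _mm512_extracti64x4_epi64(v, 1)};
    uint64_t total = 0;
    for (const __m256i& half : halves) {
        alignas(32) uint64_t lanes[4];
        _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), half);
        for (uint64_t lane : lanes) {
            total += __builtin_popcountll(lane);
        }
    }
    return total;
#else
    // Portable fallback when AVX-512 is not enabled at compile time.
    uint64_t total = 0;
    for (int i = 0; i < 8; ++i) {
        total += __builtin_popcountll(words[i]);
    }
    return total;
#endif
}
```

Building with -mavx512f selects the split path, and adding -mavx512vpopcntdq selects the _mm512_popcnt_epi64 path. Because the choice is made at compile time, such a binary should only run on CPUs that support the enabled extensions.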
Fixes https://github.com/yahoojapan/NGT/issues/161