pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Having issues installing FBGEMM-GPU on macOS #2248

Status: Open. justin8shan opened this issue 9 months ago

justin8shan commented 9 months ago

Hello, I'm having a problem installing FBGEMM-GPU for TorchRec.

I first tried installing it with pip, but that was not successful:

(torchrec) ➜  torchrec git:(main) ✗ pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cpu

Looking in indexes: https://download.pytorch.org/whl/cpu
Requirement already satisfied: pip in /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages (23.3.1)
ERROR: Could not find a version that satisfies the requirement install (from versions: none)
ERROR: No matching distribution found for install

Then I tried to build the CPU-only variant of FBGEMM-GPU from source, following the instructions at https://pytorch.org/FBGEMM/general/BuildInstructions.html#fbgemm-gpu-docs-build-setup-tools-install, with the command below:

 python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="-13.6-${ARCH}"

This failed with a Clang error:

[SETUP.PY] Parsed Arguments: Namespace(verbose=False, package_variant='cpu', package_name='fbgemm_gpu_cpu', nvml_lib_path=None)
[SETUP.PY] Unknown Arguments: ['bdist_wheel', '--python-tag=py310', '--plat-name=-13.6-x86_64']
[SETUP.PY] Extracted the package name: 'fbgemm_gpu_cpu'
[SETUP.PY] Not building FBGEMM_GPU from Nova.
[SETUP.PY] Extracted the package variant+version: ''
[SETUP.PY] Generating the package version ...
[SETUP.PY] Package is for RELEASE: using git info for the versioning
[SETUP.PY] TAG: v0.6.0-rc0, BRANCH: main, SHA: 441697c0481f82b9c328d39f70e4b34fdc890758
[SETUP.PY] Setting the full package version string: 0.6.0rc0.post26
[SETUP.PY] Generating version file at: /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/fbgemm_gpu/docs/version.py
[SETUP.PY] Building the CPU-ONLY variant of FBGEMM_GPU ...
-13.6-x86_64
[1/52] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_forward_quantized_host_cpu.cpp
clang: error: unsupported option '-fopenmp'

Any advice on how to solve this issue?

macOS: Ventura; Processor: Intel Core i7

excelle08 commented 9 months ago

Hi, FBGEMM-GPU currently needs to be built with GCC and does not support Clang. Can you try installing GCC (gcc and g++) and running the setup again?
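The first failure is the compiler rejecting the flag itself: the Xcode toolchain c++ used in the first log is Apple Clang, which does not accept a bare -fopenmp. Before rerunning the whole build, a candidate compiler can be checked by compiling a tiny probe with that flag. The sketch below is hypothetical (the file name openmp_probe.cpp and the function names are illustrative, not part of FBGEMM) and assumes an invocation such as g++-13 -fopenmp -c openmp_probe.cpp.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// True only if this translation unit was compiled with OpenMP enabled.
// Compilers define the _OPENMP macro when -fopenmp (or equivalent) is on.
bool openmp_enabled() {
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}

// Default thread count for an OpenMP parallel region; 1 without OpenMP.
int openmp_max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

// Sums 1..n with an OpenMP reduction. Without OpenMP the pragma is
// ignored and the loop runs serially, so the result is the same.
long long parallel_sum(int n) {
  long long total = 0;
#pragma omp parallel for reduction(+ : total)
  for (int i = 1; i <= n; ++i) {
    total += i;
  }
  return total;
}
```

If the chosen compiler builds this with -fopenmp and openmp_enabled() returns true, the flag is accepted; with Apple Clang the compile stops with the same "unsupported option '-fopenmp'" error shown above.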

justin8shan commented 9 months ago

I did. I installed GCC with Homebrew and exported CC/CXX, but I still have the same problem:

(base) ➜  gitrepo export CC=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13
(base) ➜  gitrepo export CXX=/usr/local/Cellar/gcc/13.2.0/bin/g++-13

q10 commented 9 months ago

Hi @justin8shan, we currently do not officially support FBGEMM_GPU-CPU on macOS, so there is no guarantee that the code will build. That being said, you might not be passing CC and CXX correctly; could you try:

 python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="macosx-13.6-${ARCH}" \
    -DCMAKE_C_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/gcc-13" \
    -DCMAKE_CXX_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/g++-13"

and show us the full build logs?
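The first log shows the Xcode toolchain c++ (Apple Clang) doing the compiling, and exporting CC/CXX reportedly did not change that. One plausible but unconfirmed explanation is that CMake reads CC/CXX only when a build directory is first configured and caches the result, whereas -DCMAKE_CXX_COMPILER names the compiler explicitly. To confirm which compiler family actually builds a translation unit, a small probe can report the standard predefined macros (__apple_build_version__, __clang__, __GNUC__, __VERSION__). The helper names below are hypothetical, not part of FBGEMM.

```cpp
#include <string>

// Reports which compiler family built this translation unit.
// Order matters: Clang (including Apple Clang) also defines __GNUC__
// for compatibility, so the Clang checks come before the GCC check.
std::string compiler_family() {
#if defined(__apple_build_version__)
  return "Apple Clang";
#elif defined(__clang__)
  return "Clang";
#elif defined(__GNUC__)
  return "GCC";
#else
  return "other";
#endif
}

// Full version string when the compiler provides __VERSION__
// (GCC and Clang both do); "unknown" otherwise.
std::string compiler_version() {
#ifdef __VERSION__
  return __VERSION__;
#else
  return "unknown";
#endif
}
```

Compiled with the same path passed to -DCMAKE_CXX_COMPILER (for example /usr/local/Cellar/gcc/13.2.0/bin/g++-13), compiler_family() should report GCC; compiled with the Xcode c++, it reports Apple Clang.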

justin8shan commented 9 months ago

@q10 Thanks for the advice. Is there a plan to support local builds on macOS?

Following your suggestion, the error above is gone, but I then got another error saying libtorch was missing. I downloaded libtorch-macos-latest.zip and copied it to /usr/local/Cellar/libtorch. After running the build again, I got the errors below. Can you help?
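Every error in the log below comes from the same line, sparse_ops_cpu.cpp:1745, which calls std::max(0L, dense_segment_value_data[i] * num_bins). std::max deduces a single type T from both arguments: 0L is long, while the product is int64_t, which the log shows is long long on macOS. On 64-bit Linux with glibc, int64_t is long, so the same call compiles there. The sketch below isolates that pattern in a hypothetical helper, clamped_bin_offset, and shows one portable spelling; it is an illustration under those assumptions, not the upstream patch.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper mirroring the failing expression at
// sparse_ops_cpu.cpp:1745. std::max(0L, x) with x of type int64_t only
// compiles where int64_t is `long` (e.g. Linux x86_64); on macOS int64_t is
// `long long`, so T is deduced as both `long` and `long long` and the call
// fails, as in the build log.
template <typename SegmentValueType>
int64_t clamped_bin_offset(SegmentValueType segment_value, int64_t num_bins) {
  // Naming T explicitly converts the literal 0 to int64_t, so both
  // arguments have the same type on every platform. Passing int64_t{0}
  // as the first argument would work equally well.
  return std::max<int64_t>(0, static_cast<int64_t>(segment_value) * num_bins);
}
```

Replacing 0L with 0LL would also compile on macOS, but it would reintroduce the mismatch on Linux, which is why the sketch names int64_t instead of a specific literal suffix.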

Here is the build log.

(torchrec) ➜  fbgemm_gpu git:(main) ✗ python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="macosx-13.6-${ARCH}" \
    -DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13 \
    -DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13
['setup.py', 'bdist_wheel', '--package_name=fbgemm_gpu_cpu', '--package_variant=cpu', '--python-tag=py310', '--plat-name=macosx-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Parsed Arguments: Namespace(verbose=False, package_variant='cpu', package_name='fbgemm_gpu_cpu', nvml_lib_path=None)
[SETUP.PY] Unknown Arguments: ['bdist_wheel', '--python-tag=py310', '--plat-name=macosx-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Extracted the package name: 'fbgemm_gpu_cpu'
[SETUP.PY] Not building FBGEMM_GPU from Nova.
[SETUP.PY] Extracted the package variant+version: ''
[SETUP.PY] Generating the package version ...
[SETUP.PY] Package is for RELEASE: using git info for the versioning
[SETUP.PY] TAG: v0.6.0-rc0, BRANCH: main, SHA: 441697c0481f82b9c328d39f70e4b34fdc890758
[SETUP.PY] Setting the full package version string: 0.6.0rc0.post26
[SETUP.PY] Generating version file at: /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/fbgemm_gpu/docs/version.py
[SETUP.PY] Building the CPU-ONLY variant of FBGEMM_GPU ...
macosx-13.6-x86_64
[4/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o
/usr/local/Cellar/gcc/13.2.0/bin/g++-13 -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/algorithm:60,
                 from /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:9:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/algorithm:61:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
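Every error in the `sparse_ops_cpu.cpp` part of this log comes from one expression: `std::max(0L, dense_segment_value_data[i] * num_bins)`. `std::max` deduces a single type `_Tp` for both arguments. GCC reports `int64_t = long long int` on this macOS build, while the literal `0L` is `long int`, so deduction fails ("deduced conflicting types ... ('long int' and 'long long int')"). On Linux x86_64 with glibc, `int64_t` is `long`, which is likely why the same code compiles there.

The sketch below assumes the intent is simply to clamp the product to be non-negative. The helper names `clamp_non_negative` and `bin_index` are hypothetical and are not FBGEMM functions. Giving the template argument explicitly makes both operands `int64_t` on every platform:

```cpp
#include <algorithm>
#include <cassert>  // used by the usage checks
#include <cstdint>

// Hypothetical helper: clamp a value to be >= 0 without depending on
// whether the platform defines int64_t as `long` (Linux glibc) or
// `long long` (macOS, as in the log above).
// std::max<int64_t> fixes _Tp explicitly, so the literal 0 is converted
// to int64_t instead of taking part in template argument deduction.
int64_t clamp_non_negative(int64_t v) {
  return std::max<int64_t>(0, v);
}

// Hypothetical helper with the same shape as the failing expression
// `dense_segment_value_data[i] * num_bins` in sparse_ops_cpu.cpp.
int64_t bin_index(int64_t segment_value, int64_t num_bins) {
  return clamp_non_negative(segment_value * num_bins);
}
```

For example, `bin_index(3, 10)` returns `30` and `bin_index(-2, 10)` returns `0`. Writing `std::max(static_cast<int64_t>(0), v)` would work equally well. Either form avoids the `0L` literal, whose type does not match `int64_t` on macOS.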
[5/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o
/usr/local/Cellar/gcc/13.2.0/bin/g++-13 -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp: In function 'void {anonymous}::adjust_offset_cpu(index_t&, index_t&, int64_t, index_t*, index_t*)':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: error: no matching function for call to 'max(long int, const long long int&)'
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/deque:62,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/core/Generator.h:4,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/CPUGeneratorImpl.h:3,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/Context.h:3,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/ATen.h:7,
                 from /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:9:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/functional:67,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/C++17.h:7,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/string_view.h:4,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/StringUtil.h:6,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:5,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/core/Generator.h:11:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[6/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o
/usr/local/Cellar/gcc/13.2.0/bin/g++-13 -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp: In function 'void TORCH_LIBRARY_FRAGMENT_init_fbgemm_2(torch::Library&)':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp:1638:5: error: 'class torch::Library' has no member named 'impl_abstract_pystub'
 1638 |   m.impl_abstract_pystub(
      |     ^~~~~~~~~~~~~~~~~~~~
[9/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/codegen/batch_index_select_dim0_cpu_host.cpp.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/setuptools_wrap.py", line 674, in setup
    cmkr.make(make_args, install_target=cmake_install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 697, in make
    self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 742, in make_impl
    raise SKBuildError(msg)

An error occurred while building with CMake.
  Command:
    /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/cmake/data/bin/cmake --build . --target install --config Release --
  Install target:
    install
  Source directory:
    /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu
  Working directory:
    /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/_skbuild/macosx-13.6-x86_64-3.10/cmake-build
Please check the install target is valid and see CMake's output for more information.
q10 commented 9 months ago

@justin8shan There may be issues with gcc 12+ similar to this one; could you try installing a lower version of gcc and see if the errors reproduce?

justin8shan commented 9 months ago

Unfortunately, I tried every version from gcc 8 to 13 and got the same error.

q10 commented 9 months ago

@justin8shan Could you show us the instructions you ran (e.g. the specific versions of the software installed) to build fbgemm_gpu on the Mac? We will use them as a reference and add them to our documentation.

The invokers file is actually autogenerated from a template file during the build process, so I imagine that something is still not quite correct in the build...

justin8shan commented 9 months ago

@q10 Sorry, my earlier statement that I was able to build the package was wrong: I had only managed it after removing the CMakeLists.txt file. When I build with the existing CMakeLists.txt file, I still get errors similar to the ones above.

I manually fixed the type-mismatch issue by forcing a type conversion, as shown below:

Before:
      std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));

After:
      std::max(0L, long(std::min(static_cast<int64_t>(indices_start), num_indices)));
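A minimal sketch of why the cast is needed, and of a spelling that does not depend on the platform. It assumes the usual typedefs: Apple's headers define `int64_t` as `long long`, while glibc on 64-bit Linux defines it as `long`. This matches the compiler's complaint about "deduced conflicting types ('long int' and 'long long int')". The two helper functions are hypothetical and written only for illustration; they are not FBGEMM code. The same pattern should also apply to the `std::max(0L, dense_segment_value_data[i] * num_bins)` call in `sparse_ops_cpu.cpp`, assuming that expression is also `int64_t`.

```cpp
#include <algorithm>
#include <cstdint>

// std::max(const T&, const T&) deduces T from BOTH arguments. On macOS the
// literal 0L is `long`, but int64_t is `long long`. Both are 64-bit, yet they
// are distinct types, so deduction fails:
//
//   std::max(0L, some_int64);  // error: deduced conflicting types for 'const _Tp'
//
// With glibc on 64-bit Linux, int64_t is `long`, so the same line compiles there.

// Hypothetical helper, variant 1: give the template argument explicitly,
// so the literal 0 is converted to int64_t before the comparison.
constexpr int64_t clamp_start_explicit(int64_t indices_start, int64_t num_indices) {
  return std::max<int64_t>(0, std::min(indices_start, num_indices));
}

// Hypothetical helper, variant 2: make the literal itself an int64_t,
// so both arguments already have the same type.
constexpr int64_t clamp_start_literal(int64_t indices_start, int64_t num_indices) {
  return std::max(int64_t{0}, std::min(indices_start, num_indices));
}
```

The `long(...)` cast works on macOS because it converts the right-hand side to `long`, which matches `0L`. Writing the type as `int64_t` states the intent directly and keeps working whichever 64-bit type a platform uses for `int64_t`.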

After building again, I hit an error at the final compile step. Do you know how to fix it?

(torchrec) ➜  fbgemm_gpu git:(v0.5.0-release) ✗ python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="macos-13.6-${ARCH}" \
    -DCMAKE_C_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/gcc-13" \
    -DCMAKE_CXX_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/g++-13"
['setup.py', 'bdist_wheel', '--package_name=fbgemm_gpu_cpu', '--package_variant=cpu', '--python-tag=py310', '--plat-name=macos-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Parsed Arguments: Namespace(package_variant='cpu', package_name='fbgemm_gpu_cpu', nvml_lib_path=None)
[SETUP.PY] Unknown Arguments: ['bdist_wheel', '--python-tag=py310', '--plat-name=macos-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Extracted the package name: 'fbgemm_gpu_cpu'
[SETUP.PY] Not building FBGEMM_GPU from Nova.
[SETUP.PY] Extracted the package variant+version: ''
[SETUP.PY] Generating the package version ...
[SETUP.PY] Package is for RELEASE: using git info for the versioning
[SETUP.PY] TAG: v0.5.0, BRANCH: v0.5.0-release, SHA: b6ed54a2ec9757a159d5a4aec7c8a9b16c16c222
[SETUP.PY] Setting the full package version string: 0.5.0
[SETUP.PY] Generating version file at: /Users/xshan/pubrepo/fbgemm/fbgemm_gpu/fbgemm_gpu/_fbgemm_gpu_version.py
macos-13.6-x86_64
[0/1] Re-running CMake...
================================================================================
Building the CPU-only variant of FBGEMM-GPU
================================================================================

================================================================================
Default C++ compiler flags
(values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD):

 -D_GLIBCXX_USE_CXX11_ABI=0
================================================================================

================================================================================
The project is built using scikit-build
================================================================================

CMake Warning at /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:113 (find_package)

-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/xshan/pubrepo/fbgemm/fbgemm_gpu/_skbuild/macosx-13.6-x86_64-3.10/cmake-build
[2/111] Generating gen_embedding_forward_quantized_unweighted_co...okup_approx_rowwise_adagrad_with_weight_decay.py, lookup_none.py
[Backward Split] [dense]: gen_embedding_backward_dense_split_weighted_cuda.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_nobag_cuda.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_cuda.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_weighted_kernel_cta.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_kernel_cta.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_weighted_kernel_warp.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_kernel_warp.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_cpu.cpp
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp32_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp16_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int4_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int2_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp32_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp16_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int4_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int2_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp32_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp16_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int4_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int2_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_host_weighted_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_host_unweighted_nobag_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_host_unweighted_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_weighted_codegen_cpu.cpp
[Forward Quantized]: gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp
[Forward Split]: gen_embedding_forward_dense_weighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_weighted_vbe_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_weighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_unweighted_vbe_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_unweighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_dense_weighted_kernel.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_nobag_kernel.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_kernel.cu
[Forward Split]: gen_embedding_forward_split_weighted_vbe_kernel.cu
[Forward Split]: gen_embedding_forward_split_weighted_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_nobag_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_vbe_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_kernel.cu
[Forward Split]: gen_embedding_forward_split_weighted_v2_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_v2_kernel.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu
[Forward Split]: gen_embedding_forward_split_unweighted_nobag_kernel_small.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_weighted_cuda.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_nobag_cuda.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_cuda.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_weighted_kernel_cta.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_kernel_cta.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_weighted_kernel_warp.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_kernel_warp.cu
[Backward Split] [adagrad]: gen_embedding_backward_split_adagrad.cpp
[Backward Split] [adagrad]: lookup_adagrad.py
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_cpu.cpp
[Backward Split] [adagrad]: gen_embedding_backward_split_adagrad_cpu.cpp
[Backward Split] [adam]: gen_embedding_backward_adam_split_weighted_cuda.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_nobag_cuda.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_cuda.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_weighted_kernel_cta.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_kernel_cta.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_weighted_kernel_warp.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_kernel_warp.cu
[Backward Split] [adam]: gen_embedding_backward_split_adam.cpp
[Backward Split] [adam]: lookup_adam.py
[Backward Split] [adam]: gen_embedding_backward_split_adam_cpu.cpp
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_weighted_cuda.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_nobag_cuda.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_cuda.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_weighted_kernel_cta.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_kernel_cta.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_weighted_kernel_warp.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_kernel_warp.cu
[Backward Split] [lamb]: gen_embedding_backward_split_lamb.cpp
[Backward Split] [lamb]: lookup_lamb.py
[Backward Split] [lamb]: gen_embedding_backward_split_lamb_cpu.cpp
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_weighted_cuda.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_nobag_cuda.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_cuda.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_weighted_kernel_cta.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_kernel_cta.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_weighted_kernel_warp.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_kernel_warp.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_split_lars_sgd.cpp
[Backward Split] [lars_sgd]: lookup_lars_sgd.py
[Backward Split] [lars_sgd]: gen_embedding_backward_split_lars_sgd_cpu.cpp
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_weighted_cuda.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_cuda.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_cuda.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_cta.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_cta.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_warp.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_warp.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_split_partial_rowwise_adam.cpp
[Backward Split] [partial_rowwise_adam]: lookup_partial_rowwise_adam.py
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_weighted_cuda.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_cuda.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_cuda.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_cta.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_cta.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_warp.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_warp.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_split_partial_rowwise_lamb.cpp
[Backward Split] [partial_rowwise_lamb]: lookup_partial_rowwise_lamb.py
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_split_rowwise_adagrad.cpp
[Backward Split] [rowwise_adagrad]: lookup_rowwise_adagrad.py
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_cpu.cpp
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_split_rowwise_adagrad_cpu.cpp
[Backward Split] [approx_rowwise_adagrad]: gen_embedding_backward_split_approx_rowwise_adagrad.cpp
[Backward Split] [approx_rowwise_adagrad]: gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_weighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay.cpp
[Backward Split] [rowwise_adagrad_with_weight_decay]: lookup_rowwise_adagrad_with_weight_decay.py
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_weighted_cuda.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_cuda.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_cuda.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_weighted_kernel_cta.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_cta.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_weighted_kernel_warp.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_warp.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay.cpp
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: lookup_approx_rowwise_adagrad_with_weight_decay.py
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_split_rowwise_adagrad_with_counter.cpp
[Backward Split] [rowwise_adagrad_with_counter]: lookup_rowwise_adagrad_with_counter.py
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_cpu.cpp
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp
[Backward Split] [approx_rowwise_adagrad_with_counter]: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter.cpp
[Backward Split] [approx_rowwise_adagrad_with_counter]: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_weighted_cuda.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_cuda.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_split_rowwise_weighted_adagrad.cpp
[Backward Split] [rowwise_weighted_adagrad]: lookup_rowwise_weighted_adagrad.py
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_cpu.cpp
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_vbe_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_nobag_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_vbe_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_vbe_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_vbe_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_split_sgd.cpp
[Backward Split] [sgd]: lookup_sgd.py
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_cpu.cpp
[Backward Split] [sgd]: gen_embedding_backward_split_sgd_cpu.cpp
[Backward Split] [approx_sgd]: gen_embedding_backward_split_approx_sgd.cpp
[Backward Split] [approx_sgd]: gen_embedding_backward_split_approx_sgd_cpu.cpp
[Backward Split] [none]: gen_embedding_backward_none_split_weighted_cuda.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_nobag_cuda.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_cuda.cu
[Backward Split] [none]: gen_embedding_backward_none_split_weighted_kernel_cta.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_kernel_cta.cu
[Backward Split] [none]: gen_embedding_backward_none_split_weighted_kernel_warp.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_kernel_warp.cu
[Backward Split] [none]: gen_embedding_backward_split_none.cpp
[Backward Split] [none]: lookup_none.py
[Backward Split] [none]: gen_embedding_backward_split_none_cpu.cpp
[110/111] Linking CXX shared module fbgemm_gpu_py.so
FAILED: fbgemm_gpu_py.so
: && /usr/local/Cellar/gcc/13.2.0/bin/g++-13 -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -bundle -Wl,-headerpad_max_install_names -s -o fbgemm_gpu_py.so CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64assembler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64builder.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64compiler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64emithelper.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64formatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64func.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64instapi.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64instdb.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64operand.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64rapass.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/armformatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/archtraits.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/assembler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/builder.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/codeholder.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/codewriter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/compiler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/constpool.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/cpuinfo.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/emithelper.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/emitter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/emitterutils.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/environment.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/errorhandler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/formatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/func.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/funcargscontext.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/globals.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/inst.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/jitallocator.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/jitruntime.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/logger.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/operand.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/osutils.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/ralocal.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/rapass.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/rastack.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/string.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/support.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/target.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/type.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/virtmem.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zone.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonehash.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonelist.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonestack.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonetree.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonevector.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86assembler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86builder.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86compiler.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86emithelper.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86formatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86func.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86instapi.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86instdb.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86operand.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86rapass.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/EmbeddingSpMDM.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/EmbeddingSpMDMNBit.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/QuantUtils.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/RefImplementations.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/RowWiseSparseAdagradFused.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/SparseAdagrad.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/Utils.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/EmbeddingSpMDMAvx2.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/QuantUtilsAvx2.cc.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_backward_dense_host_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_autograd.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_meta.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/input_combine_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/layout_transform_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/quantize_ops/quantize_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/quantize_ops/quantize_ops_meta.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_meta.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/embedding_inplace_update_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/batch_index_select_dim0_cpu_host.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_forward_quantized_weighted_codegen_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_dense_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_sgd_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_adam_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_lamb_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_lars_sgd_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_none_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_sgd_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_adagrad_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_rowwise_adagrad_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_rowwise_adagrad_with_counter_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_rowwise_weighted_adagrad_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_sgd_split_cpu.cpp.o -L/lib/intel64   -L/lib/intel64_win   -L/lib/win-x64 -Wl,-rpath,/lib/intel64 -Wl,-rpath,/lib/intel64_win -Wl,-rpath,/lib/win-x64 -Wl,-rpath,/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libc10.dylib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libtorch.dylib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libtorch_cpu.dylib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libc10.dylib && :
ld: warning: -s is obsolete
ld: warning: search path '/lib/intel64' not found
ld: warning: search path '/lib/intel64_win' not found
ld: warning: search path '/lib/win-x64' not found
ld: Undefined symbols:
  _GOMP_barrier, referenced from:
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      ...
  _GOMP_parallel, referenced from:
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      ...
  __ZN3c1010Dispatcher17runRecordFunctionERN2at14RecordFunctionESt17reference_wrapperIKNS_14FunctionSchemaEENS_11DispatchKeyE, referenced from:
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_6SymIntEEEET_RKNS_19TypedOperatorHandleIFSE_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESH_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_8ArrayRefINS_6SymIntEEEdEEET_RKNS_19TypedOperatorHandleIFSG_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESJ_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_S5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_RKSt6vectorIS4_SaIS4_EES7_S7_EEET_RKNS_19TypedOperatorHandleIFSD_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESG_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_S7_S7_S7_EEET_RKNS_19TypedOperatorHandleIFS8_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESB_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_S5_S5_EEET_RKNS_19TypedOperatorHandleIFS6_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES9_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN3c1010Dispatcher17runRecordFunctionERN2at14RecordFunctionESt17reference_wrapperIKNS_14FunctionSchemaEENS_11DispatchKeyENS_8ArrayRefIKNS_6IValueEEE, referenced from:
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_6SymIntEEEET_RKNS_19TypedOperatorHandleIFSE_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESH_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_8ArrayRefINS_6SymIntEEEdEEET_RKNS_19TypedOperatorHandleIFSG_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESJ_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_S5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_RKSt6vectorIS4_SaIS4_EES7_S7_EEET_RKNS_19TypedOperatorHandleIFSD_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESG_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_S7_S7_S7_EEET_RKNS_19TypedOperatorHandleIFS8_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESB_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_S5_S5_EEET_RKNS_19TypedOperatorHandleIFS6_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES9_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN3c1010TensorImpl17set_autograd_metaESt10unique_ptrINS_21AutogradMetaInterfaceESt14default_deleteIS2_EE, referenced from:
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c1022getCustomClassTypeImplERKSt10type_index, referenced from:
      __ZN3c1018getFakeTypePtrCopyINS_13intrusive_ptrI11TensorQueueNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_14tagged_capsuleI11TensorQueueEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_13intrusive_ptrI13AtomicCounterNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_14tagged_capsuleI13AtomicCounterEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_13intrusive_ptrI12PrunedMapCPUNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_14tagged_capsuleI12PrunedMapCPUEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_13intrusive_ptrI13AtomicCounterNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN3c105ErrorC2ENS_14SourceLocationESs, referenced from:
      __ZN3c106ivalue6Future20getDevicesOfStoragesERKNS_4impl16VirtualGuardImplERKSt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS8_EEEESaISC_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013intrusive_ptrINS_6ivalue6FutureENS_6detail34intrusive_target_default_null_typeIS2_EEE4makeIJNS_4Type24SingletonOrSharedTypePtrIS8_EEEEES6_DpOT_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future14invokeCallbackISt8functionIFvRS1_EEEEvT_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future13markCompletedENS_6IValueENS_8optionalISt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS6_EEEESaISA_EEEE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN10fbgemm_gpu28dense_to_jagged_forward_metaERKN2at6TensorERKSt6vectorIS1_SaIS1_EERKN3c108optionalINS9_6SymIntEEE.cold in jagged_tensor_ops_meta.cpp.o
  __ZN3c106detail12infer_schema20make_function_schemaEOSsS2_NS_8ArrayRefINS1_11ArgumentDefEEES5_, referenced from:
      __ZN5torch6class_I13AtomicCounterE12defineMethodINS_6detail10WrapMethodIMS1_FxvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodINS_6detail10WrapMethodIMS1_FN2at6TensorEvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodIZNS2_3defIJN2at6TensorEEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EES6_E_EEPNS_3jit8FunctionESsT_SsSF_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN3c106detail14torchCheckFailEPKcS2_jRKSs, referenced from:
      __ZN2at5emptyEN3c108ArrayRefIxEENS0_13TensorOptionsENS0_8optionalINS0_12MemoryFormatEEE in embedding_forward_split_cpu.cpp.o
      __ZN10fbgemm_gpu22report_embedding_errorIxEEviiiiPKT_S3_xb.constprop.0 in embedding_forward_split_cpu.cpp.o
      __ZNKR2at10TensorBase8accessorIiLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsExEEv in embedding_forward_split_cpu.cpp.o
      __ZNKR2at10TensorBase8accessorIxLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsExEEv in embedding_forward_split_cpu.cpp.o
      __ZZZZ35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_xENKUlvE_clEvENKUlvE0_clEvENKUlvE_clEv in embedding_forward_split_cpu.cpp.o
      __ZZZZ35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_xENKUlvE_clEvENKUlvE_clEvENKUlvE_clEv in embedding_forward_split_cpu.cpp.o
      __Z35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_x in embedding_forward_split_cpu.cpp.o
      __Z35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_x in embedding_forward_split_cpu.cpp.o
      ...
  __ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs, referenced from:
      __ZNKR3c106IValue8toObjectEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013QualifiedNameC1ERKSs in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013QualifiedNameC1ERKSs in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK5torch6detail19TensorDataContainer11fill_tensorERN2at6TensorE in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK5torch6detail19TensorDataContainer11fill_tensorERN2at6TensorE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6detail32call_torchbind_method_from_stackIZNS_6class_I13AtomicCounterE10def_pickleINL21AtomicCounterRegistryMUlRKN3c1013intrusive_ptrIS3_NS7_6detail34intrusive_target_default_null_typeIS3_EEEEE_ENS6_UlSsE_EEERS4_OT_OT0_EUlNS7_14tagged_capsuleIS3_EEOSsE_Lb0EJLm0ELm1EEEENS7_4guts23infer_function_traits_t11return_typeERSI_RSt6vectorINS7_6IValueESaISV_EESt16integer_sequenceImJXspT1_EEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_I12PrunedMapCPUE12defineMethodIZNSA_10def_pickleINL20PrunedMapCPURegistryMUlRKNS1_13intrusive_ptrIS9_NS1_6detail34intrusive_target_default_null_typeIS9_EEEEE_ENSD_UlSsE_EEERSA_OT_OT0_EUlNS1_14tagged_capsuleIS9_EEOSsE_EEPNS7_3jit8FunctionESsSO_SsSt16initializer_listINS7_3argEEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN3c106detail8ListImplC1ESt6vectorINS_6IValueESaIS3_EENS_4Type24SingletonOrSharedTypePtrIS6_EE, referenced from:
      __ZN3c104ListINS_6SymIntEEC1Ev in jagged_tensor_ops_autograd.cpp.o
      __ZN3c104ListIxEC1Ev in jagged_tensor_ops_autograd.cpp.o
      __ZN3c104ListIN2at6TensorEEC1Ev in jagged_tensor_ops_autograd.cpp.o
  __ZN3c106ivalue14ConstantString6createESs, referenced from:
      __ZNK3c104DictISsN2at6TensorEE2atERKSs.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK3c104DictISsN2at6TensorEE6insertISsRKS2_EESt4pairINS_4impl12DictIteratorISsS2_N11ska_ordered8detailv317sherwood_v3_tableIS7_INS_6IValueESD_ESD_NS_6detail11DictKeyHashENSB_16KeyOrValueHasherISD_SE_SG_EENSF_14DictKeyEqualToENSB_18KeyOrValueEqualityISD_SE_SJ_EESaISE_ESaINSB_17sherwood_v3_entryISE_EEEE18templated_iteratorISE_EEEEbEOT_OT0_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK11TensorQueue9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_I12PrunedMapCPUE12defineMethodINL20PrunedMapCPURegistryMUlRKNS1_13intrusive_ptrIS9_NS1_6detail34intrusive_target_default_null_typeIS9_EEEEE_EEEPNS7_3jit8FunctionESsT_SsSt16initializer_listINS7_3argEEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_I13AtomicCounterE12defineMethodINL21AtomicCounterRegistryMUlRKNS1_13intrusive_ptrIS9_NS1_6detail34intrusive_target_default_null_typeIS9_EEEEE_EEEPNS7_3jit8FunctionESsT_SsSt16initializer_listINS7_3argEEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c108DictType3getESsNS_4Type24SingletonOrSharedTypePtrIS1_EES3_, referenced from:
      __ZN3c1014getTypePtrCopyINS_4DictISsN2at6TensorEEEEENS_4Type24SingletonOrSharedTypePtrIS5_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_4DictISsN2at6TensorEEEEENS_4Type24SingletonOrSharedTypePtrIS5_EEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c108ListType3getESsNS_4Type24SingletonOrSharedTypePtrIS1_EE, referenced from:
      __ZN3c1014getTypePtrCopyISt6vectorIN2at6TensorESaIS3_EEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1018getFakeTypePtrCopyISt6vectorIN2at6TensorESaIS3_EEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1014getTypePtrCopyISt6vectorIxSaIxEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in jagged_tensor_ops_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyISt6vectorIxSaIxEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in jagged_tensor_ops_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_8optionalISt6vectorIxSaIxEEEEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in sparse_ops_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_8optionalISt6vectorIxSaIxEEEEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in sparse_ops_cpu.cpp.o
  __ZN3c10lsERSoNS_10DeviceTypeE, referenced from:
      __ZN3c106detail12_str_wrapperIJPKcRKNS_10DeviceTypeES3_EE4callERKS3_S6_S9_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future20getDevicesOfStoragesERKNS_4impl16VirtualGuardImplERKSt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS8_EEEESaISC_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013intrusive_ptrINS_6ivalue6FutureENS_6detail34intrusive_target_default_null_typeIS2_EEE4makeIJNS_4Type24SingletonOrSharedTypePtrIS8_EEEEES6_DpOT_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future13markCompletedENS_6IValueENS_8optionalISt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS6_EEEESaISA_EEEE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future13markCompletedENS_6IValueENS_8optionalISt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS6_EEEESaISA_EEEE in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c10lsERSoRKNS_12OperatorNameE, referenced from:
      __ZN3c106detail12_str_wrapperIJPKcRKNS_12OperatorNameES3_EE4callERKS3_S6_S9_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_ddxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJS3_S3_S3_S3_S3_S3_S3_EEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJS3_S3_S3_xS3_S3_S3_xS3_xEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_dddxdxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_rowwise_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_S3_S3_S3_S3_S3_S3_dddxxxdxxxddxxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_dddxxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp.o
      ...
  __ZN3c10lsERSoRKNS_6DeviceE, referenced from:
      __ZN3c106ivalue6Future18formatSetOfDevicesERKSt6vectorINS_6DeviceESaIS3_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future18formatSetOfDevicesERKSt6vectorINS_6DeviceESaIS3_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106detail12_str_wrapperIJPKcRKNS_6DeviceES3_RKmS3_S6_EE4callERKS3_S6_SB_S8_SB_S6_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106detail12_str_wrapperIJPKcRKNS_6DeviceES3_RKmS3_S6_EE4callERKS3_S6_SB_S8_SB_S6_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future20getDevicesOfStoragesERKNS_4impl16VirtualGuardImplERKSt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS8_EEEESaISC_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013intrusive_ptrINS_6ivalue6FutureENS_6detail34intrusive_target_default_null_typeIS2_EEE4makeIJNS_4Type24SingletonOrSharedTypePtrIS8_EEEEES6_DpOT_ in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c10lsERSoRKNS_6IValueE, referenced from:
      __ZN3c10lsERSoRKNS_8ArgumentE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c10lsERSoRKNS_8ArgumentE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c10lsERSoRKNS_6SymIntE, referenced from:
      __ZN3c10lsINS_6SymIntEEERSoS2_NS_8ArrayRefIT_EE in jagged_tensor_ops_meta.cpp.o
      __ZN3c10lsINS_6SymIntEEERSoS2_NS_8ArrayRefIT_EE in jagged_tensor_ops_meta.cpp.o
      __ZN10fbgemm_gpu44batched_dense_vec_jagged_2d_mul_forward_metaERKN2at6TensorES3_S3_ in jagged_tensor_ops_meta.cpp.o
      __ZN10fbgemm_gpu44batched_dense_vec_jagged_2d_mul_forward_metaERKN2at6TensorES3_S3_ in jagged_tensor_ops_meta.cpp.o
  __ZN5torch25registerCustomClassMethodESt10unique_ptrINS_3jit8FunctionESt14default_deleteIS2_EE, referenced from:
      __ZN5torch6class_I13AtomicCounterE12defineMethodINS_6detail10WrapMethodIMS1_FxvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodINS_6detail10WrapMethodIMS1_FN2at6TensorEvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodIZNS2_3defIJN2at6TensorEEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EES6_E_EEPNS_3jit8FunctionESsT_SsSF_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN5torch3jit11parseSchemaERKSs, referenced from:
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_3ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_, referenced from:
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch7LibraryC1ENS0_4KindESsN3c108optionalINS2_11DispatchKeyEEEPKcj, referenced from:
      __ZN5torch6detail16TorchLibraryInitC1ENS_7Library4KindEPFvRS2_EPKcN3c108optionalINS9_11DispatchKeyEEES8_j in embedding_forward_split_cpu.cpp.o
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
      __GLOBAL__sub_I_permute_pooled_embedding_ops_split_cpu.cpp in permute_pooled_embedding_ops_split_cpu.cpp.o
      __GLOBAL__sub_I_jagged_tensor_ops_autograd.cpp in jagged_tensor_ops_autograd.cpp.o
      __GLOBAL__sub_I_jagged_tensor_ops_meta.cpp in jagged_tensor_ops_meta.cpp.o
      __GLOBAL__sub_I_input_combine_cpu.cpp in input_combine_cpu.cpp.o
      __GLOBAL__sub_I_quantize_ops_meta.cpp in quantize_ops_meta.cpp.o
      ...
  __ZN5torch8autograd13_wrap_outputsERKSt6vectorIN2at6TensorESaIS3_EERKSt13unordered_setIPN3c1010TensorImplESt4hashISB_ESt8equal_toISB_ESaISB_EESJ_NS9_8ArrayRefINS9_8optionalIS3_EEEERKSt10shared_ptrINS0_4NodeEESt8functionIFS5_S5_S5_EESJ_, referenced from:
      __ZN3c104impl28wrap_kernel_functor_unboxed_INS0_6detail24WrapFunctionIntoFunctor_INS_26CompileTimeFunctionPointerIFN2at6TensorES6_S6_S6_xxS6_xS6_S6_xNS_8optionalIS6_EES8_xEXadL_ZN12_GLOBAL__N_145split_embedding_codegen_lookup_dense_functionES6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEES6_NS_4guts8typelist8typelistIJS6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEEES9_E4callEPNS_14OperatorKernelENS_14DispatchKeySetES6_S6_S6_xxS6_xS6_S6_xS8_S8_x in embedding_backward_dense_host_cpu.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu30PermutePooledEmbsFunctionSplitIXadL_ZNS2_29permute_pooled_embs_split_cpuERKN2at6TensorES7_S7_S7_S7_EEEEE5applyIS8_JS7_S7_S7_S7_S7_EEENSt9enable_ifIXsrSt7is_sameIT_S8_E5valueEDTclsrSD_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSF_ in permute_pooled_embedding_ops_split_cpu.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EERKN3c108ArrayRefINSG_6SymIntEEERKdEEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSQ_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSS_ in jagged_tensor_ops_autograd.cpp.o
      __ZN10fbgemm_gpu28jagged_dense_elementwise_addERKN2at6TensorERKSt6vectorIS1_SaIS1_EES3_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_116JaggedDenseMulOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_128BatchedDenseVecJagged2DMulOpEE5applyIS4_JRKN2at6TensorESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSD_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSF_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN5torch8autograd15AutogradContext17save_for_backwardESt6vectorIN2at6TensorESaIS4_EE, referenced from:
      __ZN3c104impl28wrap_kernel_functor_unboxed_INS0_6detail24WrapFunctionIntoFunctor_INS_26CompileTimeFunctionPointerIFN2at6TensorES6_S6_S6_xxS6_xS6_S6_xNS_8optionalIS6_EES8_xEXadL_ZN12_GLOBAL__N_145split_embedding_codegen_lookup_dense_functionES6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEES6_NS_4guts8typelist8typelistIJS6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEEES9_E4callEPNS_14OperatorKernelENS_14DispatchKeySetES6_S6_S6_xxS6_xS6_S6_xS8_S8_x in embedding_backward_dense_host_cpu.cpp.o
      __ZN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOp7forwardEPN5torch8autograd15AutogradContextERKN2at6TensorERKSt6vectorIS7_SaIS7_EEN3c108ArrayRefINSF_6SymIntEEEd in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_116JaggedDenseMulOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_128BatchedDenseVecJagged2DMulOpEE5applyIS4_JRKN2at6TensorESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSD_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSF_ in jagged_tensor_ops_autograd.cpp.o
      __ZN10fbgemm_gpu12_GLOBAL__N_115DenseToJaggedOp7forwardEPN5torch8autograd15AutogradContextERKN2at6TensorERKSt6vectorIS7_SaIS7_EERKN3c108optionalINSF_6SymIntEEE in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_117JaggedJaggedBmmOpEE5applyIS4_JRKN2at6TensorESA_SA_RKxEEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSF_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSH_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN5torch9serialize12InputArchive4readERKSsRN2at6TensorEb, referenced from:
      __ZN12PrunedMapCPUC1ESs in embedding_forward_quantized_host_cpu.cpp.o
      __ZN12PrunedMapCPUC1ESs in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch9serialize13OutputArchive5writeERKSsRKN2at6TensorEb, referenced from:
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch9serialize13OutputArchive7save_toERSo, referenced from:
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch9serialize13OutputArchiveC1ESt10shared_ptrINS_3jit15CompilationUnitEE, referenced from:
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZNK3c104Type14isSubtypeOfExtERKS0_PSo, referenced from:
      __ZTVN3c1010SharedTypeE in jagged_tensor_ops_autograd.cpp.o
      __ZTVN3c1017SingleElementTypeILNS_8TypeKindE6ENS_8ListTypeEEE in jagged_tensor_ops_autograd.cpp.o
  __ZNK3c109ClassType9getMethodERKSs, referenced from:
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE10def_pickleINL21AtomicCounterRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE10def_pickleINL21AtomicCounterRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE10def_pickleINL20PrunedMapCPURegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE10def_pickleINL20PrunedMapCPURegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
  __ZNR5torch7Library4_defEON3c1014FunctionSchemaEPNS1_12OperatorNameERKSt6vectorIN2at3TagESaIS8_EENS_17_RegisterOrVerifyE, referenced from:
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_3ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      ...
  ___emutls_v._ZN3c104impl26raw_local_dispatch_key_setE, referenced from:
      __ZNK3c1020DispatchKeyExtractor24getDispatchKeySetUnboxedIJRKN2at6TensorES5_S5_xEEENS_14DispatchKeySetEDpRKT_.isra.0 in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      __ZN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOp7forwardEPN5torch8autograd15AutogradContextERKN2at6TensorERKSt6vectorIS7_SaIS7_EEN3c108ArrayRefINSF_6SymIntEEEd in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_115DenseToJaggedOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_116JaggedDenseMulOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      ...
  _omp_get_max_threads, referenced from:
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIdEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIdEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
  _omp_get_num_threads, referenced from:
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IdLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      ...
  _omp_get_thread_num, referenced from:
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IdLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      ...
  _omp_set_num_threads, referenced from:
      __ZN10fbgemm_gpu22jagged_softmax_forwardERKN2at6TensorES3_x in jagged_tensor_ops_cpu.cpp.o
      __ZN10fbgemm_gpu23jagged_softmax_backwardERKN2at6TensorES3_S3_x in jagged_tensor_ops_cpu.cpp.o
      __ZN10fbgemm_gpu25jagged_jagged_bmm_forwardERKN2at6TensorES3_S3_x in jagged_tensor_ops_cpu.cpp.o
      __ZN10fbgemm_gpu24jagged_dense_bmm_forwardERKN2at6TensorES3_S3_x in jagged_tensor_ops_cpu.cpp.o
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/setuptools_wrap.py", line 674, in setup
    cmkr.make(make_args, install_target=cmake_install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 697, in make
    self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 742, in make_impl
    raise SKBuildError(msg)

An error occurred while building with CMake.
  Command:
    /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/cmake/data/bin/cmake --build . --target install --config Release --
  Install target:
    install
  Source directory:
    /Users/xshan/pubrepo/fbgemm/fbgemm_gpu
  Working directory:
    /Users/xshan/pubrepo/fbgemm/fbgemm_gpu/_skbuild/macosx-13.6-x86_64-3.10/cmake-build
Please check the install target is valid and see CMake's output for more information.