pytorch / torchrec

Pytorch domain library for recommendation systems
https://pytorch.org/torchrec/
BSD 3-Clause "New" or "Revised" License

torchrec Build inference library and example server failure #2486

Open Chevolier opened 1 month ago

Chevolier commented 1 month ago

I followed the steps in https://github.com/pytorch/torchrec/tree/main/torchrec/inference to test inference. However, in step 4 ("Build inference library and example server"), the "Build server and C++ protobufs" part failed. Specifically, after I ran

cmake -S . -B build/ -DCMAKE_PREFIX_PATH="$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)');" -DFBGEMM_LIB="$FBGEMM_LIB"

cd build
make -j

The following error occurred:

make -j
[ 16%] Generating predictor.pb.cc, predictor.pb.h, predictor.grpc.pb.cc, predictor.grpc.pb.h
[ 50%] Building CXX object CMakeFiles/hw_grpc_proto.dir/predictor.grpc.pb.cc.o
[ 50%] Building CXX object CMakeFiles/hw_grpc_proto.dir/predictor.pb.cc.o
[ 66%] Linking CXX static library libhw_grpc_proto.a
[ 66%] Built target hw_grpc_proto
[ 83%] Building CXX object CMakeFiles/server.dir/server.cpp.o
/home/ec2-user/SageMaker/efs/Projects/torchrec-mob/zhangchg/torchrec/inference/server.cpp: In member function ‘virtual grpc::Status PredictorServiceHandler::Predict(grpc::ServerContext, const predictor::PredictionRequest, predictor::PredictionResponse)’:
/home/ec2-user/SageMaker/efs/Projects/torchrec-mob/zhangchg/torchrec/inference/server.cpp:67:10: warning: narrowing conversion of ‘batchSize’ from ‘long unsigned int’ to ‘long int’ [-Wnarrowing]
   67 | {batchSize, numFloatFeatures},
      | ^~~~~
/home/ec2-user/SageMaker/efs/Projects/torchrec-mob/zhangchg/torchrec/inference/server.cpp:67:10: warning: narrowing conversion of ‘batchSize’ from ‘long unsigned int’ to ‘long int’ [-Wnarrowing]
/home/ec2-user/SageMaker/efs/Projects/torchrec-mob/zhangchg/torchrec/inference/server.cpp:83:29: warning: narrowing conversion of ‘(lengthsBlob.std::basic_string::size() / 4)’ from ‘std::basic_string::size_type’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
   83 | {lengthsBlob.size() / NUM_BYTES_SPARSE_FEATURES},
      | ^
/home/ec2-user/SageMaker/efs/Projects/torchrec-mob/zhangchg/torchrec/inference/server.cpp:83:29: warning: narrowing conversion of ‘(lengthsBlob.std::basic_string::size() / 4)’ from ‘std::basic_string::size_type’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/ec2-user/SageMaker/efs/Projects/torchrec-mob/zhangchg/torchrec/inference/server.cpp:87:28: warning: narrowing conversion of ‘(valuesBlob.std::basic_string::size() / 4)’ from ‘std::basic_string::size_type’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
   87 | {valuesBlob.size() / NUM_BYTES_SPARSE_FEATURES},
      | ^
/home/ec2-user/SageMaker/efs/Projects/torchrec-mob/zhangchg/torchrec/inference/server.cpp:87:28: warning: narrowing conversion of ‘(valuesBlob.std::basic_string::size() / 4)’ from ‘std::basic_string::size_type’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
[100%] Linking CXX executable server
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `RunServer(unsigned short, torch::jit::Module&)':
server.cpp:(.text+0x3a9): undefined reference to `grpc::ServerBuilder::AddListeningPort(std::string const&, std::shared_ptr, int)'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `std::string* absl::lts_20240116::log_internal::MakeCheckOpString<unsigned long const&, unsigned long const&>(unsigned long const&, unsigned long const&, char const*)':
server.cpp:(.text._ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKmS4_EEPSsT_T0_PKc[_ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKmS4_EEPSsT_T0_PKc]+0xa0): undefined reference to `absl::lts_20240116::log_internal::CheckOpMessageBuilder::NewString()'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `std::string* absl::lts_20240116::log_internal::MakeCheckOpString<unsigned long const&, unsigned int const&>(unsigned long const&, unsigned int const&, char const*)':
server.cpp:(.text._ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKmRKjEEPSsT_T0_PKc[_ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKmRKjEEPSsT_T0_PKc]+0xa0): undefined reference to `absl::lts_20240116::log_internal::CheckOpMessageBuilder::NewString()'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `std::string* absl::lts_20240116::log_internal::MakeCheckOpString<unsigned int const&, unsigned int const&>(unsigned int const&, unsigned int const&, char const*)':
server.cpp:(.text._ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKjS4_EEPSsT_T0_PKc[_ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKjS4_EEPSsT_T0_PKc]+0xa0): undefined reference to `absl::lts_20240116::log_internal::CheckOpMessageBuilder::NewString()'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `std::string absl::lts_20240116::StrFormat<unsigned short>(absl::lts_20240116::str_format_internal::FormatSpecTemplate<(ArgumentToConv<unsigned short>)()> const&, unsigned short const&)':
server.cpp:(.text._ZN4absl12lts_202401169StrFormatIJtEEESsRKNS0_19str_format_internal18FormatSpecTemplateIJXspcl14ArgumentToConvIT_EEEEEEDpRKS4_[_ZN4absl12lts_202401169StrFormatIJtEEESsRKNS0_19str_format_internal18FormatSpecTemplateIJXspcl14ArgumentToConvIT_EEEEEEDpRKS4_]+0x90): undefined reference to `absl::lts_20240116::str_format_internal::FormatPack(absl::lts_20240116::str_format_internal::UntypedFormatSpecImpl, absl::lts_20240116::Span)'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `std::string* absl::lts_20240116::log_internal::MakeCheckOpString<int const&, int const&>(int const&, int const&, char const*)':
server.cpp:(.text._ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKiS4_EEPSsT_T0_PKc[_ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKiS4_EEPSsT_T0_PKc]+0xa0): undefined reference to `absl::lts_20240116::log_internal::CheckOpMessageBuilder::NewString()'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `std::string absl::lts_20240116::UnparseFlag<unsigned short>(unsigned short const&)':
server.cpp:(.text._ZN4absl12lts_2024011611UnparseFlagItEESsRKT_[_ZN4absl12lts_2024011611UnparseFlagItEESsRKT_]+0x37): undefined reference to `absl::lts_20240116::flags_internal::Unparse(unsigned short)'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `bool absl::lts_20240116::flags_internal::InvokeParseFlag<unsigned short>(absl::lts_20240116::string_view, unsigned short*, std::string*)':
server.cpp:(.text._ZN4absl12lts_2024011614flags_internal15InvokeParseFlagItEEbNS0_11string_viewEPT_PSs[_ZN4absl12lts_2024011614flags_internal15InvokeParseFlagItEEbNS0_11string_viewEPT_PSs]+0x42): undefined reference to `absl::lts_20240116::flags_internal::AbslParseFlag(absl::lts_20240116::string_view, unsigned short, std::string)'
/usr/bin/ld: CMakeFiles/server.dir/server.cpp.o: in function `std::string* absl::lts_20240116::log_internal::MakeCheckOpString<unsigned int const&, google::protobuf::internal::UntypedMapBase::{unnamed type#1} const&>(unsigned int const&, google::protobuf::internal::UntypedMapBase::{unnamed type#1} const&, char const*)':
server.cpp:(.text._ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKjRKN6google8protobuf8internal14UntypedMapBaseUt_EEEPSsT_T0_PKc[_ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKjRKN6google8protobuf8internal14UntypedMapBaseUt_EEEPSsT_T0_PKc]+0xa0): undefined reference to `absl::lts_20240116::log_internal::CheckOpMessageBuilder::NewString()'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/server.dir/build.make:209: server] Error 1
make[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/server.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

The system is Ubuntu 20.04 with g++ 9.4.0, and protoc is the installed version at /home/ubuntu/.local/bin/protoc. Any help would be much appreciated.
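
A possible reading of the linker errors above, offered as an unconfirmed assumption rather than a diagnosis from this thread: the mangled symbols in the log (for example `_ZN4absl12lts_2024011611UnparseFlagItEESsRKT_`) spell `std::string` as `Ss`, the Itanium C++ ABI abbreviation that applies to the pre-C++11 libstdc++ string. That would be consistent with server.cpp having been compiled with `_GLIBCXX_USE_CXX11_ABI=0` while the installed gRPC and Abseil libraries may have been built with the newer ABI, whose string type mangles with `__cxx11` instead. The sketch below uses a hypothetical helper, `guess_string_abi`, which applies this substring heuristic to a mangled name; it is not part of TorchRec, gRPC, or Abseil.

```python
def guess_string_abi(mangled: str) -> str:
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    Hypothetical helper for illustration only. It is a plain substring
    heuristic, not a real demangler, so false positives are possible.
    """
    if "__cxx11" in mangled:
        # New ABI: std::string is std::__cxx11::basic_string<...>.
        return "new"
    if "Ss" in mangled:
        # Old ABI: 'Ss' is the Itanium abbreviation for std::string.
        return "old"
    return "unknown"


# Two mangled names copied from the linker output in this issue.
symbols = [
    "_ZN4absl12lts_2024011611UnparseFlagItEESsRKT_",
    "_ZN4absl12lts_2024011612log_internal17MakeCheckOpStringIRKmS4_EEPSsT_T0_PKc",
]
for name in symbols:
    print(guess_string_abi(name), name)
```

If the symbols requested by server.cpp.o look like the old ABI, one way to compare would be to run `nm` on the gRPC and Abseil libraries being linked and to check `torch.compiled_with_cxx11_abi()` on the PyTorch side. Whether this mismatch is the actual cause on this machine has not been verified here.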

tiankongdeguiji commented 1 month ago

You could explore TorchEasyRec and its inference service here. TorchEasyRec has further enhanced performance optimizations for inference based on TorchRec.

Chevolier commented 1 month ago

Many thanks, I'll take a look at it!

Chevolier commented 1 week ago

Still waiting for solutions ...