tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0

Use rules_foreign_cc for CMake projects import if possible #933

Open yongtang opened 4 years ago

yongtang commented 4 years ago

tensorflow/io uses Bazel to conform to TF's build system, but many of its dependencies use CMake, so in the past we have been manually converting CMake projects to Bazel. This is really time-consuming and not very scalable.

It looks like there is a Bazel project that can import CMake projects:

https://github.com/bazelbuild/rules_foreign_cc

We can explore it and see if it fits our needs for importing CMake projects into tensorflow/io.

oliverhu commented 3 years ago

@yongtang I am planning to use rules_foreign_cc to build Apache ORC support. I created a sample commit here: https://github.com/oliverhu/io/commit/dec68b30e09a2c933ef918e5b8ad2fdd2418887e. Any comments?

yongtang commented 3 years ago

@oliverhu Looks good! A PR is definitely welcome!

oliverhu commented 3 years ago

Hey @yongtang @terrytangyuan, I need some help. I spent a few hours scratching my head trying to understand this...

I created a sample project here: https://github.com/oliverhu/orc_bazel and ran bazel build //demo:hello-time, which succeeded.

I copied the same files into tf/io (https://github.com/oliverhu/io/commit/96e612cbc288881c31097eabbe320aa4e777013e) and ran the same command, bazel build //demo:hello-time, but the linker throws errors when linking the ORC files against the protobuf library. Any idea?

[ 85%] Linking CXX static library liborc.a
[ 85%] Built target orc
Scanning dependencies of target orc-statistics
[ 86%] Building CXX object tools/src/CMakeFiles/orc-statistics.dir/FileStatistics.cc.o
[ 87%] Linking CXX executable orc-statistics
../../c++/src/liborc.a(Reader.cc.o):Reader.cc:function orc::ReaderImpl::getSerializedFileTail() const: error: undefined reference to 'google::protobuf::MessageLite::SerializeToString(std::string*) const'
../../c++/src/liborc.a(Reader.cc.o):Reader.cc:function orc::createReader(std::unique_ptr<orc::InputStream, std::default_delete<orc::InputStream> >, orc::ReaderOptions const&): error: undefined reference to 'google::protobuf::MessageLite::ParseFromString(std::string const&)'
../../c++/src/liborc.a(Statistics.cc.o):Statistics.cc:function non-virtual thunk to orc::StringColumnStatisticsImpl::toProtoBuf(orc::proto::ColumnStatistics&) const: error: undefined reference to 'google::protobuf::internal::fixed_address_empty_string'
../../c++/src/liborc.a(Statistics.cc.o):Statistics.cc:function non-virtual thunk to orc::StringColumnStatisticsImpl::toProtoBuf(orc::proto::ColumnStatistics&) const: error: undefined reference to 'google::protobuf::internal::fixed_address_empty_string'
../../c++/src/liborc.a(Statistics.cc.o):Statistics.cc:function orc::StringColumnStatisticsImpl::toProtoBuf(orc::proto::ColumnStatistics&) const: error: undefined reference to 'google::protobuf::internal::fixed_address_empty_string'
../../c++/src/liborc.a(Statistics.cc.o):Statistics.cc:function orc::StringColumnStatisticsImpl::toProtoBuf(orc::proto::ColumnStatistics&) const: error: undefined reference to 'google::protobuf::internal::fixed_address_empty_string'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::read(void*, unsigned long, unsigned long): error: undefined reference to 'hdfs::Status::ToString() const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::parse_from_string(std::string const&)'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::ConfigParser::ValidateResources() const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::FileSystem::New(hdfs::IoService*&, std::string const&, hdfs::Options const&)'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_host(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_path(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_path(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_host(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::Status::ToString() const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::str(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_host(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::Status::ToString() const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_path(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::Status::ToString() const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_host(bool) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::get_path(bool) const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::InternalSerializeWithCachedSizesToArray(bool, unsigned char*) const: error: undefined reference to 'google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::InternalSerializeWithCachedSizesToArray(bool, unsigned char*) const: error: undefined reference to 'google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::InternalSerializeWithCachedSizesToArray(bool, unsigned char*) const: error: undefined reference to 'google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::InternalSerializeWithCachedSizesToArray(bool, unsigned char*) const: error: undefined reference to 'google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::BloomFilter::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::UserMetadataItem::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::EncryptionVariant::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::EncryptionVariant::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StripeInformation::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteBytes(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::DataMask::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteString(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function protobuf_orc_5fproto_2eproto::AddDescriptorsImpl(): error: undefined reference to 'google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, void (*)(std::string const&))'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::Type::SerializeWithCachedSizes(google::protobuf::io::CodedOutputStream*) const: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteString(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function protobuf_orc_5fproto_2eproto::protobuf_AssignDescriptors(): error: undefined reference to 'google::protobuf::internal::AssignDescriptors(std::string const&, google::protobuf::internal::MigrationSchema const*, google::protobuf::Message const* const*, unsigned int const*, google::protobuf::MessageFactory*, google::protobuf::Metadata*, google::protobuf::EnumDescriptor const**, google::protobuf::ServiceDescriptor const**)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringStatistics::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*): error: undefined reference to 'google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::DecimalStatistics::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*): error: undefined reference to 'google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::StringPair::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*): error: undefined reference to 'google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:function orc::proto::UserMetadataItem::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*): error: undefined reference to 'google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::IntegerStatistics: error: undefined reference to 'google::protobuf::Message::GetTypeName() const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::IntegerStatistics: error: undefined reference to 'google::protobuf::Message::InitializationErrorString() const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::DoubleStatistics: error: undefined reference to 'google::protobuf::Message::GetTypeName() const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::DoubleStatistics: error: undefined reference to 'google::protobuf::Message::InitializationErrorString() const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::StringStatistics: error: undefined reference to 'google::protobuf::Message::GetTypeName() const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::StringStatistics: error: undefined reference to 'google::protobuf::Message::InitializationErrorString() const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::BucketStatistics: error: undefined reference to 'google::protobuf::Message::GetTypeName() const'
../../c++/src/liborc.a(orc-proto-wrapper.cc.o):orc-proto-wrapper.cc:vtable for orc::proto::BucketStatistics: error: undefined reference to 'google::protobuf::Message::InitializationErrorString() const'
collect2: error: ld returned 1 exit status
oliverhu commented 3 years ago

I ran bazel clean --expunge before the build. My hunch is that something got messed up when the ORC objects were built. ORC uses protobuf 3.5 and TF uses 3.9, but why would that cause a problem in such an isolated setup?

oliverhu commented 3 years ago

Found the problem: tf/io's .bazelrc file sets a -D_GLIBCXX_USE_CXX11_ABI flag, and that is causing the build to fail :(
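
For anyone hitting a similar log, here is a minimal Python sketch that summarizes the "undefined reference" lines by top-level namespace and counts how many of the missing symbols name std::string in their signature. When missing symbols consistently take or return std::string, a mismatch in _GLIBCXX_USE_CXX11_ABI between the objects and the libraries they link against is a common cause, although this script only hints at that and does not prove it. The function name summarize_undefined_refs and the regular expression are illustrative assumptions; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the quoted symbol at the end of an "undefined reference" line,
# in the format shown in the log above (an assumption about the linker output).
UNDEFINED_REF = re.compile(r"undefined reference to '(?P<symbol>.+)'\s*$")


def summarize_undefined_refs(log_text):
    """Return (per-namespace counts, number of symbols that name std::string)."""
    namespaces = Counter()
    with_std_string = 0
    for line in log_text.splitlines():
        match = UNDEFINED_REF.search(line)
        if match is None:
            continue  # not an undefined-reference line
        symbol = match.group("symbol")
        # Top-level namespace of the missing symbol, e.g. "google" or "hdfs".
        namespaces[symbol.split("::", 1)[0]] += 1
        if "std::string" in symbol:
            with_std_string += 1
    return namespaces, with_std_string


# Three lines taken from the log above, plus the final collect2 line.
SAMPLE_LOG = """\
../../c++/src/liborc.a(Reader.cc.o):Reader.cc:function orc::ReaderImpl::getSerializedFileTail() const: error: undefined reference to 'google::protobuf::MessageLite::SerializeToString(std::string*) const'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::HdfsFileInputStream(std::string): error: undefined reference to 'hdfs::URI::parse_from_string(std::string const&)'
../../c++/src/liborc.a(OrcHdfsFile.cc.o):OrcHdfsFile.cc:function orc::HdfsFileInputStream::read(void*, unsigned long, unsigned long): error: undefined reference to 'hdfs::Status::ToString() const'
collect2: error: ld returned 1 exit status
"""

namespaces, with_std_string = summarize_undefined_refs(SAMPLE_LOG)
print(dict(namespaces))
print(with_std_string)
```

Symbols such as hdfs::Status::ToString() do not show std::string in the demangled text, so the count is a lower bound on how many missing symbols involve std::string.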

oliverhu commented 3 years ago

Unfortunately, that flag breaks the ORC build. I plan to reach out to the Apache ORC community for help.

yongtang commented 3 years ago

@oliverhu If the issue is related to -D_GLIBCXX_USE_CXX11_ABI, then maybe we can remove it. If I remember correctly, that flag was in TensorFlow earlier, when Ubuntu 14.04 was still in use. Maybe it is not needed anymore.