oap-project / velox

A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://facebookincubator.github.io/velox/
Apache License 2.0

branch-1.1: Failed to get metadata for S3 object #461

Open xingnailu opened 10 months ago

xingnailu commented 10 months ago

Bug description

I built Gluten + Velox from branch-1.1 and submitted a TPC-H query through spark-shell against data stored in S3. The following error occurred during execution:

Reason: Failed to get metadata for S3 object due to: 'Unknown error'. Path:'s3://xxxxxxx/user/hive/warehouse/tpch_orc.db/customer/part-00027-31ef1f3c-5b27-4f6c-aef4-7f77f7749873-c000.snappy.orc', SDK Error Type:100, HTTP Status Code:400, S3 Service:'AmazonS3', Message:'No response body.', RequestID:'KC5WQZ78QWKQ9BFX'

The same query runs normally when I use the Gluten tag v1.0.0.
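
When comparing failures like the one quoted above across runs or builds, it can help to pull the numeric fields out of the "Reason" line mechanically. The following is a minimal sketch, assuming the message layout shown in this report ("SDK Error Type:N, HTTP Status Code:N, ... RequestID:'...'"). The struct `S3ErrorFields` and the function `parseS3Error` are hypothetical names introduced for illustration; they are not part of Velox or Gluten.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical container for the fields of interest in the error message.
struct S3ErrorFields {
  int sdkErrorType;
  int httpStatus;
  std::string requestId;
};

// Extracts the SDK error type, HTTP status code, and request ID from a
// "Reason:" line formatted like the one in this report. Returns
// std::nullopt if the line does not match the assumed layout.
std::optional<S3ErrorFields> parseS3Error(const std::string& reason) {
  static const std::regex pattern(
      R"(SDK Error Type:(\d+), HTTP Status Code:(\d+),.*RequestID:'([^']*)')");
  std::smatch m;
  if (!std::regex_search(reason, m, pattern)) {
    return std::nullopt;
  }
  return S3ErrorFields{
      std::stoi(m[1].str()), std::stoi(m[2].str()), m[3].str()};
}
```

Applied to the message above, this yields SDK error type 100 and HTTP status 400, which makes it easy to tell this failure apart from a plain "not found" (HTTP 404) response.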

@majetideepak

System information

System info for the branch-1.1 build:

Velox System Info v0.0.2
Commit: https://github.com/facebookincubator/velox/commit/bbd65c4109fc11d4021334aff817ff384eab7b88
CMake Version: 3.16.3
System: Linux-5.15.0-91-generic
Arch: x86_64
C++ Compiler: /bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Running on AWS EKS.

Relevant logs

"2023-12-05T07:12:37.689576121Z stdout F 23/12/05 07:12:37 ERROR TaskResources: Task 8 failed by error: ",
"2023-12-05T07:12:37.689606328Z stdout F io.glutenproject.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError",
"2023-12-05T07:12:37.689628682Z stdout F Error Source: RUNTIME",
"2023-12-05T07:12:37.689632451Z stdout F Error Code: INVALID_STATE",
"2023-12-05T07:12:37.689636372Z stdout F Reason: Failed to get metadata for S3 object due to: 'Unknown error'. Path:'s3://xxxxxx/user/hive/warehouse/tpch_orc.db/customer/part-00027-31ef1f3c-5b27-4f6c-aef4-7f77f7749873-c000.snappy.orc', SDK Error Type:100, HTTP Status Code:400, S3 Service:'AmazonS3', Message:'No response body.', RequestID:'KC5WQZ78QWKQ9BFH'",
"2023-12-05T07:12:37.689639435Z stdout F Retriable: False",
"2023-12-05T07:12:37.689643198Z stdout F Context: Split [Hive: s3a://xxxxxx/user/hive/warehouse/tpch_orc.db/customer/part-00027-31ef1f3c-5b27-4f6c-aef4-7f77f7749873-c000.snappy.orc 0 - 121746056] Task Gluten_Stage_0_TID_8",
"2023-12-05T07:12:37.689646437Z stdout F Top-Level Context: Same as context.",
"2023-12-05T07:12:37.689649292Z stdout F Function: initialize",
"2023-12-05T07:12:37.689652406Z stdout F File: ../../velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp",
"2023-12-05T07:12:37.689655045Z stdout F Line: 93", 
"2023-12-05T07:12:37.689657984Z stdout F Stack trace:",
"2023-12-05T07:12:37.689661375Z stdout F # 0  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)",
"2023-12-05T07:12:37.689670744Z stdout F # 1  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)", 
"2023-12-05T07:12:37.68967352Z stdout F # 2  facebook::velox::(anonymous namespace)::S3ReadFile::initialize()",
"2023-12-05T07:12:37.689677103Z stdout F # 3  facebook::velox::filesystems::S3FileSystem::openFileForRead(std::basic_string_view<char, std::char_traits<char> >, facebook::velox::filesystems::FileOptions const&)",
"2023-12-05T07:12:37.689680232Z stdout F # 4  facebook::velox::FileHandleGenerator::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)",
"2023-12-05T07:12:37.689682935Z stdout F # 5  facebook::velox::CachedFactory<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<facebook::velox::FileHandle>, facebook::velox::FileHandleGenerator>::generate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)", 
"2023-12-05T07:12:37.689686275Z stdout F # 6  facebook::velox::connector::hive::HiveDataSource::addSplit(std::shared_ptr<facebook::velox::connector::ConnectorSplit>)",
"2023-12-05T07:12:37.68970488Z stdout F # 7  facebook::velox::exec::TableScan::getOutput()",
"2023-12-05T07:12:37.689707926Z stdout F # 8  facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)",
"2023-12-05T07:12:37.689710953Z stdout F # 9  facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)",
"2023-12-05T07:12:37.689713812Z stdout F # 10 facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)",
"2023-12-05T07:12:37.689716972Z stdout F # 11 gluten::WholeStageResultIterator::next()",
"2023-12-05T07:12:37.689719966Z stdout F # 12 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext",
dcoliversun commented 10 months ago

I see a similar exception, but with the data stored on Alibaba OSS. The S3 storage adapter supports the oss scheme [1].

The exception is:

Caused by: io.glutenproject.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Failed to get metadata for S3 object due to: 'Resource not found'. Path:'s3://henghzhen-test-hangzhou/db/t1/b=1/c=10/part-00000-d4940ed1-7f70-44f5-bbb0-65ae29f325f1.c000.snappy.parquet', SDK Error Type:16, HTTP Status Code:404, S3 Service:'AmazonS3', Message:'No response body.', RequestID:'2VQQRSWNX8QQGNNY'
Retriable: False
Context: Split [Hive: s3a://henghzhen-test-hangzhou/db/t1/b=1/c=10/part-00000-d4940ed1-7f70-44f5-bbb0-65ae29f325f1.c000.snappy.parquet 0 - 443] Task Gluten_Stage_0_TID_0
Top-Level Context: Same as context.
Function: initialize
File: ../../velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp
Line: 93
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox12_GLOBAL__N_110S3ReadFile10initializeEv
# 4  _ZN8facebook5velox11filesystems12S3FileSystem15openFileForReadESt17basic_string_viewIcSt11char_traitsIcEERKNS1_11FileOptionsE
# 5  _ZN8facebook5velox19FileHandleGeneratorclERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
# 6  _ZN8facebook5velox13CachedFactoryINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10shared_ptrINS0_10FileHandleEENS0_19FileHandleGeneratorEE8generateERKS7_
# 7  _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 8  _ZN8facebook5velox4exec9TableScan9getOutputEv
# 9  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 10 _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 11 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 12 _ZN6gluten24WholeStageResultIterator4nextEv
# 13 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 14 0x00007f8c75018427

[1] https://facebookincubator.github.io/velox/develop/connectors.html?highlight=oss
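
The stack trace in this comment shows mangled C++ symbol names, which makes it harder to compare with the demangled trace in the original report. A minimal sketch for decoding them follows, assuming a GCC or Clang toolchain that provides the Itanium C++ ABI helper `abi::__cxa_demangle` in `<cxxabi.h>`. The wrapper name `demangle` is a hypothetical helper introduced here; the command-line tool `c++filt` does the same job without writing any code.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Returns the demangled form of an Itanium-ABI mangled symbol, or the
// input unchanged if it cannot be demangled (for example, a raw address).
std::string demangle(const std::string& mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc, so release it with free.
  std::unique_ptr<char, decltype(&std::free)> out(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
      &std::free);
  return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

For example, frame #8 above, `_ZN8facebook5velox4exec9TableScan9getOutputEv`, demangles to `facebook::velox::exec::TableScan::getOutput()`, which matches frame #7 of the trace in the original report. Both failures therefore go through the same `S3ReadFile::initialize()` path.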