s3gw-tech / s3gw

Container able to run on a Kubernetes cluster, providing S3-compatible endpoints to applications.
https://s3gw.tech
Apache License 2.0

Assertion hit in list multiparts: ceph_assert(bucket_entries.size() == 1) #826

Closed (irq0 closed this issue 1 year ago)

irq0 commented 1 year ago
2023-11-21T16:04:18.539865093+01:00 stdout F     -1> 2023-11-21T15:04:18.511+0000 7fc711e85700 -1 ../src/rgw/driver/sfs/sqlite/sqlite_multipart.cc: In function 'std::optional<std::vector<rgw::sal::sfs::sqlite::DBMultipart> > rgw::sal::sfs::sqlite::SQLiteMultipart::list_multiparts(const std::string&, const std::string&, const std::string&, const std::string&, const int&, bool*) const' thread 7fc711e85700 time 2023-11-21T15:04:18.506775+0000
2023-11-21T16:04:18.539868643+01:00 stdout F ../src/rgw/driver/sfs/sqlite/sqlite_multipart.cc: 46: FAILED ceph_assert(bucket_entries.size() == 1)
2023-11-21T16:04:18.539870313+01:00 stdout F
2023-11-21T16:04:18.539872273+01:00 stdout F  ceph version Development (no_version) reef (stable)
2023-11-21T16:04:18.539874093+01:00 stdout F  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x132) [0x7fc7d44e518b]
2023-11-21T16:04:18.539875982+01:00 stdout F  2: /s3gw/lib/libceph-common.so.2(+0x250353) [0x7fc7d44e5353]
2023-11-21T16:04:18.539885152+01:00 stdout F  3: (rgw::sal::sfs::sqlite::SQLiteMultipart::list_multiparts(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int const&, bool*) const+0x27f) [0x55fa480692cf]
2023-11-21T16:04:18.539887342+01:00 stdout F  4: (rgw::sal::sfs::SFSMultipartUploadV2::list_multiparts(DoutPrefixProvider const*, rgw::sal::SFStore*, rgw::sal::SFSBucket*, std::shared_ptr<rgw::sal::sfs::Bucket>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int const&, std::vector<std::unique_ptr<rgw::sal::MultipartUpload, std::default_delete<rgw::sal::MultipartUpload> >, std::allocator<std::unique_ptr<rgw::sal::MultipartUpload, std::default_delete<rgw::sal::MultipartUpload> > > >&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, bool> > >*, bool*)+0x3b9) [0x55fa480cf359]
2023-11-21T16:04:18.539889912+01:00 stdout F  5: (rgw::sal::SFSBucket::list_multiparts(DoutPrefixProvider const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int const&, std::vector<std::unique_ptr<rgw::sal::MultipartUpload, std::default_delete<rgw::sal::MultipartUpload> >, std::allocator<std::unique_ptr<rgw::sal::MultipartUpload, std::default_delete<rgw::sal::MultipartUpload> > > >&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, bool> > >*, bool*)+0x30b) [0x55fa480bcf7b]
2023-11-21T16:04:18.539891862+01:00 stdout F  6: (RGWListBucketMultiparts::execute(optional_yield)+0x8b) [0x55fa47796f9b]
2023-11-21T16:04:18.539893642+01:00 stdout F  7: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xc86) [0x55fa4757cfc6]
2023-11-21T16:04:18.539895552+01:00 stdout F  8: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x2192) [0x55fa4757fb42]
2023-11-21T16:04:18.539897331+01:00 stdout F  9: radosgw(+0x10a0ac5) [0x55fa474b7ac5]
2023-11-21T16:04:18.539899071+01:00 stdout F  10: radosgw(+0x10a1894) [0x55fa474b8894]
2023-11-21T16:04:18.539902581+01:00 stdout F  11: make_fcontext()
2023-11-21T16:04:18.539904281+01:00 stdout F
2023-11-21T16:04:18.539906121+01:00 stdout F      0> 2023-11-21T15:04:18.511+0000 7fc711e85700 -1 *** Caught signal (Aborted) **
2023-11-21T16:04:18.539907891+01:00 stdout F  in thread 7fc711e85700 thread_name:radosgw

jecluis commented 1 year ago

@irq0 does this happen every time?

irq0 commented 1 year ago

Nope, I've only seen it once in the logs. Unfortunately, none of the tests I ran after finding it could reproduce it. I was hoping someone could work backwards from the assertion to see what went wrong.

tserong commented 1 year ago

This might be a case where we need another fix like https://github.com/aquarist-labs/ceph/pull/235

Here's the code:

std::optional<std::vector<DBMultipart>> SQLiteMultipart::list_multiparts(
    const std::string& bucket_name, const std::string& prefix,
    const std::string& marker, const std::string& delim, const int& max_uploads,
    bool* is_truncated
) const {
  auto storage = conn->get_storage();

  auto bucket_entries = storage->get_all<DBBucket>(
      where(is_equal(&DBBucket::bucket_name, bucket_name))
  );
  if (bucket_entries.size() == 0) {
    return std::nullopt;
  }
  ceph_assert(bucket_entries.size() == 1);

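So that'll pick up all buckets with that name, regardless of deletion state. If a deleted bucket's row is still around alongside a live bucket of the same name, the query would return two entries and trip the assert.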

(I haven't confirmed this by testing; it just seems likely to me.)
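
To make the suspected failure mode concrete, here is a minimal standalone sketch that models the bucket table as a plain vector rather than using the real SQLite storage. Everything in it is hypothetical: `BucketRow`, its `deleted` flag, `find_by_name`, `find_live_by_name`, `live_bucket_id`, and `sample_rows` are stand-ins invented for illustration, and the idea that the fix is to filter on a deletion flag is an assumption, not a confirmed diagnosis.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a row in the bucket table.
// The `deleted` flag is an assumption about how deletion state is stored.
struct BucketRow {
  std::string bucket_id;
  std::string bucket_name;
  bool deleted;
};

// Mirrors the lookup quoted above: match on name only.
std::vector<BucketRow> find_by_name(
    const std::vector<BucketRow>& rows, const std::string& name
) {
  std::vector<BucketRow> out;
  std::copy_if(
      rows.begin(), rows.end(), std::back_inserter(out),
      [&](const BucketRow& r) { return r.bucket_name == name; }
  );
  return out;
}

// Possible alternative: match on name and skip rows marked as deleted.
std::vector<BucketRow> find_live_by_name(
    const std::vector<BucketRow>& rows, const std::string& name
) {
  std::vector<BucketRow> out;
  std::copy_if(
      rows.begin(), rows.end(), std::back_inserter(out),
      [&](const BucketRow& r) { return r.bucket_name == name && !r.deleted; }
  );
  return out;
}

// Returns the ID of the single live bucket, or an empty string if none.
// The assert plays the role of the ceph_assert in the quoted code.
std::string live_bucket_id(
    const std::vector<BucketRow>& rows, const std::string& name
) {
  const auto live = find_live_by_name(rows, name);
  if (live.empty()) {
    return "";
  }
  assert(live.size() == 1);
  return live.front().bucket_id;
}

// Sample table: a bucket that was deleted (row still present) and then
// re-created under the same name.
std::vector<BucketRow> sample_rows() {
  return {{"id-1", "test", true}, {"id-2", "test", false}};
}
```

With `sample_rows()`, the name-only lookup returns two rows, which is the condition that would violate `bucket_entries.size() == 1`, while the filtered lookup returns only the live bucket. Whether the real table can actually hold such a pair at the time of a list-multiparts request would still need to be confirmed by testing.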