Closed votdev closed 1 year ago
Restoring (undeleting) an object should not be done by copying the object version; that's why I would change the subject of this issue, as I find it confusing.
The real issue is that we should not allow copying delete markers (or, at least, not crash miserably).
To restore a deleted object, the UI should just delete the delete marker.
After analysing and debugging this one, I think it is a duplicate of https://github.com/aquarist-labs/s3gw/issues/183, but in a different scenario. In this particular case the code follows this path:
```cpp
int RGWCopyObj::verify_permission(optional_yield y)
...
  /* check source object permissions */
  op_ret = read_obj_policy(
      this, driver, s, src_bucket->get_info(), src_bucket->get_attrs(),
      &src_acl, &src_placement.storage_class, src_policy, src_bucket.get(),
      s->src_object.get(), y
  );
...
  int ret = get_obj_policy_from_attr(
      dpp, s->cct, driver, bucket_info, bucket_attrs, acl, storage_class,
      object, s->yield
  );
...
  if (storage_class) {
    bufferlist scbl;
    int r = rop->get_attr(dpp, RGW_ATTR_STORAGE_CLASS, scbl, y);
    if (r >= 0) {
      *storage_class = scbl.to_str();
    } else {
      storage_class->clear();
    }
  }
```
`rop->get_attr` is the call returning the `-ENOENT` error:
```cpp
int SFSObject::SFSReadOp::get_attr(
    const DoutPrefixProvider* /*dpp*/, const char* name, bufferlist& dest,
    optional_yield /*y*/
) {
  // HERE objref->deleted is true because we're trying to restore a delete marker
  if (!objref || objref->deleted) {
    return -ENOENT;
  }
  if (!objref->get_attr(name, dest)) {
    return -ENODATA;
  }
  return 0;
}
```
So... `RGWCopyObj::verify_permission` returns `-ENOENT`, but the user is admin, so the error is overridden and execution continues. A data structure (`dest_bucket`) is left uninitialised, and the process cores at this point in `RGWCopyObj::execute`:
```cpp
  if (!version_id.empty()) {
    dest_object->set_instance(version_id);
  } else if (dest_bucket->versioning_enabled()) { // dest_bucket is 0x00 here because it was not initialised
    dest_object->gen_rand_obj_instance_name();
  }
```
I've also tested with a version rebased from upstream and, while it's still not working correctly, it's not coring. `radosgw` is catching the exception, and the error is also different:
```
2023-09-20T11:09:05.507+0200 7f217b37f6c0 10 req 0 0.003333386s s3:copy_obj > object::copy_object source(bucket: test2, obj: test-object), dest(bucket: test2, obj: copy2)
2023-09-20T11:09:05.507+0200 7f217b37f6c0 10 req 0 0.003333386s s3:copy_obj > object::copy_object copying file from '"/home/xavi/sfs/e8/44/98bd-dae9-4c1a-ac41-936c4014f6f7/2.v"' to '"/home/xavi/sfs/31/dc/d9ab-121b-453b-85d1-ffb5a34587f2/5.v"'
2023-09-20T11:09:05.507+0200 7f217b37f6c0  0 req 0 0.003333386s s3:copy_obj !!! BUG Unhandled exception while executing operation copy_obj: filesystem error: cannot copy file: No such file or directory [/home/xavi/sfs/e8/44/98bd-dae9-4c1a-ac41-936c4014f6f7/2.v] [/home/xavi/sfs/31/dc/d9ab-121b-453b-85d1-ffb5a34587f2/5.v]. replying internal error
2023-09-20T11:09:05.510+0200 7f217b37f6c0  0 req 0 0.006666772s s3:copy_obj START BACKTRACE (exception NSt10filesystem7__cxx1116filesystem_errorE)
 ceph version 18.0.0-6476-g2f59f09c996 (2f59f09c996e1b67732e8e0b1a1fe5a61504b33c) reef (dev)
 1: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0x1139) [0x55eb13ccdef0]
 2: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x14dd) [0x55eb13ccf9e6]
 3: bin/radosgw(+0x2117fa0) [0x55eb13c27fa0]
 4: bin/radosgw(+0x2118af0) [0x55eb13c28af0]
 5: bin/radosgw(+0x2118ca9) [0x55eb13c28ca9]
 6: bin/radosgw(+0x2118e1a) [0x55eb13c28e1a]
 7: bin/radosgw(+0x2118ee9) [0x55eb13c28ee9]
 8: make_fcontext()
```
I would keep this bug open and re-test when we do the final rebase. I think this bug and https://github.com/aquarist-labs/s3gw/issues/183 will still need extra work after rebasing.
Moving it to "On hold", waiting for the rebase.
Closing this as it does not crash after the 18.2.0 rebase.
Tested copying a deleted object with the `aws` CLI utility.
This issue applies to the latest code in s3gw-ceph.
Note: I'm aware that the UI should prevent that, but the backend should handle it correctly as well.