nspcc-dev / neofs-node

NeoFS is a decentralized distributed object storage integrated with the Neo blockchain
https://fs.neo.org
GNU General Public License v3.0

GC has not collected an expired object #2858

Closed. roman-khimov closed this issue 3 weeks ago.

roman-khimov commented 1 month ago

Expected Behavior

Expired objects are deleted. Even if the node was down for some time, it should delete them once it is back.

Current Behavior

May 30 14:54:27 metis2 neofs-node[656]: 2024-05-30T14:54:27.505Z        error        replicator/process.go:76        could not replicate object        {"component": "Object Replicator", "node": "03aeff8a19f0202090afb0916b1c00b432321be7e8623a06c9b9b5db8ee5c053a4", "object": "HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/6Ha2WhpvBsAeT23jsa9Ak8DVrqVB97xLXTSeLrjYnoGD", "error": "copy object using NeoFS API client of the remote node: status: code = 1024 message = failed to verify and store object locally: validate object format: object did not pass expiration check: object has expired"}

The node tries to replicate the same set of objects again and again, which means GC has failed to do its job.

Possible Solution

Fix GC.

Steps to Reproduce (for bugs)

Shut an existing node down for some time and expand it by one shard (from 3 to 4).

Your Environment

roman-khimov commented 1 month ago

May 30 17:58:59 metis4 neofs-node[4408]: 2024-05-30T17:58:59.789Z        error        replicator/process.go:76        could not replicate object        {"component": "Object Replicator", "node": "03aeff8a19f0202090afb0916b1c00b432321be7e8623a06c9b9b5db8ee5c053a4", "object": "HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/HmECRDC25qyX8MxNHeA21PK8aymLZcPSYQKyLBNBiTUM", "error": "copy object using NeoFS API client of the remote node: status: code = 1024 message = failed to verify and store object locally: validate object format: object did not pass expiration check: object has expired"}

carpawell commented 1 month ago

They are parts of a big (V1, hehe) object. This needs to be investigated, but it seems to me that something like this happened:

  1. V1 parts had (or should have had, but did not?) the expiration attribute in every part.
  2. There is no expiration handling for the parts in https://github.com/nspcc-dev/neofs-node/pull/390 (at least I do not see it), only handling of small objects, the GC mark for the root object and "dropping" the non-existent big object from the blobstor.
  3. The new V2 object scheme landed and brought a nice check for replicated small objects (applied to the parts of both V1 and V2 big objects): https://github.com/nspcc-dev/neofs-node/blob/ea78a2da303e28f2dc83daf9e1d1fe84e9082c6d/pkg/core/object/fmt.go#L133-L162
  4. The check started to work and now rejects replication of expired parts, but GC still does not expire the small parts.

For now, the main solution should be: when GC sees an expired big object, it should expire every one of its parts, not just mark the root object as deleted.