Closed hseipp closed 1 month ago
@nadavMiz can you please take a look?
@hseipp do you have libgpfs in this deployment? I saw this event in the logs -
"code":"noobaa_gpfslib_missing","message":"Noobaa GPFS library file is missing","description":"Noobaa GPFS library file is missing","entity_type":"NODE","event_type":"STATE_CHANGE","scope":"NODE","severity":"ERROR","state":"DEGRADED","arguments":{},"pid":1440}
Also adding here an error I saw in the logs, not sure it's causing the bug -
core.sdk.object_sdk:: Got an error while trying to omitBy param.xattr: { [Symbol(XATTR_SORT_SYMBOL)]: true } error: TypeError: name.startsWith is not a function at /usr/local/noobaa-core/src/sdk/object_sdk.js:645:89 at /usr/local/noobaa-core/node_modules/lodash/lodash.js:10660:37 at /usr/local/noobaa-core/node_modules/lodash/lodash.js:13658:16 at basePickBy (/usr/local/noobaa-core/node_modules/lodash/lodash.js:3825:13) at pickBy (/usr/local/noobaa-core/node_modules/lodash/lodash.js:13657:14) at Function.omitBy (/usr/local/noobaa-core/node_modules/lodash/lodash.js:13607:14) at NsfsObjectSDK.fix_copy_source_params (/usr/local/noobaa-core/src/sdk/object_sdk.js:645:48) at async op_func (/usr/local/noobaa-core/src/sdk/object_sdk.js:730:41) at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27) at async Object.put_object [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_put_object.js:39:19)
Aug 28 15:34:47 m1.example.com node[9119]: Aug-28 15:34:47.626 [nsfs/9119] [L3] core.sdk.object_sdk:: Got an error while trying to omitBy param.xattr: { [Symbol(XATTR_SORT_SYMBOL)]: true } error: TypeError: name.startsWith is not a function
Aug 28 15:34:47 m1.example.com node[9119]: at /usr/local/noobaa-core/src/sdk/object_sdk.js:645:89
Aug 28 15:34:47 m1.example.com node[9119]: at /usr/local/noobaa-core/node_modules/lodash/lodash.js:10660:37
Aug 28 15:34:47 m1.example.com node[9119]: at /usr/local/noobaa-core/node_modules/lodash/lodash.js:13658:16
Aug 28 15:34:47 m1.example.com node[9119]: at basePickBy (/usr/local/noobaa-core/node_modules/lodash/lodash.js:3825:13)
Aug 28 15:34:47 m1.example.com node[9119]: at pickBy (/usr/local/noobaa-core/node_modules/lodash/lodash.js:13657:14)
Aug 28 15:34:47 m1.example.com node[9119]: at Function.omitBy (/usr/local/noobaa-core/node_modules/lodash/lodash.js:13607:14)
Aug 28 15:34:47 m1.example.com node[9119]: at NsfsObjectSDK.fix_copy_source_params (/usr/local/noobaa-core/src/sdk/object_sdk.js:645:48)
Aug 28 15:34:47 m1.example.com node[9119]: at async op_func (/usr/local/noobaa-core/src/sdk/object_sdk.js:730:41)
Aug 28 15:34:47 m1.example.com node[9119]: at async NsfsObjectSDK._call_op_and_update_stats (/usr/local/noobaa-core/src/sdk/object_sdk.js:543:27)
Aug 28 15:34:47 m1.example.com node[9119]: at async Object.put_object [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_put_object.js:39:19)
Aug 28 15:34:47 m1.example.com node[9119]: Aug-28 15:34:47.626 [nsfs/9119] [L2] core.sdk.namespace_fs:: NamespaceFS: is_server_side_copy: true
Interesting. I am using a Storage Scale DME 5.2.1 GA installation.
$ sudo find / -name "libgpfs.so"
/usr/lib/libgpfs.so
/usr/lib64/libgpfs.so
/usr/lpp/mmfs/lib/i386/libgpfs.so
/usr/lpp/mmfs/lib/libgpfs.so
we have an error in the code while trying to read the :
Aug 28 15:35:06 m1.example.com [9119]: [nsfs/9119] [L0] core.sdk.namespace_fs:: NamespaceFS: read_object_stream { file_path: '/ibm/fs1/examples/buckets/s3tests-j2s4w9wx4zqvvi9wevnqs-1/dstmultipart', st
art: 0, end: Infinity, size: 31457280 }
Aug 28 15:35:06 m1.example.com [9119]: [nsfs/9119] [L1] core.util.buffer_utils:: BufferPool.get_buffer: sem value 1342177280 waiting_value 0 buffers length 2
Aug 28 15:35:06 m1.example.com [9119]: [nsfs/9119] [L1] core.util.buffer_utils:: BufferPool.get_buffer: sem value 1333788672 waiting_value 0 buffers length 1
Aug 28 15:35:06 m1.example.com [9119]: [nsfs/9119] [L0] core.sdk.object_sdk:: request aborted /s3tests-j2s4w9wx4zqvvi9wevnqs-1/srcmultipart
Aug 28 15:35:06 m1.example.com node[9119]: Aug-28 15:35:06.573 [nsfs/9119] [L0] core.sdk.object_sdk:: request aborted /s3tests-j2s4w9wx4zqvvi9wevnqs-1/srcmultipart
Aug 28 15:35:06 m1.example.com node[9119]: Aug-28 15:35:06.574 [nsfs/9119] [L0] core.sdk.object_sdk:: request error: Error: aborted
Aug 28 15:35:06 m1.example.com node[9119]: at connResetException (node:internal/errors:787:14)
Aug 28 15:35:06 m1.example.com node[9119]: at abortIncoming (node:_http_server:793:17)
Aug 28 15:35:06 m1.example.com node[9119]: at socketOnClose (node:_http_server:787:3)
Aug 28 15:35:06 m1.example.com node[9119]: at TLSSocket.emit (node:events:530:35)
Aug 28 15:35:06 m1.example.com node[9119]: at TLSSocket.emit (node:domain:488:12)
Aug 28 15:35:06 m1.example.com node[9119]: at node:net:337:12
Aug 28 15:35:06 m1.example.com node[9119]: at Socket.done (node:_tls_wrap:657:7)
Aug 28 15:35:06 m1.example.com node[9119]: at Object.onceWrapper (node:events:633:26)
Aug 28 15:35:06 m1.example.com node[9119]: at Socket.emit (node:events:530:35)
Aug 28 15:35:06 m1.example.com node[9119]: at Socket.emit (node:domain:488:12) {
Aug 28 15:35:06 m1.example.com node[9119]: code: 'ECONNRESET'
Aug 28 15:35:06 m1.example.com node[9119]: } /s3tests-j2s4w9wx4zqvvi9wevnqs-1/srcmultipart
Aug 28 15:35:06 m1.example.com [9119]: [nsfs/9119] [L0] core.sdk.object_sdk:: request error: Error: aborted at connResetException (node:internal/errors:787:14) at abortIncoming (node:_http_server
:793:17) at socketOnClose (node:_http_server:787:3) at TLSSocket.emit (node:events:530:35) at TLSSocket.emit (node:domain:488:12) at node:net:337:12 at Socket.done (node:_tls_wrap:657:7)
at Object.onceWrapper (node:events:633:26) at Socket.emit (node:events:530:35) at Socket.emit (node:domain:488:12) { code: 'ECONNRESET'} /s3tests-j2s4w9wx4zqvvi9wevnqs-1/srcmultipart
Aug 28 15:35:06 m1.example.com node[9119]: Aug-28 15:35:06.574 [nsfs/9119] [L0] core.sdk.namespace_fs:: NamespaceFS: read_object_stream error file /ibm/fs1/examples/buckets/s3tests-j2s4w9wx4zqvvi9wevnqs-1/srcmultipart AbortError: The operation was aborted
Aug 28 15:35:06 m1.example.com node[9119]: at EventTarget.abortListener (node:events:1006:14)
Aug 28 15:35:06 m1.example.com node[9119]: at [nodejs.internal.kHybridDispatch] (node:internal/event_target:826:20)
Aug 28 15:35:06 m1.example.com node[9119]: at EventTarget.dispatchEvent (node:internal/event_target:761:26)
Aug 28 15:35:06 m1.example.com node[9119]: at abortSignal (node:internal/abort_controller:371:10)
Aug 28 15:35:06 m1.example.com node[9119]: at AbortController.abort (node:internal/abort_controller:393:5)
Aug 28 15:35:06 m1.example.com node[9119]: at IncomingMessage.<anonymous> (/usr/local/noobaa-core/src/sdk/object_sdk.js:135:35)
Aug 28 15:35:06 m1.example.com node[9119]: at Object.onceWrapper (node:events:632:28)
Aug 28 15:35:06 m1.example.com node[9119]: ... 2 lines matching cause stack trace ...
Aug 28 15:35:06 m1.example.com node[9119]: at IncomingMessage._destroy (node:_http_incoming:224:10) {
Aug 28 15:35:06 m1.example.com node[9119]: code: 'ABORT_ERR',
Aug 28 15:35:06 m1.example.com node[9119]: [cause]: Error: request aborted /s3tests-j2s4w9wx4zqvvi9wevnqs-1/srcmultipart
Aug 28 15:35:06 m1.example.com node[9119]: at IncomingMessage.<anonymous> (/usr/local/noobaa-core/src/sdk/object_sdk.js:135:41)
Aug 28 15:35:06 m1.example.com node[9119]: at Object.onceWrapper (node:events:632:28)
Aug 28 15:35:06 m1.example.com node[9119]: at IncomingMessage.emit (node:events:518:28)
Aug 28 15:35:06 m1.example.com node[9119]: at IncomingMessage.emit (node:domain:488:12)
Aug 28 15:35:06 m1.example.com node[9119]: at IncomingMessage._destroy (node:_http_incoming:224:10)
Aug 28 15:35:06 m1.example.com node[9119]: at _destroy (node:internal/streams/destroy:121:10)
Aug 28 15:35:06 m1.example.com node[9119]: at IncomingMessage.destroy (node:internal/streams/destroy:83:5)
Aug 28 15:35:06 m1.example.com node[9119]: at abortIncoming (node:_http_server:793:9)
Aug 28 15:35:06 m1.example.com node[9119]: at socketOnClose (node:_http_server:787:3)
Aug 28 15:35:06 m1.example.com node[9119]: at TLSSocket.emit (node:events:530:35)
Aug 28 15:35:06 m1.example.com node[9119]: }
there are few things I observed when replicating in my environment:
@romayalon Regarding your observation on libgpfs
not being loaded https://github.com/noobaa/noobaa-core/issues/8315#issuecomment-2325896368 it looks like an issue on my particular VM upon bootup.
There seems to be a race condition where noobaa is trying to access a file on the cesShared Scale Filesystem before it is mounted. mmces service stop s3 && mmces service start s3
resolves the situation.
Note that I can also produce the issue with Ceph s3-tests test_versioning_obj_create_overwrite_multipart()
. This time I double-checked that libgpfs.so
got properly loaded.
Environment info
Actual behavior
Content of the copied object whose source was uploaded to a version-enabled bucket using multi-part upload is not identical to the source object:
Expected behavior
Content of the copied object should be identical to the source object, assertion should pass.
Steps to reproduce
Run Ceph s3-tests
test_object_copy_versioning_multipart_upload()
:More information - Screenshots / Logs / Other output
See "all" logging level noobaa log attached - look at log entries at 15:34:00 and later:
noobaa_20240828_1748.log.gz