noobaa / noobaa-core

High-performance S3 application gateway to any backend - file / s3-compatible / multi-clouds / caching / replication ...
https://www.noobaa.io
Apache License 2.0

Error getting object_mapping after switching off node #3861

Closed: YuliaKovalenko closed this issue 6 years ago.

YuliaKovalenko commented 6 years ago

Environment info

Actual behavior

  1. [ERROR] core.rpc.rpc:: RPC._request: response ERROR srv object_api.read_object_mappings params { bucket: ['first.bucket'], key: 'file_part_22561510176430', adminfo: true } Error: INVALID_SCHEMA_REPLY SERVER node_api#/methods/list_nodes
  2. Then: core.rpc.rpc:: RPC._request: response ERROR srv system_api.read_system Error: INVALID_SCHEMA_REPLY SERVER system_api#/methods/read_system, followed by: CAUGHT console error rebuildfiles

Expected behavior

  1. Files are rebuilt successfully (object_mapping can be read) with 3 online replicas and all chunks available.

Steps to reproduce

  1. Create a large number of agents (default: 5).
  2. Create a dataset on them (1 GB per agent).
  3. Read the data and verify it.
  4. Power down a random number of agents (between 1 and the total number of agents).
  5. Read the data and verify it again (via task: https://trello.com/c/fIVWiYGN/518-changes-to-the-ec-replica-test).
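Step 4 leaves open which agents get powered down. Below is a minimal sketch of one way to pick them. It assumes the agents are available as a plain array of names. `pickAgentsToPowerDown` and the agent names are hypothetical and are not part of the noobaa test suite.

```javascript
// Hypothetical helper (not noobaa code): choose which agents to power down.
// Picks a random count k with 1 <= k <= agents.length, then a random subset
// of that size using a partial Fisher-Yates shuffle on a copy of the array.
function pickAgentsToPowerDown(agents, rng = Math.random) {
  if (agents.length === 0) return [];
  const k = 1 + Math.floor(rng() * agents.length); // 1..agents.length
  const pool = agents.slice(); // copy so the caller's array is not mutated
  for (let i = 0; i < k; i++) {
    // swap a randomly chosen element from pool[i..] into position i
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

// Usage with the default of 5 agents (names are made up for illustration):
const agents = ['agent-1', 'agent-2', 'agent-3', 'agent-4', 'agent-5'];
const toPowerDown = pickAgentsToPowerDown(agents);
console.log(`powering down ${toPowerDown.length} agent(s):`, toPowerDown);
```

You can pass a custom `rng` to make the selection reproducible in a test. Copying the array leaves the original list of agents intact for the later read-and-verify step.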

Screenshots or logs or other output that would be helpful

(If large, please upload as an attachment.)

guymguym commented 6 years ago

@YuliaKovalenko (CC @liranmauda) Please locate the complete INVALID_SCHEMA_REPLY error in the server log and copy it into the bug. Thanks.

YuliaKovalenko commented 6 years ago

@guymguym, I'm afraid I already deleted this test server. The error reproduces consistently, so I'll add logs soon.

guymguym commented 6 years ago

Thanks, I'm waiting for the logs and will resolve quickly once I can see where it comes from.

YuliaKovalenko commented 6 years ago

@guymguym

[ERROR] core.rpc.rpc_schema:: INVALID_SCHEMA_REPLY SERVER node_api#/methods/list_nodes ERRORS: [ { keyword: 'idate', dataPath: '.nodes[2].data_activity.time.end', schemaPath: '#/definitions/time_progress/properties/end/idate', params: { keyword: 'idate' }, message: 'should pass "idate" keyword validation', schema: true, parentSchema: { idate: true }, data: 1512058633174.4548 } ]

REPLY: { total_count: 5, filter_counts: { count: 5, online: 4, by_mode: { OPTIMAL: 4, OFFLINE: 1 } }, nodes: [ { has_issues: false, online: true, readable: true, writable: true, trusted: true, mode: 'OPTIMAL', connectivity: 'TCP', storage_full: false, name: 'win2008-C-f7903b29', geolocation: 'Sao Paulo', ip: '10.1.7.5', host_id: 'f7903b29-21fa-41dd-84f5-f6264db0f83f', node_type: 'BLOCK_STORE_FS', rpc_address: 'n2n://5a096b0d824c1f1b04e404cf', base_address: 'wss://10.1.7.4:8443', version: '2.0.0-065f373', latency_to_server: [ 711, 629, 651, 651, 707, 624, 636, 685, 688, 677, 652, 712, 659, 678, 653, 639, 667, 677, 658, 651, [length]: 20 ], latency_of_disk_read: [ 0.5430999998934567, 0.2415990000590682, 0.33329900028184056, 0.38810000009834766, 0.2581990002654493, 1.3884000000543892, 0.2793999998830259, 0.29039999982342124, 0.43320100009441376, 0.36670099990442395, 0.41999900015071034, 0.4213999998755753, 0.2457000003196299, 0.2434989996254444, 0.24579899990931153, 0.2674010000191629, 0.47960099996998906, 0.3447990003041923, 0.4174000001512468, 0.3956989999860525, [length]: 20 ], latency_of_disk_write: [ 1.0493999999016523, 0.6385999997146428, 0.5274000000208616, 0.6949990000575781, 0.4566999999806285, 0.5215000002644956, 0.5877999998629093, 0.4930989998392761, 0.49420000007376075, 0.46559999976307154, 0.6343990000896156, 0.5054990001954138, 3.5682010003365576, 0.6393989999778569, 0.5394000001251698, 0.524900000076741, 6.400600000284612, 0.7287010001018643, 0.4557999996468425, 0.46760099986568093, [length]: 20 ], debug_level: 0, heartbeat: 1510570229671, peer_id: '5a096b0d824c1f1b04e404cf', _id: '5a096b0d824c1f1b04e404ce', pool: 'first.pool', storage: { total: 150321754112, free: 121495576576, used: 3684868121, alloc: 0, limit: 0, reserved: 10737418240, used_other: 14403891175, unavailable_free: 0 }, drive: { mount: 'C:', drive_id: 'C:' }, os_info: { hostname: 'win2008', ostype: 'Windows_NT', platform: 'win32', arch: 'x64', release: '6.1.7601', uptime: 1510566342210, loadavg: [ 0, 0, 0, [length]: 3 ], totalmem: 4294500352, freemem: 3073314816, cpus: [ { model: 'Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz', speed: 2458, times: { user: 475109, nice: 0, sys: 122750, idle: 3289515, irq: 3625 } }, { model: 'Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz', speed: 2426, times: { user: 510953, nice: 0, sys: 90921, idle: 3285406, irq: 328 } }, [length]: 2 ], networkInterfaces: { 'Local Area Connection': [ { address: 'fe80::f0a3:c7bc:6b26:e1b0', netmask: 'ffff:ffff:ffff:ffff::', family: 'IPv6', mac: '00:0d:3a:f9:8e:d1', scopeid: 11, internal: false }, { address: '10.1.7.5', netmask: '255.255.255.0', family: 'IPv4', mac: '00:0d:3a:f9:8e:d1', internal: false }, [length]: 2 ], 'Loopback Pseudo-Interface 1': [ { address: '::1', netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff', family: 'IPv6', mac: '00:00:00:00:00:00', scopeid: 0, internal: true }, { address: '127.0.0.1', netmask: '255.0.0.0', family: 'IPv4', mac: '00:00:00:00:00:00', internal: true }, [length]: 2 ] } }, host_seq: '6', untrusted_reasons: {} }, { has_issues: false, online: true, readable: true, writable: true, trusted: true, mode: 'OPTIMAL', connectivity: 'TCP', storage_full: false, name: 'redhat7-94162d93', geolocation: 'Tokyo', ip: '10.1.7.6', host_id: '94162d93-8fee-434f-a764-e84c0a7648f6', node_type: 'BLOCK_STORE_FS', rpc_address: 'n2n://5a0969c5824c1f1b04e404c3', base_address: 'wss://10.1.7.4:8443', version: '2.0.0-065f373', latency_to_server: [ 192, 283, 250, 226, 214, 237, 242, 249, 226, 248, 229, 227, 247, 205, 263, 235, 212, 244, 277, 250, [length]: 20 ], latency_of_disk_read: [ 0.3333040000870824, 0.33990299981087446, 0.4093039999715984, 0.4174049999564886, 0.34710400039330125, 0.3536029998213053, 0.6473070001229644, 0.45300500001758337, 0.31180400028824806, 0.5062049999833107, 0.43250499991700053,
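The ERRORS entry in this log looks like an Ajv custom-keyword error. `dataPath` gives the location of the rejected value inside the reply, and `data` holds the value itself. Below is a minimal sketch that resolves such a path and checks the value. It assumes that `idate` accepts only integer millisecond timestamps. `getByDataPath` and `isIdate` are hypothetical names, not noobaa-core functions, and the reply object is trimmed to just the failing path.

```javascript
// Hypothetical debugging helper: resolve a dataPath such as
// '.nodes[2].data_activity.time.end' against a reply object.
function getByDataPath(obj, dataPath) {
  // Split '.a[2].b' into the tokens ['a', '2', 'b'].
  const tokens = dataPath.match(/[^.[\]]+/g) || [];
  // Walk the object, returning undefined as soon as a step is missing.
  return tokens.reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// Assumed meaning of the custom "idate" keyword: an integer number of
// milliseconds since the epoch. Illustration only, not noobaa's validator.
function isIdate(value) {
  return Number.isInteger(value);
}

// Reply trimmed down from the log: only the failing value is kept.
const reply = {
  nodes: [
    {},
    {},
    { data_activity: { time: { end: 1512058633174.4548 } } },
  ],
};

const value = getByDataPath(reply, '.nodes[2].data_activity.time.end');
console.log(isIdate(value)); // false: the timestamp has a fractional part
```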

YuliaKovalenko commented 6 years ago

[ERROR] core.rpc.rpc:: RPC._request: response ERROR srv node_api.list_nodes params { query: { nodes: [ { id: '5a0969c5824c1f1b04e404c2' }, { id: '5a096a06824c1f1b04e404c8' }, { id: '5a0969a6824c1f1b04e404be' }, { id: '5a09698b824c1f1b04e404ba' }, { id: '5a096b0d824c1f1b04e404ce' }, [length]: 5 ] }, fields: [ 'name', 'pool', 'ip', 'host_id', 'heartbeat', 'rpc_address', 'is_cloud_node', 'node_type', 'is_mongo_node', 'online', 'readable', 'writable', 'storage_full', 'latency_of_disk_read', [length]: 14 ] } reqid 62181@wss://127.0.0.1:8443(7wu45anc) took [13.9+12.3=26.1] Error: INVALID_SCHEMA_REPLY SERVER node_api#/methods/list_nodes
    at RpcError (/root/node_modules/noobaa-core/src/rpc/rpc_error.js:11:9)
    at RpcRequest._set_response (/root/node_modules/noobaa-core/src/rpc/rpc_request.js:159:26)
    at RPC._on_response (/root/node_modules/noobaa-core/src/rpc/rpc.js:393:28)
    at RPC._on_message (/root/node_modules/noobaa-core/src/rpc/rpc.js:704:18)
    at RpcWsConnection.conn.on.msg (/root/node_modules/noobaa-core/src/rpc/rpc.js:545:36)
    at emitOne (events.js:96:13)
    at RpcWsConnection.emit (events.js:188:7)
    at WebSocket.ws.on (/root/node_modules/noobaa-core/src/rpc/rpc_ws.js:45:53)
    at emitOne (events.js:96:13)
    at WebSocket.emit (events.js:188:7)
    at Receiver._receiver.onmessage (/root/node_modules/noobaa-core/node_modules/ws/lib/WebSocket.js:146:47)
    at Receiver.dataMessage (/root/node_modules/noobaa-core/node_modules/ws/lib/Receiver.js:380:14)
    at Receiver.getData (/root/node_modules/noobaa-core/node_modules/ws/lib/Receiver.js:330:12)
    at Receiver.startLoop (/root/node_modules/noobaa-core/node_modules/ws/lib/Receiver.js:165:16)
    at Receiver.add (/root/node_modules/noobaa-core/node_modules/ws/lib/Receiver.js:139:10)
    at TLSSocket._ultron.on (/root/node_modules/noobaa-core/node_modules/ws/lib/WebSocket.js:142:22)

guymguym commented 6 years ago

Great. So the issue is that we try to return 1512058633174.4548 for .nodes[2].data_activity.time.end, which should be an integer (idate). I will change it from a float to an integer, and that will solve it. Thanks.
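Below is a minimal sketch of that kind of fix. It assumes the timestamps sit in `start` and `end` fields of the `time_progress` object. Only `end` appears in the log, so `start` is an assumption. `toIdate` and `normalizeTimeProgress` are hypothetical names, not noobaa-core functions.

```javascript
// Hypothetical sketch (not noobaa code): make a time_progress object carry
// only integer millisecond timestamps before it is returned to the client.
function toIdate(ms) {
  // Leave non-numbers untouched so the schema validator still reports them.
  if (typeof ms !== 'number' || !Number.isFinite(ms)) return ms;
  return Math.floor(ms); // drop the fractional milliseconds
}

// Assumes time_progress has start/end fields (only 'end' is confirmed by the log).
function normalizeTimeProgress(time) {
  if (!time) return time;
  const out = { ...time }; // copy so the caller's object is not mutated
  for (const key of ['start', 'end']) {
    if (key in out) out[key] = toIdate(out[key]);
  }
  return out;
}

const before = { start: 1512058000000, end: 1512058633174.4548 };
const after = normalizeTimeProgress(before);
console.log(after.end, Number.isInteger(after.end)); // 1512058633174 true
```

`Math.floor` is just one option here. `Math.round` would also produce a valid integer, but flooring never moves a timestamp forward in time.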