threefoldtech / tfgrid-sdk-ts

Apache License 2.0

🐞 [Bug]: Is ssd and hdd filter working correctly #2879

Closed A-Harby closed 5 months ago

A-Harby commented 5 months ago

Is there an existing issue for this?

which package/s did you face the problem with?

grid_client

What happened?

I chose a 50 GB disk size and a rootfs size of 0, and passed these values to the node filter as sru, but the filter returned a node with only about 40 GB of free SSD space.

I tried it a few times, and the result is the same.

Steps To Reproduce

No response

which network/s did you face the problem on?

Dev

version

dev branch

Twin ID/s

No response

Node ID/s

No response

Farm ID/s

No response

Contract ID/s

No response

Relevant screenshots/screen records

(three screenshots, not preserved)

Relevant log output

yarn test -- modules/applications/funkwhale
yarn run v1.19.0
warning From Yarn 1.0 onwards, scripts don't require "--" for options to be forwarded. In a future version, any explicit "--" will be forwarded as-is to the scripts.
$ jest modules/applications/funkwhale
  console.log
    Credentials not all found in env variables. Loading all credentials from default config.json...

      at Object.<anonymous> (tests/client_loader.ts:16:11)

  console.warn
    2024-06-05 14:34:01        API/INIT: RPC methods not decorated: chainHead_unstable_body, chainHead_unstable_call, chainHead_unstable_follow, chainHead_unstable_genesisHash, chainHead_unstable_header, chainHead_unstable_stopBody, chainHead_unstable_stopCall, chainHead_unstable_stopStorage, chainHead_unstable_storage, chainHead_unstable_unfollow, chainHead_unstable_unpin, transaction_unstable_submitAndWatch, transaction_unstable_unwatch

      at apply (../../node_modules/@polkadot/util/cjs/logger.js:62:22)
      at Object.warn (../../node_modules/@polkadot/util/cjs/logger.js:131:14)
      at ApiPromise._filterRpcMethods (../../node_modules/@polkadot/api/cjs/base/Decorate.js:342:9)
      at ApiPromise._filterRpc (../../node_modules/@polkadot/api/cjs/base/Decorate.js:305:10)
      at ApiPromise._metaFromChain (../../node_modules/@polkadot/api/cjs/base/Init.js:356:10)
      at ApiPromise._loadMeta (../../node_modules/@polkadot/api/cjs/base/Init.js:284:229)
      at ApiPromise._onProviderConnect2 (../../node_modules/@polkadot/api/cjs/base/Init.js:429:21)

  console.warn
    2024-06-05 14:34:01        API/INIT: Not decorating runtime apis without matching versions: TransactionPaymentApi/4 (1 known), Metadata/2 (1 known)

      at apply (../../node_modules/@polkadot/util/cjs/logger.js:62:22)
      at Object.warn (../../node_modules/@polkadot/util/cjs/logger.js:131:14)
      at ApiPromise._decorateCalls (../../node_modules/@polkadot/api/cjs/base/Decorate.js:538:9)
      at ApiPromise._createDecorated (../../node_modules/@polkadot/api/cjs/base/Decorate.js:183:26)
      at ApiPromise._injectMetadata (../../node_modules/@polkadot/api/cjs/base/Decorate.js:219:14)
      at ApiPromise._initFromMeta (../../node_modules/@polkadot/api/cjs/base/Init.js:382:10)
      at ApiPromise._loadMeta (../../node_modules/@polkadot/api/cjs/base/Init.js:285:17)
      at ApiPromise._onProviderConnect2 (../../node_modules/@polkadot/api/cjs/base/Init.js:429:21)

  console.log
    { zos: '0.0.0-02480a9', zinit: 'v0.2.11' }

      at log (tests/utils.ts:12:11)

  console.log
    Start creating the machine deployment with name fwl3n7a55fs6

      at EventEmitter.logsHandler (src/helpers/events.ts:6:11)

  console.log
    Adding node 14 to network b70o9w1jk7gwwhy

      at EventEmitter.logsHandler (src/helpers/events.ts:6:11)

  console.log
    Node 14 reserved ports: [31524,2593,4082,7878,31870,3201,26321,5671,2597,16490,5865,2283,7337,9945,10749,16123,9943,9944,15746,3596,15004,6906,22161,8754,3181,7393,5399,11176,1276,7433,9847,25742,29938,443,2043,9761,10847,3445,2939,6520,1720,29579,4400,29276,19827,5209,24950,10874,20536,8007,20789,5348,21082,5983,6660,2434,3968,10890,4512,2150,5868,6997,7416,9105,8398,7255,2707,5945,80,2515,5701,300,8082,21867,4076,4188,24246,7950]

      at EventEmitter.logsHandler (src/helpers/events.ts:6:11)

  console.log
    Generating peers for network b70o9w1jk7gwwhy

      at EventEmitter.logsHandler (src/helpers/events.ts:6:11)

  console.log
    Creating a vm on node: 14, network: b70o9w1jk7gwwhy with private ip: 172.26.2.2

      at EventEmitter.logsHandler (src/helpers/events.ts:6:11)

  console.log
    Merging workloads

      at EventEmitter.logsHandler (src/helpers/events.ts:6:11)

  console.log
    disconnecting

      at Client.disconnect (../tfchain_client/dist/node/client.js:134:21)

 FAIL  tests/modules/applications/funkwhale.test.ts (19.98 s)
  ✕ TC2685 - Applications: Deploy Funkwhale (11533 ms)

  ● TC2685 - Applications: Deploy Funkwhale

    DiskAllocationError: Cannot fit the required SSD disk with size 50.00 GB., on Node 14 with disk pools:
             SSD:  40.44GB  
             HDD:  2794.52GB ,2794.52GB ,2794.52GB 
        Please select another Node.

      566 |       return true;
      567 |     } catch (error) {
    > 568 |       throw new GridClientErrors.Nodes.DiskAllocationError(
          |             ^
      569 |         `${(error as Error).message}, on Node ${nodeId} with disk pools:
      570 |          SSD:  ${ssdPools.map(disk => (disk / 1024 ** 3).toFixed(2).toString() + "GB ")} 
      571 |          HDD:  ${hddPools.map(disk => (disk / 1024 ** 3).toFixed(2).toString() + "GB ")}

      at Nodes.verifyNodeStoragePoolCapacity (src/primitives/nodes.ts:568:13)
      at TwinDeploymentHandler._checkNodeCapacity (src/high_level/twinDeploymentHandler.ts:387:7)
          at async Promise.all (index 0)
      at TwinDeploymentHandler.checkNodesCapacity (src/high_level/twinDeploymentHandler.ts:331:5)
      at TwinDeploymentHandler.handle (src/high_level/twinDeploymentHandler.ts:512:5)
      at MachinesModule.deploy (src/modules/machines.ts:100:23)
      at MachinesModule.descriptor.value (src/modules/utils.ts:11:12)
      at Object.<anonymous> (tests/modules/applications/funkwhale.test.ts:124:15)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        20.034 s
Ran all test suites matching /modules\/applications\/funkwhale/i.
Jest did not exit one second after the test run has completed.

This usually means that there are asynchronous operations that weren't stopped in your tests. Consider running Jest with `--detectOpenHandles` to troubleshoot this issue.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

amiraabouhadid commented 5 months ago

I was able to deploy using a 50 GB disk size. (screenshot not preserved)

A-Harby commented 5 months ago

> was able to deploy using 50GB disk size

Yes, deployment is working fine, but my point was: why would the filter return a node with only 40 GB of free SSD when the SRU option was clearly 50 GB?

AhmedHanafy725 commented 5 months ago

I believe this is a race condition: someone else deployed a VM on this node a few seconds before you and took the storage, so your deployment failed.

I checked the storage pools for this node and it has only one SSD disk, so this is not a miscalculation on the proxy side, as the proxy doesn't take the number of disks into account.

[
  {
    name: '5537bb63-21f0-4449-9ec9-f568b3fd48e8',
    type: 'ssd',
    size: 512110190592,
    used: 472729518080
  },
  {
    name: 'dccfc185-c8b1-4752-8dcc-783bfede133d',
    type: 'hdd',
    size: 3000592982016,
    used: 0
  },
  {
    name: '636952f1-3045-4dba-b686-7f4b796f543a',
    type: 'hdd',
    size: 3000592982016,
    used: 0
  },
  {
    name: 'd61ff74c-d10a-45c5-aedf-86b549098ce7',
    type: 'hdd',
    size: 3000592982016,
    used: 0
  }
]
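
To make the numbers in this listing easier to read, here is a minimal sketch that computes the free space per pool and checks whether a 50 GB disk would fit. The `StoragePool` interface and both helper functions are hypothetical names for illustration, not part of grid_client, and the rule that a disk must fit inside a single pool is an assumption based on the wording of the error message. Only the SSD pool and one of the three identical HDD pools are reproduced.

```typescript
// Hypothetical shape of one entry in the storage pool listing above.
interface StoragePool {
  name: string;
  type: "ssd" | "hdd";
  size: number; // total bytes
  used: number; // used bytes
}

const GiB = 1024 ** 3;

// Free bytes of every pool of the given type.
function freeByType(pools: StoragePool[], type: StoragePool["type"]): number[] {
  return pools.filter(pool => pool.type === type).map(pool => pool.size - pool.used);
}

// Assumption: a disk has to fit entirely inside ONE pool, so free space
// is checked per pool rather than summed across pools.
function fitsInSinglePool(
  pools: StoragePool[],
  type: StoragePool["type"],
  requiredBytes: number,
): boolean {
  return freeByType(pools, type).some(free => free >= requiredBytes);
}

// Values copied from the listing above.
const pools: StoragePool[] = [
  { name: "5537bb63-21f0-4449-9ec9-f568b3fd48e8", type: "ssd", size: 512110190592, used: 472729518080 },
  { name: "dccfc185-c8b1-4752-8dcc-783bfede133d", type: "hdd", size: 3000592982016, used: 0 },
];

const ssdFreeGiB = freeByType(pools, "ssd").map(free => (free / GiB).toFixed(2));
console.log(`SSD free: ${ssdFreeGiB.join(", ")} GiB`);
console.log(`50 GiB SSD disk fits: ${fitsInSinglePool(pools, "ssd", 50 * GiB)}`);
```

With these values the single SSD pool has 39380672512 bytes free (about 36.68 GiB), so a 50 GiB SSD disk does not fit regardless of how the proxy aggregates capacity.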

A-Harby commented 5 months ago

> I believe this is a race condition: someone else deployed a VM on this node a few seconds before you and took the storage, so your deployment failed.
>
> I checked the storage pools for this node and it has only one SSD disk, so this is not a miscalculation on the proxy side, as the proxy doesn't take the number of disks into account.

It could indeed be a race condition, and it is not reproducible for now. Since the issue occurred in the grid client during automated tests, I will keep an eye on them and report back if it happens again.
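
If the race condition explanation is right, a filter result can go stale between the query and the deployment. One way a test could tolerate that is to treat the filter output as a list of candidates and re-check free space on each one immediately before deploying. The following is only a sketch under that assumption: `Candidate`, `FreeSsdLookup`, and `pickNodeWithLiveCheck` are hypothetical names, and the lookup is stubbed rather than calling any real grid_client API.

```typescript
// Hypothetical types; none of these names come from grid_client.
interface Candidate {
  nodeId: number;
}

// Returns the node's free SSD bytes as seen right now (stubbed below).
type FreeSsdLookup = (nodeId: number) => Promise<number>;

// Walk the filter results in order and return the first node that still
// has enough free SSD space at check time, or undefined if none does.
async function pickNodeWithLiveCheck(
  candidates: Candidate[],
  requiredBytes: number,
  lookup: FreeSsdLookup,
): Promise<number | undefined> {
  for (const { nodeId } of candidates) {
    const freeBytes = await lookup(nodeId);
    if (freeBytes >= requiredBytes) {
      return nodeId;
    }
  }
  return undefined;
}

// Stubbed lookup: node 14 has dropped to 40 GiB since the filter ran,
// node 15 is an invented second candidate that still has room.
const GIB = 1024 ** 3;
const stubLookup: FreeSsdLookup = async nodeId => (nodeId === 14 ? 40 * GIB : 100 * GIB);

pickNodeWithLiveCheck([{ nodeId: 14 }, { nodeId: 15 }], 50 * GIB, stubLookup).then(nodeId =>
  console.log(`selected node: ${nodeId}`),
);
```

This narrows the window but does not close it: another deployment can still claim the space between the live check and the actual deployment, so the deployment error would still need to be handled.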