threefoldtech / tfchain_graphql

Graphql for TFchain
Apache License 2.0

Added public ips on a farm are not always synced with the chain #157

Closed AhmedHanafy725 closed 2 months ago

AhmedHanafy725 commented 7 months ago

env: devnet

farm: 52 [screenshots]

also, farm 4432 has the same issue

sameh-farouk commented 7 months ago

Update: Still trying to figure out what is causing this.

What I have found so far is that during an old tfchain migration, a state cleaning was performed to remove some bad IPs from the chain. Since the cleaning didn't emit any events, this could have left GraphQL with more IPs than the chain currently has. However, the OP mentioned that there are fewer IPs in GraphQL, indicating that a different issue is at play.

Omarabdul3ziz commented 6 months ago

I think we may also have a problem with deleting: some farms have IPs in the database but not on the chain. I noticed this with farms 138, 33, and 14. [screenshots]

sameh-farouk commented 6 months ago

@Omarabdul3ziz Yes, there is another issue, but it is not with deleting; note what I said in my previous comment. There are indeed two issues at play here, and the one you described is due to a previous chain migration in which we added a validation and removed invalid IP info from the chain state without propagating this to our processor via tracked events. A gateway always has to be public and on the same subnet as the IP address. In your case, these IPs show up in GraphQL but don't exist on the chain because they were invalid. A GraphQL fix should follow to remove these IPs from GraphQL as well.
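For reference, here is a minimal sketch of the validity rule described above: the gateway must be a plain IPv4 address inside the same subnet as the CIDR-formatted public IP. This is illustrative TypeScript, not the actual chain or processor code, and the function names are my own.

// Minimal subnet-match sketch (illustrative; not the actual validation code).
function ipv4ToInt(addr: string): number {
  const parts = addr.split('.').map(Number);
  if (parts.length !== 4 || parts.some(p => Number.isNaN(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${addr}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// Returns true when the gateway falls inside the subnet of the CIDR-formatted IP.
function gatewayMatchesIp(ipCidr: string, gateway: string): boolean {
  const [ip, prefixStr] = ipCidr.split('/');
  const prefix = Number(prefixStr);
  if (!ip || Number.isNaN(prefix) || prefix < 0 || prefix > 32) return false;
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(gateway) & mask) >>> 0);
}

// Examples using addresses from this thread:
console.log(gatewayMatchesIp('1.1.1.5/24', '1.1.1.0')); // true: same /24
console.log(gatewayMatchesIp('1.1.1.5/24', '2.2.2.1')); // false: different subnet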

sameh-farouk commented 6 months ago

Update:

Farm 52 was created in block 1313143 with an empty set of IPs. It was updated a handful of times in different extrinsics since then, but this one is the most interesting:

https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Ftfchain.dev.grid.tf%2Fws#/extrinsics/decode/0x1908840060e6768d0f49b892e5d9e7f8247320687c28c2651c37e168c4a244c649ebe11c014e861a913967e5a97bffeadd0c7a8c834fe2cdc7ed8331079a991b0401c129651108390a6f00d97f2ef69ba9b30e88540ce3b03d53e1eacccf7bc171c78522892502c9440003024019053400000028312e312e312e352f32341c312e312e312e3019053400000028312e312e312e362f32341c312e312e312e3019053400000028312e312e312e372f32341c312e312e312e3019053400000028312e312e312e382f32341c312e312e312e3019053400000028312e312e312e392f32341c312e312e312e301905340000002c312e312e312e31302f32341c312e312e312e301905340000002c312e312e312e31312f32341c312e312e312e301905340000002c312e312e312e31322f32341c312e312e312e301905340000002c312e312e312e31332f32341c312e312e312e301905340000002c312e312e312e31342f32341c312e312e312e301905340000002c312e312e312e31352f32341c312e312e312e301905340000002c312e312e312e31362f32341c312e312e312e301905340000002c312e312e312e31372f32341c312e312e312e301905340000002c312e312e312e31382f32341c312e312e312e301905340000002c312e312e312e31392f32341c312e312e312e301905340000002c312e312e312e32302f32341c312e312e312e30

The same farm was updated about 15 times in the same batch extrinsic with different IP addresses each time.

The processor received all farm-updated events. Because they were emitted from the same extrinsic and included in the same block, it processed the events in an incorrect order, resulting in an outdated set of IPs being persisted in the processor DB.

I would expect the processor to process them based on their index within the block, but it seems this is not the case.
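For illustration, a minimal sketch of what index-based ordering would look like; the event shape and field names here are assumptions, not the actual processor types.

// Illustrative sketch: apply farm-update events in (block, indexInBlock) order so
// the last update in a batch extrinsic is the one that ends up persisted.
interface FarmUpdatedEvent {
  blockHeight: number;
  indexInBlock: number; // position of the event within its block
  farmId: number;
  publicIps: { ip: string; gateway: string }[];
}

function latestIpsPerFarm(
  events: FarmUpdatedEvent[]
): Map<number, { ip: string; gateway: string }[]> {
  const ordered = [...events].sort(
    (a, b) => a.blockHeight - b.blockHeight || a.indexInBlock - b.indexInBlock
  );
  const state = new Map<number, { ip: string; gateway: string }[]>();
  for (const ev of ordered) {
    // Later events overwrite earlier ones, so the final map entry reflects the
    // last update in the block rather than an arbitrary intermediate one.
    state.set(ev.farmId, ev.publicIps);
  }
  return state;
}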

sameh-farouk commented 6 months ago

Update: Based on @AhmedHanafy725's feedback, this batch extrinsic was initiated from the dashboard by adding an IP range. Will try to reproduce.
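For context, a rough sketch of how adding an IP range from the dashboard could end up as a single batch extrinsic with one call per address. The pallet/call names (tfgridModule.addFarmIp) and the range-expansion helper are assumptions for illustration only and should be checked against the actual tfchain metadata and dashboard code.

import { ApiPromise, WsProvider } from '@polkadot/api';

// Hypothetical helper: expand "1.1.1.5".."1.1.1.20" into individual /24 entries.
// Only the last octet is iterated here, purely for illustration.
function expandRange(from: string, to: string): string[] {
  const base = from.split('.').slice(0, 3).join('.');
  const start = Number(from.split('.')[3]);
  const end = Number(to.split('.')[3]);
  const ips: string[] = [];
  for (let i = start; i <= end; i++) ips.push(`${base}.${i}/24`);
  return ips;
}

async function buildAddIpRangeBatch(farmId: number, from: string, to: string, gateway: string) {
  const api = await ApiPromise.create({
    provider: new WsProvider('wss://tfchain.dev.grid.tf/ws'),
  });
  const calls = expandRange(from, to).map(ip =>
    // Assumed call shape; verify against the actual tfchain metadata.
    api.tx.tfgridModule.addFarmIp(farmId, ip, gateway)
  );
  // One signed extrinsic carrying all per-IP calls, emitting many events in one block.
  return api.tx.utility.batchAll(calls);
}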

sameh-farouk commented 6 months ago

Update: I wasn't able to reproduce the issue. I tried adding an IP range from the dashboard, and it seems that events from the batch call are being processed with respect to the event index in the block, just as expected.

It would be very helpful if you could reproduce the issue and share the steps with us.

For now, what I can do is run a migration to remove the invalid IPs from GraphQL.
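Roughly, such a cleanup could look like the sketch below. The table/column names and the reuse of the gatewayMatchesIp helper from the earlier sketch are assumptions for illustration, not the final migration.

import { createConnection } from 'typeorm';
import { gatewayMatchesIp } from './subnet-check'; // helper from the earlier sketch

// One-off cleanup: delete public IPs whose gateway is not inside the IP's subnet.
async function removeInvalidIps(): Promise<void> {
  const db = await createConnection(); // uses the processor's DB settings
  const rows: { id: string; ip: string; gateway: string }[] =
    await db.query('SELECT id, ip, gateway FROM public_ip');

  const invalid = rows.filter(r => !gatewayMatchesIp(r.ip, r.gateway));
  for (const row of invalid) {
    await db.query('DELETE FROM public_ip WHERE id = $1', [row.id]);
  }
  console.log(`removed ${invalid.length} invalid public IPs`);
  await db.close();
}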

AhmedHanafy725 commented 4 months ago

I tried it again, adding 5 IPs with a batch call, and nothing is showing in GraphQL: [screenshot]

while they are added on the chain: [screenshot]

sameh-farouk commented 4 months ago

@AhmedHanafy725 Which network?

AhmedHanafy725 commented 4 months ago

@AhmedHanafy725 Which network?

devnet

sameh-farouk commented 4 months ago

@AhmedHanafy725 There is a validation on the processor that does not allow saving duplicate IPs, and all the IPs you tried to add were previously stored on other farms, so they weren't added to your farm.

You can verify this using:

query MyQuery {
  publicIps(where: {ip_eq: "1.1.1.1/24"}) {
    ip
    id
    farm {
      id
      name
      farmID
      dedicatedFarm
      certification
      twinID
      pricingPolicyID
    }
    contractId
  }
}
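If the query returns a farm other than yours for one of these IPs, that address was already registered there, which is why the processor skipped adding it to your farm.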

It appears that the validation was intended to prevent the addition of the same public IP address to different farms. However, this should have been implemented at the chain level to prevent such an occurrence.

How should we approach this? Do we need to alter this validation?
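For reference, a minimal sketch of the processor-side duplicate check being discussed; the store interface and entity shape here are illustrative, not the actual implementation.

interface StoredIp {
  farmId: number;
  ip: string;
  gateway: string;
}

interface IpStore {
  findByIp(ip: string): Promise<StoredIp | undefined>;
  save(entry: StoredIp): Promise<void>;
}

async function saveFarmIp(store: IpStore, entry: StoredIp): Promise<void> {
  const existing = await store.findByIp(entry.ip);
  if (existing && existing.farmId !== entry.farmId) {
    // Duplicate across farms: the processor currently skips the save, which is why
    // the chain (which accepted the IP) and GraphQL can diverge.
    return;
  }
  await store.save(entry);
}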

sameh-farouk commented 4 months ago

Now it's clear what caused these synchronization issues.

  1. IPs registered on the chain but not indexed by the processor (which is the main issue mentioned by the OP) are explained here.

  2. IPs indexed by the processor but not existing on the chain are explained here.

sameh-farouk commented 4 months ago

Update: After discussing this with @xmonader and @AhmedHanafy725, I suggested a solution here.

sameh-farouk commented 4 months ago

Update: a PR handling this case (https://github.com/threefoldtech/tfchain_graphql/issues/157#issuecomment-2021848765) is ready for review.