probe-lab / network-measurements

PL Websites not continuously pinned at PL pinning cluster or Fleek #55

Open yiannisbot opened 1 year ago

yiannisbot commented 1 year ago

Context: ProbeLab is monitoring the uptime and performance of several PL websites at https://probelab.io/websites/. Those sites are pinned on two stable providers (in addition to any other nodes in the P2P network that choose to pin them): i) PL's pinning cluster, and ii) Fleek's cluster.

One of the things we're monitoring is whether those stable providers are continuously making those sites available.

Assumption: We've worked closely with both the team that operates PL's pinning cluster and Fleek to make sure everything is in place and correctly configured (e.g., all nodes are running the Accelerated DHT Client) to reprovide the CIDs for the websites, so we've been expecting the situation to be rather stable. Stable here means websites are pinned to 7 nodes from PL's pinning cluster and 2 nodes from Fleek's fleet.

Results: Our results are presented under each website's results page, e.g., https://probelab.io/websites/blog.ipfs.tech/#website-trend-hosters-blogipfstech for https://blog.ipfs.tech.

This is a tracking issue for the resolution of the situation. Tagging @gmasgras and @cewood for the PL team and will propagate further to Fleek folks.

gmasgras commented 1 year ago

Other than the gap from June 24th to the 27th, it looks like things have been relatively stable lately, unless I'm reading the graphs wrong. I'm assuming that the 6 "reachable unrelayed" providers would be the collab cluster nodes in this case.

Is the main issue with the examples you posted that in some cases there are fewer than 6 green (reachable unrelayed) providers available?

gmasgras commented 1 year ago

Did a bit more digging into the problematic sites and their latest CID is pinned on all of the collab cluster nodes.

dennis-tra commented 1 year ago

The gap between the 24th and 27th of June is on me. I did a deployment (on Friday, doh) that didn't really work -.-

However, there are sites such as https://probelab.io/websites/blog.libp2p.io/ that used to be pinned but no longer seem to be.

[Figure: website-trend-hosters plot for blog.libp2p.io]

At the same time other sites, like probelab.io seem to be pinned just fine:

[Figure: website-trend-hosters plot for probelab.io]

gmasgras commented 1 year ago

🤔 blog.libp2p.io is also pinned on all nodes

root@collab-cluster-am6-1:~# sudo -u ipfs ipfs resolve /ipns/blog.libp2p.io 
/ipfs/QmXQTUq8juy1eXEcVEYHaD5cRJcoKCJiFGQeYq34cQbmqf
root@collab-cluster-am6-1:~# ipfs-cluster-ctl --force-http --host "/ip4/127.0.0.1/tcp/18201" \
  --basic-auth "$CLUSTER_AUTH" status QmXQTUq8juy1eXEcVEYHaD5cRJcoKCJiFGQeYq34cQbmqf
QmXQTUq8juy1eXEcVEYHaD5cRJcoKCJiFGQeYq34cQbmqf | blog.libp2p.io__2023-06-28_100011:
    > collab-cluster-am6-3 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-am6-2 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-sv15-2 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-dc13-1 : REMOTE | 2023-07-11T12:55:25.102789461Z | Attempts: 0 | Priority: false
    > collab-cluster-am6-1 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-dc13-2 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-sv15-1 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
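
A note on reading the status output above: six of the seven peers report PINNED, while collab-cluster-dc13-1 reports REMOTE. As far as I understand IPFS Cluster, REMOTE means the pin is allocated to other peers rather than held locally. The Python sketch below tallies per-peer states from pasted `ipfs-cluster-ctl status` text. The regular expression and the function names are assumptions made for illustration, based only on the output format shown here.

```python
import re

# Output copied from the `ipfs-cluster-ctl status` call in this comment.
STATUS_OUTPUT = """\
QmXQTUq8juy1eXEcVEYHaD5cRJcoKCJiFGQeYq34cQbmqf | blog.libp2p.io__2023-06-28_100011:
    > collab-cluster-am6-3 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-am6-2 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-sv15-2 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-dc13-1 : REMOTE | 2023-07-11T12:55:25.102789461Z | Attempts: 0 | Priority: false
    > collab-cluster-am6-1 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-dc13-2 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
    > collab-cluster-sv15-1 : PINNED | 2023-06-28T10:00:11Z | Attempts: 0 | Priority: false
"""

# Assumed line shape: "> <peer name> : <STATUS> | <timestamp> | ..."
PEER_LINE = re.compile(r"^\s*>\s*(?P<peer>\S+)\s*:\s*(?P<status>[A-Z_]+)\s*\|")


def peer_statuses(status_output):
    """Map each peer name to the status string reported on its line."""
    statuses = {}
    for line in status_output.splitlines():
        match = PEER_LINE.match(line)
        if match:
            statuses[match.group("peer")] = match.group("status")
    return statuses


def peers_not_pinned(statuses):
    """Return the peers whose status is anything other than PINNED."""
    return sorted(peer for peer, status in statuses.items() if status != "PINNED")


statuses = peer_statuses(STATUS_OUTPUT)
print(len(statuses), "peers;", "not pinned:", peers_not_pinned(statuses))
```

Running the same tally per website CID would show whether the sites with missing providers differ from probelab.io in how many peers actually hold the pin.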

gmasgras commented 1 year ago

Provider config looks like this

  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h"

dennis-tra commented 1 year ago

I just checked blog.libp2p.io:

$ ipfs name resolve blog.libp2p.io
/ipfs/QmXQTUq8juy1eXEcVEYHaD5cRJcoKCJiFGQeYq34cQbmqf
$ ipfs routing findprovs /ipfs/QmXQTUq8juy1eXEcVEYHaD5cRJcoKCJiFGQeYq34cQbmqf
12D3KooWDTmCz7qkQuzAxBEft4HcEWPgRHDLEHZGz6YKsJnRTFNc
12D3KooWDgnSUjXtXL7JU437665k6GkMJRjHs4uFLsTPnhXfUhdk
12D3KooWDkSenw8Z7MHfL2c2ip436Jsz4e5P69xijuMVZcYmFxNq
12D3KooW9qbZ6ko6SSr3yyrt1bbucqGc23ku5K7nk5Up8dRfxZAW
12D3KooWA2F1A63fuF16J4FFZMeeXSHzEqjwS9x6Zs4zyWMG5JZ4
12D3KooWAMKSVLkRTZpauaBkZJNCt2pkFRiuEFn28R2smtvPVL8i
12D3KooWAUK9oejibW91nJh6ajjDULChHzhwCRyqvcabhNMXr3yY
12D3KooWAUZy7NDTVvkkwA3Qg7Pak9K1iWLSAwayEdrz3dxnagAZ
12D3KooWAuE4xwNiLajKiq3yhiweRkpVVhDe6sqN3h9mZAKU1bwH
12D3KooWAumZUc68wVWhJp737BzMqCB4sKEeCvvh9aoJsVFCppb6
12D3KooWBJyzYumBQYsvA8iTV4jZYNuYdmTfhzztkQ6GJacgxMvU
12D3KooWBhZjs33i9MUjdWfTozsiDq7JAvqaMwZwVwMAcSNJaSQy
12D3KooWBiUCqB5K51MrDMUjn3YvASvs8WrFuutcP33aUE2V8KJg
12D3KooWC3ve1wzU64UiRA9dE35uvBVp3jkFaV6NUzzdrfrqhud9
12D3KooWC7hAaknyyeWZRSbvA2HRV6xW1LgKhh23ojDm1yWmTFeG
12D3KooWCEtrxiodNdVhyTtjYhVNDgT326EHYnSKSQV1ZjZUSLAa
12D3KooWCHEdRCjJf7ZA26RuE16t8c38VivQvcBTbQv7hJtaCZzU
12D3KooWCdncm6aotvRkRWmvG43VjqrkzWqAkyvfiahzkiHijQWE
12D3KooWCZryvRGicga21MdGK6n1YefLvGb6ZHMQpVFc4dBifNfp
12D3KooWCasTMkCj4P1UTUXQRBs8Hq9DqHQksbGA6mQa6D32gpzV

None of the above PeerIDs match any pinning cluster ID :/ This is our list.
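
The comparison described above can be sketched in Python as a set operation. Only the first three PeerIDs from the `findprovs` output are reused here, and the entries in `KNOWN_STABLE` are made-up placeholders, not the real cluster PeerIDs from our list. One caveat: the output above has exactly 20 entries, which, as far as I know, is the default result limit of `findprovs`, so cluster peers could in principle exist beyond that cut-off.

```python
# First three PeerIDs from the findprovs output above.
FOUND_PROVIDERS = [
    "12D3KooWDTmCz7qkQuzAxBEft4HcEWPgRHDLEHZGz6YKsJnRTFNc",
    "12D3KooWDgnSUjXtXL7JU437665k6GkMJRjHs4uFLsTPnhXfUhdk",
    "12D3KooWDkSenw8Z7MHfL2c2ip436Jsz4e5P69xijuMVZcYmFxNq",
]

# Hypothetical placeholders; substitute the real pinning cluster PeerIDs.
KNOWN_STABLE = {
    "EXAMPLE-CLUSTER-PEER-1",
    "EXAMPLE-CLUSTER-PEER-2",
}


def classify_providers(found, known_stable):
    """Split discovered providers into stable and other, and list stable peers not found."""
    found_set = set(found)
    known_set = set(known_stable)
    return {
        "stable_found": sorted(found_set & known_set),
        "stable_missing": sorted(known_set - found_set),
        "other": sorted(found_set - known_set),
    }


report = classify_providers(FOUND_PROVIDERS, KNOWN_STABLE)
print("stable providers found:", len(report["stable_found"]))
print("stable providers missing:", report["stable_missing"])
```

Raising the provider limit on the lookup before comparing would help rule out the cut-off as an explanation.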

yiannisbot commented 1 year ago

There are many other sites that I see we've got problems with:

One thing that I don't understand is why the behaviour is so inconsistent. Maybe @gmasgras has some ideas? For instance, the following sites seem to have had only one provider over the last few days. There is still a stable provider, but I'm not sure why there's only one and not 6. Also, it's odd that Fleek shows similar inconsistency, which makes me think it's a more general issue with the setup (?)

gmasgras commented 1 year ago

Ah I see now that I should have been looking at the "known stable providers" graphs.

I'm not sure why some CIDs are provided while others are not; they're all pinned to these nodes.

cc @lidel @aschmahmann any hints here?

yiannisbot commented 1 year ago

One thing I'm thinking we could try is reducing the reprovide interval, which is currently set to 22hrs. We changed this based on our study (https://github.com/protocol/network-measurements/blob/master/results/rfm17-provider-record-liveness.md#44-State-of-PR-Holders-over-time), but I'm wondering whether something's wrong there. @gmasgras, any chance you can change that back to 12hrs for some or all of the cluster machines, to see whether they then provide continuously?

gmasgras commented 1 year ago

The Reprovider interval is already set to 12h. https://github.com/protocol/network-measurements/issues/55#issuecomment-1630783364
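
For reference, a rough way to compare the two intervals discussed here (12h and 22h) is to count how many reprovide runs land before a provider record expires. This is only a Python sketch: `ASSUMED_RECORD_TTL` is an assumed parameter that is not stated anywhere in this thread and should be replaced with whatever the DHT servers actually enforce, and the helper names are made up.

```python
import re
from datetime import timedelta

# Assumption for illustration only: provider record validity on DHT servers.
ASSUMED_RECORD_TTL = timedelta(hours=48)


def parse_hours(interval):
    """Parse a whole-hour duration such as "12h", as in the Reprovider config."""
    match = re.fullmatch(r"(\d+)h", interval.strip())
    if match is None:
        raise ValueError("unsupported interval: " + repr(interval))
    return timedelta(hours=int(match.group(1)))


def runs_before_expiry(reprovide_interval, record_ttl):
    """Count reprovide runs that start strictly before a fresh record expires.

    With a record published at t=0, runs happen at interval, 2*interval, ...
    and the record expires at t=record_ttl. At least one of the counted runs
    has to succeed, otherwise the record lapses.
    """
    ceil_ratio = -(-record_ttl // reprovide_interval)  # ceiling division
    return max(ceil_ratio - 1, 0)


for setting in ("12h", "22h"):
    runs = runs_before_expiry(parse_hours(setting), ASSUMED_RECORD_TTL)
    print(setting, "->", runs, "reprovide runs before a record would expire")
```

Under this assumption, both settings leave at least one spare run, so the interval alone would not obviously explain records disappearing for days.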

yiannisbot commented 11 months ago

The situation still doesn't seem to be solved and in fact it's only getting worse from what I can see at https://probelab.io/websites/ for various websites. The only website that continuously has several providers from our pinning cluster and Fleek is probelab.io 🤔

Two ideas come to mind to dive a little deeper into this:

cewood commented 11 months ago

George is out on parental leave for all of August. @mcamou can you please have a look at this and get back to Yiannis?

mcamou commented 11 months ago

@yiannisbot I've created a list of everything that we have pinned in the collab cluster as of right now. I've added it to the collab cluster under CID QmQzPkWhAro4mj2KTZPfskx1nTvxnKU7MmEjLSfHqxSFhY and verified that it's retrievable at https://ipfs.io/ipfs/QmQzPkWhAro4mj2KTZPfskx1nTvxnKU7MmEjLSfHqxSFhY. Most of the entries have annotations to indicate what they are.