probe-lab / network-measurements

MIT License

Track and measure number of Brave browser IPFS nodes #24

Open autonome opened 1 year ago

autonome commented 1 year ago

The Brave browser ships a feature that downloads and runs Kubo.

We want to measure the number of Brave IPFS nodes on the public network.

@lidel said Brave nodes announce themselves as kubo/0.16.0/brave and that we could find them by:

aschmahmann commented 1 year ago

collecting peerids from peer records on DHT

This could be somewhat tricky. Collecting peer records from the DHT that way will be painful once the hydra nodes are spun down. That process appears to be starting soon (https://discuss.ipfs.tech/t/dht-hydra-peers-dialling-down-non-bridging-functionality-on-2022-12-01/15567), and, given that the linked proposal is a protocol violation, hopefully the hydras will be fully dead soon.

In a hydra-less world, some options include:

IIRC, some metrics are already collected for the total number of peerIDs seen by infra nodes. This seems like a) a good time to audit those metrics for accuracy, and b) an easy place to either export more information than just peerIDs, or to take the peerIDs and feed them into ipfs id to get their agent versions out.
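A minimal Python sketch of the "feed the peerIDs into ipfs id" idea, assuming the ipfs CLI is on PATH and a Kubo daemon is running locally. It relies only on the default JSON output of ipfs id <peerID> and its AgentVersion field; the function names and the sample peer ID are hypothetical, not part of any existing tooling.

```python
import json
import subprocess


def agent_version_from_id_output(id_json: str) -> str:
    """Extract the AgentVersion field from the JSON printed by `ipfs id <peerID>`."""
    record = json.loads(id_json)
    return record.get("AgentVersion", "")


def lookup_agent_version(peer_id: str, timeout_s: float = 30.0) -> str:
    """Ask the local Kubo daemon to identify a peer and return its agent version.

    Assumes the `ipfs` CLI is on PATH. If the peer cannot be reached, the CLI
    reports an error and check=True raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        ["ipfs", "id", peer_id],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return agent_version_from_id_output(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample shaped like `ipfs id` output, trimmed to a few fields.
    sample = '{"ID": "12D3KooWHypotheticalPeer", "AgentVersion": "kubo/0.16.0/", "Protocols": ["/ipfs/id/1.0.0"]}'
    print(agent_version_from_id_output(sample))
```

Keeping the JSON parsing separate from the subprocess call means the parser can be checked against saved output without a running daemon.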

momack2 commented 1 year ago

@aschmahmann, peer records were being collected long before the hydras existed, and AFAIK the hydras don't collect this data at all anyway. This already exists.

aschmahmann commented 1 year ago

IIRC there are already some metrics collected for number of total peerIDs seen by infra nodes.

@momack2, the existing data collection is what I meant above. IIRC it comes from aggregates of PL nodes running the peerlog plugin (https://github.com/ipfs/kubo/blob/master/plugin/plugins/peerlog/peerlog.go), although I don't recall which nodes participate in the logging. I recall seeing results in Grafana and Kibana, although I'm not sure whether they're the same.

I was mostly flagging that the proposal to scrape the DHT will not really work, and that without leveraging the hydras you'd need a different approach, like the one we already have.

This already exists.

Yep, what we already have is likely fine (scraping inbound peerIDs and user agents), although when tracking and reporting these numbers we should be clear about how we're collecting them.

momack2 commented 1 year ago

Yep! Just want to clarify that the hydras don't participate in this, so turning them down will not change it. 👍

yiannisbot commented 1 year ago

After a brief discussion, @dennis-tra brought up the idea that the honeypot deployed for the NAT Hole Punching study could help with this. The honeypot acts as a DHT server and also makes itself known to others so that it attracts more clients: https://github.com/dennis-tra/punchr#honeypot. It has also been running for a few months now, so the information might already be in the logs?

I've monitored the "Remote Peer Agent Version" reported at: https://punchr.dtrautwein.eu/grafana/d/F8qg0DP7k/punchr-performance?orgId=1 for a while, but haven't seen any kubo/0.16.0/brave peer showing up 🤔

@dennis-tra is the honeypot, or any of our other tools able to get this information?

dennis-tra commented 1 year ago

Yeah, I was thinking that we could use data from the punchr honeypot, but I just had a brief look at the data and searched for the brave agent version. No entry in the database contains the substring brave :/


However, I just started Brave's built-in Kubo node and requested its AgentVersion and supported protocols. This is the output:

protocols: [/ipfs/bitswap/1.0.0 /libp2p/dcutr /p2p/id/delta/1.0.0 /libp2p/circuit/relay/0.2.0/stop /ipfs/bitswap/1.2.0 /ipfs/bitswap/1.1.0 /ipfs/bitswap /x/ /ipfs/id/1.0.0 /ipfs/id/push/1.0.0 /ipfs/ping/1.0.0 /libp2p/circuit/relay/0.1.0]
agent: kubo/0.16.0/

You can see that the reported agent is kubo/0.16.0/ and not kubo/0.16.0/brave.
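To check for the suffix programmatically, a small Python sketch can split an agent string into implementation, version, and any trailing segments. It assumes only what this thread shows: the agent starts with kubo/<version>/ and Brave's marker is a final "brave" segment. Anything between the version and the suffix (for example a commit hash) is an assumption, and the helper names are hypothetical.

```python
def split_agent(agent_version: str) -> tuple[str, str, list[str]]:
    """Split an agent string like "kubo/0.16.0/brave" into (impl, version, extra segments).

    Empty segments are dropped, so the "kubo/0.16.0/" reported by Brave's
    node yields no extra segments at all.
    """
    impl, _, rest = agent_version.strip().partition("/")
    version, _, tail = rest.partition("/")
    extra = [segment for segment in tail.split("/") if segment]
    return impl, version, extra


def is_brave_agent(agent_version: str) -> bool:
    """True if the last non-empty segment after the version is "brave"."""
    _, _, extra = split_agent(agent_version)
    return bool(extra) and extra[-1] == "brave"


if __name__ == "__main__":
    for agent in ["kubo/0.16.0/", "kubo/0.16.0/brave"]:
        print(f"{agent!r}: brave={is_brave_agent(agent)}")
```

With the output above, "kubo/0.16.0/" has no extra segments, so it is indistinguishable from a plain Kubo node of the same version.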

yiannisbot commented 1 year ago

Interesting! Thanks for digging in. So, good news and bad news:

@dennis-tra, if we had an identifier (e.g., if they indeed advertised as kubo/0.16.0/brave), do you think we would get an accurate picture of the number of Brave nodes in the network? Is there anything else we'd need to track?

@lidel @autonome can you double-check with Brave what agent version they're advertising?

dennis-tra commented 1 year ago

I think there must be a way to extrapolate the number of incoming connections to the whole network, perhaps based on the in-degree of the honeypot, which we could get from the crawls.

But this requires a bit of thinking 🤔

momack2 commented 1 year ago

Lidel flagged that there was a bug here, which will be fixed soon, so you should get real metrics shortly!


autonome commented 1 year ago

Where Brave implemented this: https://github.com/brave/brave-browser/issues/18505

lidel commented 1 year ago

The suffix was not applied to the user agent exposed via the libp2p identify protocol 🙈 Fixed in https://github.com/ipfs/kubo/pull/9457; the fix will ship with Kubo 0.18 (https://github.com/ipfs/kubo/issues/9417).

We need to wait for Brave to update to Kubo 0.18 to see this. The good news is that they have a reliable automatic update mechanism, similar to IPFS Desktop, so no other action is required on their end.

dennis-tra commented 1 year ago

Quick update from our side. With the dashboard from our collaborators, we see the following numbers for the last 7 days:

[Screenshot: collaborators' dashboard, estimated Brave nodes over the last 7 days]

This shows an estimated ~300 brave nodes over the last 7 days, which looks way too low to us.

Our honeypot component (which arguably only covers a tiny bit of the keyspace) shows the following numbers for the last 7 days:

[Screenshot: honeypot, daily unique */brave PeerIDs over the last 7 days]

This shows the number of unique PeerIDs that connected to the honeypot with a */brave agent version on a given day. This is also fewer than we would have expected. We're currently considering setting up dedicated infrastructure ourselves to measure client activity.
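The daily metric described above (distinct PeerIDs per day with a */brave agent) can be sketched in a few lines of Python. The (unix_timestamp, peer_id, agent_version) record shape is a hypothetical stand-in for the honeypot's database rows, and days are bucketed in UTC by assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_unique_brave_peers(records):
    """Count distinct peer IDs per UTC day whose agent version ends in "/brave".

    `records` is an iterable of (unix_timestamp, peer_id, agent_version) tuples,
    a hypothetical shape for the honeypot's connection log.
    """
    peers_by_day = defaultdict(set)
    for ts, peer_id, agent in records:
        if not agent.endswith("/brave"):
            continue
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        peers_by_day[day].add(peer_id)  # the set collapses repeat connections
    return {day: len(peers) for day, peers in sorted(peers_by_day.items())}
```

Days with no */brave connections are simply absent from the result, and a peer that reconnects several times in one day is counted once.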

yiannisbot commented 1 year ago

To add: the idea is to extrapolate from the portion of the key space that the honeypot covers to the entire network. Of course, this assumes that requests are uniformly distributed across the key space, which isn't very accurate, but it gives us a ballpark figure.

The infrastructure we want to build (which Dennis mentions above) would run more nodes and cover more of the key space. From that we should get a much more accurate view of the number of */brave client nodes.
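A rough Python sketch of the extrapolation, under the uniform-key assumption above. Here coverage is a hypothetical input: the probability that a single lookup for a uniformly random key reaches one of our nodes. Scaling the observed count by 1 / coverage is only right if each client makes about one lookup in the window; with r independent lookups a client is seen with probability 1 - (1 - coverage)^r, so the naive estimate would overcount. The lookups-per-client parameter is likewise an assumption, not something measured in this thread.

```python
def contact_probability(coverage: float, lookups_per_client: int = 1) -> float:
    """Probability that a client reaches at least one of our nodes.

    Assumes each lookup targets a uniformly random key and independently
    reaches our vantage points with probability `coverage`.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if lookups_per_client < 1:
        raise ValueError("lookups_per_client must be >= 1")
    return 1.0 - (1.0 - coverage) ** lookups_per_client


def estimate_total_clients(observed_unique: int, coverage: float, lookups_per_client: int = 1) -> float:
    """Scale the observed unique-client count to the whole network: N ~ n / p."""
    return observed_unique / contact_probability(coverage, lookups_per_client)
```

Adding more nodes raises coverage, which shrinks the correction factor and makes the estimate less sensitive to how wrong the uniformity assumption is.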

lidel commented 1 year ago

I know we don't have visibility or extrapolation for the full network yet, but has there been any relative change in the data produced by our honeypot since May 2 (https://brave.com/nft-pinning/)?