nknorg / nkn-sdk-js

JavaScript Implementation of NKN Client and Wallet SDK
Apache License 2.0

Failing to connect with multi-client and getting RPC errors #99

Closed basedwon closed 1 year ago

basedwon commented 2 years ago

I'm getting a whole bunch of errors intermittently: it works fine for a while, and then sometimes it throws all these RPC errors.

I'm not sure whether I should be seeing the "RPC call failed" error at all. I'm assuming this is related to the mainnet server?

I am using TLS. I usually get a few failed WebSocket connections, but it still connects and works. Then this problem starts, and usually I can't connect to any nodes at all.

[screenshot]

basedwon commented 2 years ago

It seems to have gone away again. It lasted about 5-10 minutes, and that was probably the third time it has happened today (in the last 10 hours or so).

basedwon commented 2 years ago

And now it's happening again. This makes a reliable connection very difficult. I can't have users reloading the page until there are fewer errors, or waiting a while to try again.

basedwon commented 2 years ago

It seems to be getting worse, as in happening more often and for longer. This basically makes NKN unusable.

basedwon commented 2 years ago

Yeah, this is a total blocker: it makes the transport completely unreliable. I might have to scrap months of work building my app with NKN if this isn't fixed.

basedwon commented 2 years ago

Just one thought, it could be a CORS issue -- I got this warning:

[screenshot of the warning]

But that would be weird, since it works sometimes and sometimes it doesn't. I'd assume anyone trying to use the SDK right now would have this same problem. I've switched VPN IPs and turned off the VPN; nothing worked. This seems to be on the NKN side.

basedwon commented 2 years ago

Ok, maybe I'm sending too many requests to the server? I finally looked up error 429. If so, how long do I have to stop sending requests? Does each client count as a request? I'm using the MultiClient, and I have to use a large number for numSubClients because the TLS connections are very inconsistent. But sending 8-10 requests on each page reload could trigger your DDoS protection. Although I thought you had told me there was no DDoS prevention, since it's E2EE. [screenshot]

Please help me resolve this. I was hoping to launch my app in the next day or two, and now this has stopped me from even being able to test, let alone reliably run an app where people can connect.

yilunzhang commented 2 years ago

This is because the official seed nodes have effectively been under a DDoS attack for the past few hours. NKN nodes currently apply RPC rate limiting (a few thousand requests per second per node), and a node returns HTTP 429 once the limit is reached, to protect itself from overload.

We are currently looking into what we can do to fix this. In the meantime, you can simply switch to another NKN node as your RPC node (rpcServerAddr in the options). Any NKN node with port 30003 open can serve as an RPC node, but in general it is most reliable to use your own node.
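A minimal sketch of the "switch to another node" advice, assuming the app keeps its own ordered list of candidate RPC addresses (the addresses and the helper name withRpcFallback are hypothetical, not part of nkn-sdk-js). It tries an async operation against each address in turn and returns the first one that succeeds. In real code the operation could be a cheap RPC call such as nkn.rpc.getNodeState({ rpcServerAddr }) (available from v1.2.6, per later in this thread), and the winning address would then be passed as rpcServerAddr when creating the MultiClient.

```javascript
// Try `attempt` against each RPC address in order and return the first
// success. Hypothetical helper; not part of nkn-sdk-js.
async function withRpcFallback(candidates, attempt) {
  const errors = [];
  for (const addr of candidates) {
    try {
      // e.g. attempt = (rpcServerAddr) => nkn.rpc.getNodeState({ rpcServerAddr })
      const result = await attempt(addr);
      return { addr, result };
    } catch (err) {
      // HTTP 429, timeouts, refused connections, etc. all land here.
      errors.push(`${addr}: ${err && err.message ? err.message : err}`);
    }
  }
  // Every candidate failed: report all reasons together.
  throw new Error(`all RPC nodes failed (${errors.join('; ')})`);
}
```

Trying nodes one at a time keeps the request count low, which matters when the nodes themselves are rate limiting.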

basedwon commented 2 years ago

Ok, well that's good it wasn't me, I'm not sending thousands of requests. And this sounds fixable, so that's very good.

Is there a way to catch that error so I could automatically switch to a new seed node? And you're saying any seed node will work? Could I gather URLs from previous successful connections and then use those when needed?

And is there a way to run just a seed node? I haven't delved into setting up a miner yet; I was trying to use the network purely from a SPA-style app.

basedwon commented 2 years ago

Yeah, it seems that the onConnectFailed callback just receives undefined, so there's no way to know which error code it's failing on. Also, I tried a few other seed node URLs and none of them seemed to work. I did wonder whether this might become a problem: when you have the only solid alternative transport solution, you're going to have bad actors trying to shut it down. Let me know what I can do to help, @yilunzhang. I've invested a lot of time into using NKN for my apps, and I believe it's the future internet.

basedwon commented 2 years ago

Any update on this, @yilunzhang? I'm not an expert in this area, but it seems it would be good to set up some sort of IP-based throttle rather than giving everyone 429 errors. I've done this before using Traefik as a reverse proxy with IP-based rate throttling.
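To make the per-IP throttling idea concrete, here is a minimal token-bucket sketch. It is an illustration only: it is not how NKN nodes or Traefik implement rate limiting, and the class name and parameters are hypothetical. Each IP gets its own bucket, so a request arriving at an empty bucket gets HTTP 429 while other IPs are unaffected.

```javascript
// Illustrative per-IP token bucket (not NKN or Traefik code). Each IP holds
// up to `capacity` tokens that refill at `ratePerSec`; a request that finds
// fewer than one token is answered with HTTP 429.
class PerIpRateLimiter {
  constructor({ capacity, ratePerSec, now = () => Date.now() }) {
    this.capacity = capacity;
    this.ratePerMs = ratePerSec / 1000;
    this.now = now; // injectable clock (milliseconds)
    this.buckets = new Map(); // ip -> { tokens, last }
  }

  // Returns the HTTP status a proxy would send for this request.
  check(ip) {
    const t = this.now();
    const bucket = this.buckets.get(ip) || { tokens: this.capacity, last: t };
    // Refill in proportion to elapsed time, never above capacity.
    bucket.tokens = Math.min(this.capacity, bucket.tokens + (t - bucket.last) * this.ratePerMs);
    bucket.last = t;
    let status = 429;
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      status = 200;
    }
    this.buckets.set(ip, bucket);
    return status;
  }
}
```

A bucket per IP only helps when a few IPs send most of the traffic; many IPs each sending moderate amounts stay under their individual limits.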

basedwon commented 2 years ago

Another strategy I've thought of would be to put a table of alternate RPC addresses on the blockchain, let nodes list their addresses, and have the SDK automatically switch to an alternate address if the mainnet fails. Then, with some degree of IP throttling built into each node, I think that would make it mostly DDoS-proof.

yilunzhang commented 2 years ago

We have improved the anti-DDoS strategy of the official seed nodes. It looks good so far, but it can definitely be improved further. Per-IP throttling might help, but on its own it is certainly not enough: stats show the attacker has a large IP pool, and no single IP sends a crazy number of requests.

In general, for p2p networks, just don't rely on a fixed list of seed nodes, regardless of whether it's on the blockchain or somewhere else. Those are basically stationary targets for attackers. The advantage of p2p networks is quantity. Since every node in the network can be a seed node, one of the best strategies is for each client (not each app) to pick its own seed node, either randomly or based on performance. As a simple example, each NKN multiclient can keep a record of the nodes it was connected to and use those as seed nodes next time. Both our nMobile and tuna-based apps have a similar mechanism. Check this for an example. As a best practice, we measure latency before using those nodes as seeds, so we can use the node that is fastest from the client side and avoid nodes that are offline or syncing.
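A minimal sketch of the "remember the nodes you connected to" bookkeeping for a browser app, assuming a Storage-like object with getItem/setItem (such as window.localStorage). The storage key, the size cap, and the function names are hypothetical. Which field of the connected node's info holds its RPC address depends on the SDK version, so treat that part as an assumption.

```javascript
// Hypothetical storage key and cap for remembered seed RPC nodes.
const SEED_KEY = 'nkn-seed-rpc-nodes';
const MAX_SEEDS = 20;

// Read the remembered addresses, most recently used first.
function loadSeeds(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(SEED_KEY) || '[]');
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // corrupt entry: start with an empty list
  }
}

// Record an address after a successful connection: move it to the front,
// drop duplicates, and keep at most MAX_SEEDS entries.
function rememberSeed(storage, rpcAddr) {
  const seeds = [rpcAddr, ...loadSeeds(storage).filter((a) => a !== rpcAddr)].slice(0, MAX_SEEDS);
  storage.setItem(SEED_KEY, JSON.stringify(seeds));
  return seeds;
}
```

On the next start the app could try loadSeeds(window.localStorage) before falling back to the default seed.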

So, to be clear, seed nodes are only used when a client is created and wants to join the network. Once it's in, they are no longer needed. So they are definitely important, but only as an entrance. With the above mechanism, dapps are truly free from a single point of failure; otherwise they still suffer from the bootstrap problem.

basedwon commented 2 years ago

It seems to be working on my end as well. Great job!

I was actually already thinking of storing RPC node addresses to use for connecting. Is there a way you could add the 429 error to the connect-failed callback so I could switch automatically if needed?

Looking through this tuna code, I don't quite understand how you're testing latency. Are you just creating a client with that address and measuring the delay, or whether it fails? And are you testing every RPC address every time the client connects?

yilunzhang commented 2 years ago

I'm not sure adding HTTP 429 to the SDK is a good idea, because HTTP 429 is not part of the protocol (yet); it's something a node owner does to protect their node. They don't have to use HTTP 429. For example, they can put nginx in front of ports 30003/30005 and let it handle rate limiting, or even use a service like Cloudflare for anti-DDoS protection.

For latency measurement, we basically just make an RPC request to each potential RPC node, keep those in the PERSIST_FINISHED state, and sort them by response time, as you can see here: https://github.com/nknorg/tuna/blob/master/util.go#L227 . We only measure once, when tuna starts, and use the results throughout tuna's lifecycle.
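The same measure, filter, and sort approach, sketched in JavaScript. This is not a port of the tuna code. It assumes a probe function that resolves to an object with a syncState field (for example (addr) => nkn.rpc.getNodeState({ rpcServerAddr: addr }) on SDK versions that have it); the rankRpcNodes name and the 3-second default timeout are illustrative choices. Nodes that fail, time out, or are not in PERSIST_FINISHED are dropped.

```javascript
// Probe all candidates concurrently, keep nodes that answer within
// `timeoutMs` with syncState PERSIST_FINISHED, and sort them fastest first.
// `probe` is injected so it can wrap the SDK's RPC call or a test stub.
async function rankRpcNodes(candidates, probe, timeoutMs = 3000) {
  const measure = async (addr) => {
    const start = Date.now();
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
    });
    try {
      const state = await Promise.race([probe(addr), timeout]);
      return { addr, latency: Date.now() - start, syncState: state && state.syncState };
    } catch {
      return null; // offline, rate-limited, or too slow
    } finally {
      clearTimeout(timer); // don't leave the timeout pending
    }
  };
  const results = await Promise.all(candidates.map(measure));
  return results
    .filter((r) => r && r.syncState === 'PERSIST_FINISHED')
    .sort((a, b) => a.latency - b.latency);
}
```

Probing concurrently keeps the total wait close to the slowest responding node (or the timeout) instead of the sum of all round trips.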

basedwon commented 2 years ago

Ok, yeah, that makes sense about the 429. But finding favorite nodes seems like a good workaround. So if I create a multiclient, I'll get one or a few RPC nodes. But it doesn't seem like you get as much info in the JS SDK as in the Go SDK? And which call should I make to the RPC node? Something like getLatestBlock? It looks like you're using nkn.GetNodeStateContext in tuna.

In this case, a lifecycle would be between every browser refresh -- I can use local storage for persistence of the rpc addresses, and so I could do a latency test and then sort them in the db. Or is it so dynamic that this kind of data would get stale quickly?

yilunzhang commented 2 years ago

We just published v1.2.6, which adds the rpc.getNodeState method. Moreover, the rpc.rpcCall method is now exposed, so you can call other RPC methods easily; for example, nkn.rpc.getNodeState({rpcServerAddr: 'http://ip:30003'}) is equivalent to nkn.rpc.rpcCall('http://ip:30003', 'getnodestate').
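A small sketch of using the new method to time a single node. rpc is passed in so the function can be called with nkn.rpc in an app or with a stub in tests. The checkNode name and the returned fields are hypothetical; only getNodeState({ rpcServerAddr }) and the PERSIST_FINISHED value come from this thread.

```javascript
// Time one getNodeState round trip. `rpc` is nkn.rpc in an app (v1.2.6+
// per this thread) or a stub in tests; `now` is an injectable clock.
async function checkNode(rpc, rpcServerAddr, now = () => Date.now()) {
  const start = now();
  // Per this thread, equivalent to rpc.rpcCall(rpcServerAddr, 'getnodestate').
  const state = await rpc.getNodeState({ rpcServerAddr });
  return {
    addr: rpcServerAddr,
    latencyMs: now() - start,
    // Only fully synced nodes are good candidates for seeding.
    ready: Boolean(state) && state.syncState === 'PERSIST_FINISHED',
  };
}
```

In an app this would be called as checkNode(nkn.rpc, 'http://ip:30003'); a node that throws (for example with HTTP 429) simply rejects and can be skipped by the caller.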

> In this case, a lifecycle would be between every browser refresh -- I can use local storage for persistence of the rpc addresses, and so I could do a latency test and then sort them in the db. Or is it so dynamic that this kind of data would get stale quickly?

Every browser refresh sounds like a bit too much to me, since each test takes some time, so some sort of persistence is probably better for user experience.

Node info should be pretty stable if the nodes in the network are relatively stable, and even if they do change it's probably not an issue, since we can use the official seed node as a backup if none of the stored nodes work.
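One way to combine persistence with the official-seed fallback, as a hedged sketch. The cache shape { rankedAddrs, testedAt }, the maxAgeMs threshold, and the seedCandidates name are assumptions, and fallbackSeed stands for whatever default seed address the app already uses. Stored nodes are tried first even when the ranking is old; staleness only sets needsRetest so the app can re-measure later.

```javascript
// Decide which RPC addresses to try on startup. Stored nodes come first,
// the default seed is always last, and an old or missing ranking is flagged
// for re-measurement instead of being thrown away.
function seedCandidates(cache, fallbackSeed, maxAgeMs, now = Date.now()) {
  const stored = cache && Array.isArray(cache.rankedAddrs) ? cache.rankedAddrs : [];
  // A missing or non-numeric testedAt makes the comparison false, so it retests.
  const needsRetest = stored.length === 0 || !(now - cache.testedAt <= maxAgeMs);
  return {
    candidates: [...stored.filter((a) => a !== fallbackSeed), fallbackSeed],
    needsRetest,
  };
}
```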

basedwon commented 2 years ago

Ok, excellent, I see the PERSIST_FINISHED syncState you mentioned. So perhaps I should use the onConnect callback to get the node address, store it, and accumulate a few nodes before running a latency test and sorting by response time. If I get a connectFailed, I can put that node at the bottom of the list, and maybe retest after some interval.
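The list bookkeeping from this plan, sketched with hypothetical helper names. Because the onConnectFailed callback receives no details (as noted earlier in the thread), the app has to remember which address it was trying before calling demote. The retest interval is an arbitrary choice of this sketch, not SDK behaviour.

```javascript
// Add a newly seen node to the end of the candidate list (no duplicates).
function addCandidate(list, addr) {
  return list.includes(addr) ? list.slice() : [...list, addr];
}

// After a failed connection, move that node to the bottom of the list.
function demote(list, failedAddr) {
  if (!list.includes(failedAddr)) return list.slice();
  return [...list.filter((a) => a !== failedAddr), failedAddr];
}

// Re-measure the whole list once `retestMs` has passed since the last test
// (or if it has never been tested).
function shouldRetest(lastTestedAt, retestMs, now = Date.now()) {
  return !Number.isFinite(lastTestedAt) || now - lastTestedAt >= retestMs;
}
```

All three helpers return new arrays or values rather than mutating, which makes it easy to persist the result after each change.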

One question, I assume that if I connect to the node with tls that node must have tls enabled and so that node would always accept tls?

yilunzhang commented 2 years ago

Just in case you don't know, you can use client.node (https://github.com/nknorg/nkn-sdk-js/blob/master/src/client/client.js#L61) to get the node info of a client.

> One question, I assume that if I connect to the node with tls that node must have tls enabled and so that node would always accept tls?

Yes, the node side needs to configure TLS correctly (port, firewall, cert, etc.), and the client side has no way to know whether a node has everything set up correctly without trying to connect.

basedwon commented 2 years ago

Ok, that is good to know, thank you. And so once I find the fastest responding node, I just put that as the rpcServerAddr of a new client and it will connect me to (maybe the same) nodes for my multiclient. Not really a question, just making sure I've got it.
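That last step as a hedged sketch: pick the fastest ready node and build the options object for a new MultiClient. It assumes ranked is an array of { addr, syncState } objects already sorted fastest first. rpcServerAddr and numSubClients are options discussed in this thread, while the tls flag and the default values shown are assumptions to check against the SDK version in use.

```javascript
// Build MultiClient options from a latency-ranked node list. The first
// PERSIST_FINISHED node becomes rpcServerAddr; with no usable node, the
// app's default seed is used instead. Defaults here are example values.
function clientOptionsFrom(ranked, fallbackSeed, { numSubClients = 4, tls = true } = {}) {
  const best = ranked.find((n) => n.syncState === 'PERSIST_FINISHED');
  return {
    rpcServerAddr: best ? best.addr : fallbackSeed,
    numSubClients,
    tls,
  };
}
```

Usage would then look like new nkn.MultiClient(clientOptionsFrom(ranked, seed)). The RPC node is only the entrance, so the sub-clients may still end up connected to other nodes.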

yilunzhang commented 1 year ago

Closing this issue for now as it seems resolved. Feel free to continue the discussion or reopen it if you still have issues