mikedilger / gossip

Gossip is a nostr client

DM issues #789

Closed mikedilger closed 1 week ago

mikedilger commented 1 week ago

NIP-17 DMs are not showing up for me. I only see my carbon copy, and BTW it does not have any "seen on" data, meaning the carbon copy was either not sent to chorus or not pulled from chorus.

NIP-17 DMs I send are not showing up for the recipient.

Also, cloud fodder @jeremyd ran some testing and came up with more issues which I summarize as:

  1. Onboarding is missing setup for DM relays
  2. Gossip needs to read DMs from DM relays even if they aren't marked as read or write relays.
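
Point 2 can be sketched as a relay filter. This is only an illustration: `RelayConfig`, its fields, and `dm_inbox_relays` are hypothetical names, not Gossip's actual types. It assumes each relay carries independent read, write, and DM flags, with the DM flag corresponding to the NIP-17 DM relay list (kind 10050).

```rust
/// Hypothetical relay record; field names are illustrative, not Gossip's own.
#[derive(Debug, Clone, PartialEq)]
pub struct RelayConfig {
    pub url: String,
    pub read: bool,  // general inbox usage
    pub write: bool, // general outbox usage
    pub dm: bool,    // listed as a DM relay (NIP-17, kind 10050)
}

/// Relays to query for incoming giftwraps.
/// The `dm` flag alone qualifies a relay; `read` and `write` are ignored.
pub fn dm_inbox_relays(relays: &[RelayConfig]) -> Vec<&str> {
    relays
        .iter()
        .filter(|r| r.dm)
        .map(|r| r.url.as_str())
        .collect()
}
```

With this shape, a relay flagged only as a DM relay is still queried for giftwraps, which is the behavior the issue asks for.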

mikedilger commented 1 week ago

https://gist.github.com/jeremyd/6274c83dca409ee548cfc1af518274c7

mikedilger commented 1 week ago

Multiple fixes for this. I'm not sure it's working yet as we still need to test further. But so far I have fixed:

jeremyd commented 1 week ago

One extra bit of info I just discovered. The auth relay has a really aggressive disconnection going on at 120s. I will be working to fix this, as it should not be disconnecting for the duration of a session. This was due to the PING/PONG getting swallowed up by the interceptor proxy I'm developing, and haproxy thinking the socket needs closing. This may or may not affect the gossip client, but it's a good stress test for sure, since when the socket gets closed by the server, clients have to re-connect, re-auth, re-subscribe. Just an FYI.

mikedilger commented 1 week ago

I doubt that the timeout is related. Gossip should be connecting, dropping the giftwrap, and disconnecting within probably less than a second. I need to debug a few more [gossip<->chorus] issues this morning, but then I'll switch over to debugging connections to auth.nostr1.com

BTW: chorus not only does aggressive disconnection, it IP blocks every client for at least 1 second after the disconnection to prevent disconnect-reconnect-disconnect loops, and sometimes for much longer if the client wasn't behaving normally (scraping, etc.)
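
On the client side, one way to stay clear of a short post-disconnect block like this is to delay reconnects. The sketch below is not taken from gossip or chorus. The function name `reconnect_delay`, the 2-second base, and the 60-second cap are all assumptions chosen for illustration; the only figure from this thread is the 1-second minimum block.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based).
/// Assumed values: 2 s base (above the 1 s minimum block), doubling, capped at 60 s.
pub fn reconnect_delay(attempt: u32) -> Duration {
    let base_secs: u64 = 2;
    let max_secs: u64 = 60;
    // Clamp the shift so the factor cannot overflow a u64.
    let factor = 1u64 << attempt.min(16);
    Duration::from_secs(base_secs.saturating_mul(factor).min(max_secs))
}
```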

mikedilger commented 1 week ago

Ok, based on debug logs, I posted a giftwrap to wss://auth.nostr1.com/ but I don't think it responded with an OK.

Our side initiated a websocket close immediately after we sent the EVENT. I wonder if that might race against processing the messages that were sent prior to the websocket close. Just a thought.

What I sent was this. You can perhaps debug with it and verify it is a giftwrap that works.

["EVENT",{"id":"1246fd62b22d22fc25386bfa42f7f2b9fc043a940097e393ed4dfe9affbc9cb8","pubkey":"158c14a7dd8590b0254a8bd66d08b505bb94d0c8b16e8d44185c5e2f49e46b3a","created_at":1718793703,"kind":1059,"sig":"ee5a5adc66d7f839a26f4adcc5a382a6f08ca09bcb7873fa76a0e28c73df4721b0c1b0a13f59524b72fc61e35fa6837276688711f9b69952a8514b264974239d","content":"Astw+MTy+8xiO7IK5qI5ynYRdWBEpoe0vjBxIP4OpP0dm+6BIKiBYsDJrVyg+Dp3Ppki0VeOr42tBYNyVGZrx6Vlrpm3QFPqDxAE1rM82HPGrEgsPz7n0HwvZrJnpbI5vHAahJNk7gU4WV3neN1PnVAQr3FmLXFk16N/MpkgPBczYjCrUydj/TtzZKBid8HrLahuBcK4RTdpn7ndXZliST+N0ICUXAGyS5zKeE8uXJ14tvRk9RMK3dFycoZXy9yKK15CRs2C+fyx3NNpJLpK+BFPBZztVgKAr5A/FEOPs/6NbjUflwhO1nH8NIjUO9oA/ZnrY15WuS9Fi+keUSoitzxtFEcwHN+e/8SL/kpV3WtN/C11TFMfH1wVU8x4GYKBTicX2MJRvwLtBMwMM3znyNGrAZ+MMV+gWhWugKQj6CFLqsjrluDjzdv7qlxFg05pFH1ioYHCx1EEN3nDPAd2XEAowI5Gaqh/9ZGs5DRVBrkpAeuFyLoH9RjPpSgqB4/DBHEQqZ470LIjbehsYz+Vze8zY7BPSr9VmvG4xRgIhufE30GDs3Wo4p8TjdV+EUPRc6wMOIton0KBd97iLlLN5Qu1uMtIu11bVgaCGo+1fbW9+wDvYL0/0+K8qjLQOZfPpbKuJN7bMaLaccaXsYC5VBDCdL7jkwv/eK4IgaaMPVX0Evq8F9Fu6Q3fXHgjPn6QMq5dqA1npj/LQr7kHu1hhDKiT5X5/+tBMlY3ycvGrr09cTFrX8QE6jCMIA86uVMJrQMZqFX5mORo4O8bMl9UrRJ7NgAaYx33pYeBIkupECqy9KPaIqQXSYsE4VD92Nb5rABtG8DIChycDDwiAO7OB0F2gsFUOHprID/V+rdCCTEspeqB2fhn0RKuS/WifBaXkMEtKcsxUMUuj0BlyQWWWSz1mJjOC2nBkcVm61K/A64CnAemBp8LPylEaWFGgVjV5vClBhK8XNcSG/rK7hGgs03isQHq+/8ZXfAtZ2RRGzE+hPkT4992UzQB4CavLiqq9AlKOxTCyWD7eySRW2H+QXwo1ckN19KlHKyEUNNR7MqkLKCmxdnUQXoT8IY2EsQEsHe/l/Y+O+K3nBZxgefqx/qOhaRYGhyUaw49rSooj23p+qO6uKKVZTA5sT3jINcMhf4lOLJCWfT+bm3tGgwBzpz94B/rf/zliFFfWymscHSiL2ThJW5cfAsH7M+98+HFL0ck9LndPwxmpyAmZ5PFZ9QliYFwxqrwUfZd3s7UUweosWCFAH0rAJBRUg8AjmY+Z4Z61tmtdA9L8UlPXVi1BDaIwTP8TAGMzDKor8wkyXqUiPntionvvoGGoWhgRdmjPSBP2iXJbObvE3UveXxvdQ+53IEHZ8oxPCj0kVE/+L0f+a/NaeqXTJGyM0fybcBXXGQvK6KBt06yIg7uzDb27j7B/XGj1mRE3cWqBs6T7YuFncOBPn8kCBF6BZLwVrwLCZKg0hY+70Oxj2HYiqMzZtnhQg4vZdovMXlrJUANbkvG0X/Yi4/GfTBAlWjt8qRDhAwSNeGntzGkUo1c1D0PG8RP7rU6ZveWpPVPh/66TwCUsadMBkIWTznGjtm2R4yPuNQot1zJvVpp9/7f53vO/y8UrlDDQcLCDFMl/JhWOqxI9GAdEVIMcGYoqX
LSinsWkiQcIwNdAF/VgG1M8mjxrQrN5MmdqTcXGea7Iu/2iO79ib9d34KKSi7asgCHwC8Cz6IQOp0kLgNGDvfn4g2ygvIw0TBFAmGjarn8QligjoOCg+FucUaXO2F1v8K95raShDMA","tags":[["p","7cc328a08ddb2afdf9f9be77beff4c83489ff979721827d628a542f32a247c0e"]]}]

mikedilger commented 1 week ago

I'm working on code to not drop the connection so quickly...
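
A minimal sketch of the idea, not the actual gossip change: keep track of which event ids have been written to the socket and only allow the close once each one has been answered with an OK. The type `PendingPublishes` and its methods are hypothetical names.

```rust
use std::collections::HashSet;

/// Tracks events sent on one connection that have not yet received an OK.
#[derive(Debug, Default)]
pub struct PendingPublishes {
    awaiting_ok: HashSet<String>,
}

impl PendingPublishes {
    /// Record that an EVENT with this id was written to the socket.
    pub fn sent(&mut self, event_id: &str) {
        self.awaiting_ok.insert(event_id.to_string());
    }

    /// Record an OK message from the relay for this event id.
    /// Returns true if the id was one we were waiting on.
    pub fn ok_received(&mut self, event_id: &str) -> bool {
        self.awaiting_ok.remove(event_id)
    }

    /// The connection may be closed once every EVENT has been answered.
    pub fn safe_to_close(&self) -> bool {
        self.awaiting_ok.is_empty()
    }
}
```

A real client would also need a timeout so that a relay that never answers cannot hold the connection open forever, and it would treat an OK false with an `auth-required` prefix as "retry after AUTH" rather than as a final answer.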

mikedilger commented 1 week ago

OK I fixed my code, and I notice a problem on your relay.

I send an EVENT and then you AUTH challenge me without sending an OK false auth-required. So I reply to the AUTH and I get an OK true for the AUTH event... still never any OK for the initial event.

jeremyd commented 1 week ago

Hm, yes I know somewhat what's happening there.. So, the way the proxy is working right now is (given my understanding of reading the NIP), when the socket opens, it immediately will send ["AUTH", "challenge"]. Then it starts listening for an AUTH response. If it sees any "REQ"s during this time, it will re-send the ["AUTH", "challenge"] followed by ["CLOSED", "req#", "auth-required: you must auth"] for each REQ followed by an ["EOSE", "req#"] for each of them..

However, I do have a bug where it only needs one to succeed, then it dumps you into strfry. Strfry does not understand auth and will send NOTICE bad req back for any subsequent auth attempts. I think I am going to have to suppress these. Some clients seem fine with it (like amethyst), but yeah, I can see that being a potential problem.

What is the OK false auth-required thing? Oh, for when you send an event pre-auth? Gooooooood point. I am not even responding to these prior to receiving the initial auth.. I can modify this to look for event pre-auth, and send the OK false auth-required similar to REQ. Was this in the NIP? I must have missed it or just wasn't thinking of this case. Thanks for pointing that out.

jeremyd commented 1 week ago

Ok, I've modified the proxy to send back the proper ["OK", "<event id>", false, "auth-required: ..."] response. I will think on this further as to how many of these auth challenges I should be sending. They're all the same challenge string (per connection), but maybe it's a bit much. Regardless, I hope the OK false thing will help un-stick the current bug, thanks again for this.
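
The pre-auth behavior described in this thread can be sketched roughly as follows. This is not the proxy's actual code. `ClientMessage`, `json_str`, and `pre_auth_responses` are hypothetical names, the reason text after `auth-required:` is taken from the CLOSED example above, and the ordering for EVENT (challenge first, then OK false) is an assumption made to match the REQ case.

```rust
/// Client messages the proxy might see before authentication (simplified).
pub enum ClientMessage {
    Event { id: String },
    Req { sub_id: String },
}

/// Quote a string as a JSON string literal. Handles quotes and backslashes
/// only; enough for this sketch, not a general JSON encoder.
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Messages sent back to a client that has not authenticated yet.
pub fn pre_auth_responses(msg: &ClientMessage, challenge: &str) -> Vec<String> {
    let reason = "auth-required: you must auth";
    // The same per-connection challenge is re-sent each time.
    let auth = format!("[\"AUTH\",{}]", json_str(challenge));
    match msg {
        // EVENT before auth: reject with OK false (ordering is an assumption).
        ClientMessage::Event { id } => vec![
            auth,
            format!("[\"OK\",{},false,{}]", json_str(id), json_str(reason)),
        ],
        // REQ before auth: CLOSED, then EOSE, as described in this thread.
        ClientMessage::Req { sub_id } => vec![
            auth,
            format!("[\"CLOSED\",{},{}]", json_str(sub_id), json_str(reason)),
            format!("[\"EOSE\",{}]", json_str(sub_id)),
        ],
    }
}
```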

jeremyd commented 1 week ago

Updated gossip, and performed the testing steps again with my gossip test accounts and I had a successful DM exchange between the two! Confirmed, this is working now :tada:

mikedilger commented 1 week ago

A while back in some other convo @fiatjaf indicated that relays ought to send AUTH immediately as their first message before processing even client messages. That way clients have the AUTH challenge and can save that and use it later if they find out they need to (via an auth-required error).

If I get an auth-required error prior to being AUTH challenged, the client is stuck...the relay has to initiate AUTH. If they say OK auth required and then immediately send an AUTH with challenge, there is still that little window where the client is stuck... I think gossip is handling that ok but it is nicer to just send AUTH right from the get-go.
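
Both orderings can be handled by a small per-connection tracker. This is a rough sketch and not gossip's implementation. `AuthTracker` and `Action` are hypothetical names, and signing the AUTH event (kind 22242 in NIP-42) is left to the caller.

```rust
/// What the caller should do next on this connection.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Nothing to do yet (for example, still waiting for a challenge).
    Wait,
    /// Sign and send an AUTH event (kind 22242) for this challenge.
    SendAuth(String),
    /// AUTH succeeded: re-send the events with these ids.
    Resend(Vec<String>),
}

#[derive(Debug, Default)]
pub struct AuthTracker {
    challenge: Option<String>, // latest challenge from the relay, if any
    auth_sent: bool,           // true once we have answered a challenge
    blocked: Vec<String>,      // event ids rejected with auth-required
}

impl AuthTracker {
    /// Relay sent an AUTH challenge: save it, and use it if something is blocked.
    pub fn on_challenge(&mut self, challenge: &str) -> Action {
        self.challenge = Some(challenge.to_string());
        self.try_auth()
    }

    /// Relay rejected an event with OK false and an auth-required reason.
    pub fn on_auth_required(&mut self, event_id: &str) -> Action {
        self.blocked.push(event_id.to_string());
        self.try_auth()
    }

    /// Relay accepted our AUTH event: retry everything that was blocked.
    pub fn on_auth_ok(&mut self) -> Action {
        Action::Resend(std::mem::take(&mut self.blocked))
    }

    /// Authenticate only when a challenge is known and an event needs it.
    fn try_auth(&mut self) -> Action {
        if self.auth_sent || self.blocked.is_empty() {
            return Action::Wait;
        }
        match self.challenge.clone() {
            Some(c) => {
                self.auth_sent = true;
                Action::SendAuth(c)
            }
            None => Action::Wait, // the "stuck" window: no challenge yet
        }
    }
}
```

If the relay sends AUTH first, the saved challenge is used as soon as an `auth-required` rejection arrives. If the rejection arrives first, the tracker waits and authenticates when the challenge shows up.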

jeremyd commented 1 week ago

The proxy does send AUTH very first thing, immediately on connect. It's just that the socket is bi-directional, and most clients send first thing, so the proxy then responds to each of the messages that were sent until they auth. So it sends the CLOSEDs, the OK false, and a bunch more AUTH challenges each time.

mikedilger commented 1 week ago

Ok ok. Maybe I'm doing that, sending EVENT straight away and then noticing the AUTH. Thanks for helping me debug and fix my client.

jeremyd commented 1 week ago

Yeah, I mean it makes sense that before auth, there wasn't much reason not to connect and send, and if you don't already know the relay needs auth then it's harder to build the async process to wait-and-see on that first message exchange .. so it's really nice that you helped me find this bug in the relay proxy. I knew there had to be some! :) A valuable debugging session for us all, thank you!