markqvist / NomadNet

Communicate Freely
GNU General Public License v3.0
1.23k stars 47 forks source link

Request Failed on serving pages #29

Closed faragher closed 1 year ago

faragher commented 1 year ago

On attempting to access pages from another local node, I get "Request failed" after everything seeming to function until "Request sent, awaiting response."

I can occasionally get index.mu, but none of the other pages will load. All have 644 permissions (except the Python script, which is 755) and seem to function when browsed from the source machine. LXMF works as intended, the system seems stable, and problems persist with both the autointerface and the Dublin hub. Direct file access (hash:/file/filename.ext) also fails.

Logs reveal nothing on either system (server at log level 4, client at 6, I don't think it's wise to cycle the main node to change the log level). I am uncertain if the pages serve properly to others, but can't get a node on an outside net to test.

Both nodes are on a subnet, and 4242 is routed to the main node. Both are configured to be propagation nodes.

Announce stream appears frozen from 12MAY (9 days ago) and logs from the 16th. Pages served increments properly. RNStatus as expected. Using server to browse client page worked flawlessly the first time, and now has similar "Request Failed" message. Client serving pages at log level 6 reports nothing at time of serving.

Please advise on how to find logs or similar information to determine the issue.

Server: (pip version) NN v0.3.5 RNS v0.4.9

Client: NN v0.3.5 RNS v0.5.3

acehoss commented 1 year ago

I have had similar issues. Acehoss Node runs on a different machine from my workstation, and after upgrading to RNS 0.5.2 or 0.5.3 (I don't remember for sure exactly which one the problem started with), I would get the Request Failed error when trying to connect from my workstation. Typically it goes like this:

  1. Upgrade RNS on both server and client.
  2. Restart RNS and nomadnet daemons on server.
  3. Start nomadnet on client.
  4. Connect to Acehoss Node. Page loads from cache.
  5. Refresh. Path request succeeds, request times out.
  6. Close window/connection and reconnect. Request Failed error.
  7. Close window and reconnect. Page loads.

I assumed it was just something wonky with my configs or something. I haven't had the time lately to dig into it.

My client connects to the RNS transport instance on Acehoss Node by TCP. That instance is connected to Testnet by Dublin TCP and i2p, Beleth TCP, Beyond TCP, and BBS TCP.

markqvist commented 1 year ago

Hmm, this is a bit strange. A backwards-incompatibility was indeed introduced in 0.5.2, but only for the Buffer and Channel objects.

Would it be possible for you to crank up the loglevel to 7 (both for rnsd and nomadnet) on the node that fails, and send me a snapshot of the log output after the request fails? Then I'll probably be able to spot what is up.

markqvist commented 1 year ago

Also, is it happening on static pages as well, or only on generated pages?

faragher commented 1 year ago

I am unable to find the rnsd logs. Are they stored by default?

NomadNet requester side:

Serving Server = f8ac9c41b857ddf13a9151cec211079b

[2023-05-21 12:51:01] [Debug] Found 35978db6776faa180756bcfcfdb862c525f4fe4e29558e91b6bac5265115df08_1684692510.2669919 in cache.
[2023-05-21 12:51:01] [Debug] Returning cached page
[2023-05-21 12:51:05] [Debug] Browser handling link to: :/page/uplinks.mu
[2023-05-21 12:51:06] [Extra] Valid announce for <Serving Server> 2 hops away, received via <693cf64436de88fdeb66cc29e842f249> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965]
[2023-05-21 12:51:06] [Debug] Destination <Serving Server> is now 2 hops away via <693cf64436de88fdeb66cc29e842f249> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965]
[2023-05-21 12:51:07] [Extra] Registering link <b54e01ad2088dedce4b80c8783338ac4>
[2023-05-21 12:51:07] [Debug] Link request <b54e01ad2088dedce4b80c8783338ac4> sent to <nomadnetwork.node.a22c2e8486a168a3b762d5a4b76a454f/f8ac9c41b857ddf13a9151cec211079b>
[2023-05-21 12:51:08] [Extra] Activating link <b54e01ad2088dedce4b80c8783338ac4>
[2023-05-21 12:51:08] [Verbose] Link <b54e01ad2088dedce4b80c8783338ac4> established with <nomadnetwork.node.a22c2e8486a168a3b762d5a4b76a454f/f8ac9c41b857ddf13a9151cec211079b>, RTT is 1.238s
[2023-05-21 12:51:13] [Debug] Path request for <78bee5a05945b568321b9b13a041cf0e> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965]
[2023-05-21 12:51:13] [Debug] Ignoring path request for <78bee5a05945b568321b9b13a041cf0e> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965], no path known
[2023-05-21 12:51:16] [Debug] Path request for <78bee5a05945b568321b9b13a041cf0e> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965]
[2023-05-21 12:51:16] [Debug] Ignoring path request for <78bee5a05945b568321b9b13a041cf0e> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965], no path known
[2023-05-21 12:51:17] [Extra] Valid announce for <5703f156822b7897553c85f8e6cb79c4> 3 hops away, received via <693cf64436de88fdeb66cc29e842f249> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965]
[2023-05-21 12:51:17] [Debug] Destination <5703f156822b7897553c85f8e6cb79c4> is now 3 hops away via <693cf64436de88fdeb66cc29e842f249> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965]
[2023-05-21 12:51:24] [Debug] Path request for <6561c9e03ec757b18ad9410fa0d39f24> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965]
[2023-05-21 12:51:24] [Debug] Ignoring path request for <6561c9e03ec757b18ad9410fa0d39f24> on TCPInterface[RNS Testnet Dublin/dublin.connect.reticulum.network:4965], no path known
[2023-05-21 12:51:31] [Debug] No active peers available, randomly selecting peer to sync from 1 unresponsive peers.
[2023-05-21 12:51:31] [Debug] Selected waiting peer 0: <78bee5a05945b568321b9b13a041cf0e>
[2023-05-21 12:51:31] [Debug] Initiating LXMF Propagation Node sync with peer <78bee5a05945b568321b9b13a041cf0e>
[2023-05-21 12:51:31] [Debug] No path to peer <78bee5a05945b568321b9b13a041cf0e> exists, requesting...
[2023-05-21 12:51:38] [Debug] Path request was not answered, retrying sync with peer <78bee5a05945b568321b9b13a041cf0e> later

Sadly, using that client as a server worked successfully, and restarting the NomadNet instance for the server with higher log levels has fixed the problem. I'll circle back with the verbose logs if the problem reoccurs.

Other data point is when I opened the log in NomadNet it seemed to write a handful of lines at a time instead of actually tailing the log itself, slowly outputting a week of logs that got less and less detailed. I'm not sure if this was intended behavior or a buffer problem.

Edit: It happens on both static and generated pages.

randogoth commented 1 year ago

I was having the exact same issue. Request failed after Request sent, awaiting response.

Happened on all pages beyond index.mu.

I noticed that when testing the pages locally on the server (via Node Info > Browse) all the links using the format :/somepage.mu worked flawlessly but would fail when connecting from the client. I changed all the links to :/page/somepage.mu and that fixed the issue.

faragher commented 1 year ago

Good catch, but all my links are already prepended by a /

Both systems at log level 7 but currently running fine. Uptime on failure was around 45 days, so if it's a long term issue, we have have to wait a while.

Correction: Uptime when I noticed the failure.

markqvist commented 1 year ago

Thanks, I'll keep an eye out for this too.

faragher commented 1 year ago

Okay, not sure if this is the same problem, since I know this is a known issue and I thought we fixed it, but today I had the problem where NomadNet doesn't see pages created while it's running. I find this a flaw and will look at it in the near future.

markqvist commented 1 year ago

It's not the same issue, but it is definitely a relevant shortcoming. NomadNet basically scans for existing files/scripts in the pages directory on startup, and registers those as requestable paths. If a new file is created in the filesystem, NomadNet will not see it until a restart.

This could be remedied either by occasionally rescanning the directory, or by using inotify (or similar) functionality from the operating system to get notified of filesystem changes to the directory. I'd prefer something like the inotify approach, as opposed to periodic scanning, but I haven't looked into it at all yet.

Regarding the issue this thread is referring to, I am pretty sure it was fixed in one of the recent releases, and was due to some timing issues. I'll close this one for now, but feel free to reopen if it crops up again.

@faragher, if you want to open an issue for discussing / working on the page/file issue, that would be great too.

markqvist commented 1 year ago

Closing as solved in previous release.