sinamics / ztnet

ZTNET - ZeroTier Web UI for Private Controllers with Multiuser and Organization Support.
https://ztnet.network
GNU General Public License v3.0

[Bug]: clients are displayed for a very long time on 0.6.6 after update #435

Closed PhaNtomBek closed 1 week ago

PhaNtomBek commented 1 month ago

🐛 Describe the Bug

After updating from 0.6.5 to 0.6.6, opening any network takes a very long time to display its clients: you have to wait 5-30 seconds. Opening the "ZT Controller" menu is just as slow. If you reopen the same network immediately, all data displays quickly. But if you wait a while, the whole problem repeats itself.

🔍 Steps to Reproduce

Wait a while after opening a network, then open it again: the whole problem repeats itself.

🔧 Deployment Type

✨ Expected Behavior

No response

📋 ZTNET Logs

No response

🖼 Screenshots

No response

sinamics commented 1 month ago

I do not understand "IDs are displayed for a very long time". Could you explain further?

Out of curiosity, do you have many unlinked networks?

PhaNtomBek commented 1 month ago

I can show you via Discord if it's convenient for you.

PhaNtomBek commented 1 month ago

It would probably be more accurate to say that the list of clients connected to the network is not displayed. I don't have any unlinked networks.

PhaNtomBek commented 1 month ago

[image] I need to wait about a minute for clients to appear in the list. I rolled back to version 0.6.5 and the problem went away.

Mhalkyo commented 1 month ago

Hello, allow me to chime in on this problem. I have recently been using version 0.6.6 and I do not encounter it. Could it be due to a performance problem on the machine side?

PhaNtomBek commented 1 month ago

Hello, I use a lot of networks. If the network was created recently, the problem does not arise. If the network was created a long time ago, the problem appears. I cannot confirm this; it is only my observation. It may also be related to having a lot of clients on the network (30+). The problem has been noticed on different computers.

n9yty commented 1 month ago

I had been building a local copy of ztnet. With 0.6.6 I tried to use the published image and I encountered this problem. I then built it myself again and the problem was not there. I don't know what was going on, but it wasn't the host system as I had rebooted it to be sure and the CPU was idling on it. I assumed it was me or related to using a local built copy.

sinamics commented 1 month ago

I've attempted to replicate the issue but have been unsuccessful so far. It's unclear why version 0.6.5 works while 0.6.6 does not, given the minimal changes between versions.

https://github.com/sinamics/ztnet/compare/v0.6.5...v0.6.6

I will continue testing.

n9yty commented 1 month ago

I downgraded here to the last "official" image and it seems to be when the page is loaded or refreshed. Once the page is up I can change between organizations and networks and they populate quickly. If I refresh the browser page I get the delays. It is not directly related to the number of hosts but it seems longer on the larger networks. And as mentioned, if I let it sit for a little bit, and then try to switch networks, the delay comes back even though I'm not refreshing the page. I switched back to my local build and the issue goes away. There may be a very small delay but it is nowhere near what I see if I pull the "official" docker image.

I am running behind a Caddy proxy, if that makes any difference.

PhaNtomBek commented 1 month ago

Hello, I am running behind nginx.

PhaNtomBek commented 1 month ago

The problem has now started to appear on version 0.6.5 as well. It's not so critical, but you need to wait 5-20 seconds for clients to display.

PhaNtomBek commented 1 month ago

Good day. We moved to another data center and changed addresses. A webhook was configured that kept trying to connect to an address that was no longer available. Because of this, all of ZTNET was affected and everything was very slow. When I removed the webhook, everything worked fine on version 0.6.5. On version 0.6.6 the problem persists.
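For readers hitting the same webhook symptom: a request to an unreachable address can block for as long as the operating system's connect timeout unless the caller bounds it. The following is a minimal sketch of bounding any asynchronous call with a deadline. It is not ZTNET's webhook code; `withTimeout` and the `sendWebhook` name in the usage comment are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper, not part of ZTNET: resolve with `fallback` if `work`
// has not settled within `timeoutMs`, or if `work` rejects.
function withTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
  fallback: T,
): Promise<T> {
  return new Promise<T>((resolve) => {
    // The timer wins the race when the remote side never answers.
    const timer = setTimeout(() => resolve(fallback), timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        // Connection refused, DNS failure, etc.: report failure, do not throw.
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

// Usage sketch (sendWebhook is an assumed function returning Promise<boolean>):
//   const delivered = await withTimeout(sendWebhook(url, payload), 3000, false);
```

Note that this only stops the caller from waiting; the underlying request keeps running until it fails on its own. Cancelling it as well would need something like an `AbortController` passed to the HTTP client.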

n9yty commented 1 week ago

Interesting . . . I pulled your latest changes into my local copy, but I also did a docker system prune -af to free up space and remove all old versions of things. Then I did a rebuild. Now I am having quite substantial delays waiting for the network information to populate. It comes in sections [i.e. the Network Settings, Network Members, Network Actions, etc], with delays between each. This makes me wonder if some component in one of the images is causing the issue: I was not seeing it while I still had the older cached version, but now that it is built from nothing it is showing up? Just a thought.

cscompton commented 1 week ago

Looking at the browser developer tools, all of the pending requests, or requests that take a long time (15+ seconds) to complete, are websocket requests to /api/websocket, and the phase with the greatest duration is "waiting for server response". The websocket switch (HTTP status 101) is always pending, but I attribute that to the open websocket.

sinamics commented 1 week ago

The pending 101 is normal, as the connection is switching from polling to the websocket protocol. Can you check the /api/trpc/network.getNetworkById request? Does it take a long time to load?

It would be very helpful if I could log in to someone's application and have a look, as I have no clue why this is happening at the moment, but I fully understand that would be difficult.

What I need to know:

sinamics commented 1 week ago

Can you test the image tag dev-489b8e1? See if the problem persists, and if so, check docker logs ztnet. I have added the network load time to the log output.
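The kind of instrumentation described here can be sketched as a small wrapper that logs how long an asynchronous step takes. This is an illustration only, not the code in the dev image; `timed` and the label in the usage comment are hypothetical names.

```typescript
// Hypothetical helper: run `work`, then log its wall-clock duration.
// The duration is logged whether `work` resolves or rejects.
async function timed<T>(
  label: string,
  work: () => Promise<T>,
  log: (line: string) => void = console.log,
): Promise<T> {
  const start = Date.now();
  try {
    return await work();
  } finally {
    log(`${label} took ${Date.now() - start} ms`);
  }
}

// Usage sketch (loadNetwork is an assumed function):
//   const network = await timed("load network", () => loadNetwork(id));
```

Wrapping each sub-step separately (database read, controller request, peer lookup) narrows down which one accounts for the delay.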

n9yty commented 1 week ago

PM me or email me - if you can send me an ssh key I will give you access to my hosted node running it as well as set you up with a temporary admin login in ztnet itself. It is a slow small node, but you can build the image on it - it just takes a long time. But you could build, tag and push the image and update it. If you are interested let me know.

I have two organizations, no "local controller" networks. Each organization has just two networks, the biggest network only has about 35 nodes in it, the others are just a handful.

I pulled the tagged image you mentioned and opened the network, this is what I have:

Executing command ▲ Next.js 14.1.4

sinamics commented 1 week ago

Thank you @n9yty. Could you try the image tag dev-d4e9021 and post the logs? I've added more timing measurements to the network fetch. If that does not reveal anything, I will send you my public SSH key.

n9yty commented 1 week ago

Here is a round of timings from that tag:

Database seeded successfully! Executing command ▲ Next.js 14.1.4

sinamics commented 1 week ago

Thank you. Now I know where to look. I will do some investigating tomorrow and post an update.

sinamics commented 1 week ago

I think the issue should be resolved now. Could you guys test the Docker image with tag dev-89afd30? The issue appears to have been related to how the peers were fetched from the controller.
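The thread does not say what exactly changed in the peer fetching, so the following is purely illustrative and not a description of the actual fix. It shows one common way per-member lookups become slow on larger networks: awaiting one request after another, compared with issuing them concurrently. `Peer`, `fetchPeer`, and both function names are hypothetical.

```typescript
// Hypothetical shape of a peer record; the real controller response differs.
type Peer = { address: string; latency: number };

// Sequential: each request waits for the previous one, so the total time is
// roughly the sum of all request times and grows with the member count.
async function fetchPeersSequential(
  ids: string[],
  fetchPeer: (id: string) => Promise<Peer>,
): Promise<Peer[]> {
  const peers: Peer[] = [];
  for (const id of ids) {
    peers.push(await fetchPeer(id));
  }
  return peers;
}

// Concurrent: all requests start at once, so the total time is roughly that
// of the slowest single request. Result order still matches `ids`.
function fetchPeersConcurrent(
  ids: string[],
  fetchPeer: (id: string) => Promise<Peer>,
): Promise<Peer[]> {
  return Promise.all(ids.map((id) => fetchPeer(id)));
}
```

With `Promise.all`, a single rejected lookup rejects the whole batch, so a real implementation would also need to decide how to treat members whose peer lookup fails.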

n9yty commented 1 week ago

@sinamics This does seem to resolve it. Thank you for your persistence, I know how difficult it can be tracking down an issue you can't reproduce.