Orv closed 2 years ago
Good ideas. Something needs to be done to give better visibility into the errors (without looking at the server logs).
I think the error types will be the easier part. I believe they're stored in some JSON in the database with the collector statistics, but were never added to the UI. As for the node list, I need to finish the reverse DNS lookup. I started on it early on but focused first on getting other features incorporated.
I was mistaken: I wasn't saving the counts anywhere they are readily accessible. But that just means I can implement something that captures the affected nodes so we can get more information.
I refactored the network polling functionality so that it looks up the DNS name for nodes that have problems and returns that information in a more usable structure to the calling function. With that, instead of just storing the name/IP of the nodes with errors in a summary column, I'm going to add a new table for node errors. It will store the node IP, name (if available), error type, and server response.
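For anyone following along, here is a rough sketch of what such a table and the lookup could look like. It assumes Python with SQLite purely for illustration (the thread doesn't say what the project actually uses), and the names `node_errors`, `run_id`, `resolve_name`, and `record_node_error` are hypothetical, not taken from the codebase.

```python
import socket
import sqlite3
from typing import Optional


def resolve_name(ip: str) -> Optional[str]:
    """Reverse DNS lookup; returns None when no name is available."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror both derive from OSError
        return None


def create_node_errors_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per failing node per collector run.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS node_errors (
            id         INTEGER PRIMARY KEY,
            run_id     INTEGER NOT NULL,
            node_ip    TEXT    NOT NULL,
            node_name  TEXT,              -- NULL when reverse DNS fails
            error_type TEXT    NOT NULL,
            response   TEXT               -- raw server response, if any
        )
        """
    )


def record_node_error(
    conn: sqlite3.Connection,
    run_id: int,
    node_ip: str,
    error_type: str,
    response: Optional[str] = None,
    node_name: Optional[str] = None,
) -> None:
    # The caller passes node_name (e.g. from resolve_name) so storage
    # stays independent of the network.
    conn.execute(
        "INSERT INTO node_errors (run_id, node_ip, node_name, error_type, response) "
        "VALUES (?, ?, ?, ?, ?)",
        (run_id, node_ip, node_name, error_type, response),
    )
```

A caller would do something like `record_node_error(conn, run_id, ip, "timeout", node_name=resolve_name(ip))` for each node that failed during polling.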
My initial thought is to add a "failing nodes" section on the overview page with a column for each of the different error types, with each column containing a list of the nodes that failed with that error (in the most recent run). A future enhancement would be a page where past errors and their full responses can be viewed. I'm not sure if there's a good way to display the response on the overview page (although I guess I could always link to a "details" page).
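The grouping behind that "failing nodes" section could be as simple as the sketch below. It is only an illustration in Python: the function name `group_failing_nodes`, the tuple layout, and the error-type labels are assumptions, not the project's actual code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_failing_nodes(
    errors: Iterable[Tuple[str, Optional[str], str]],
) -> Dict[str, List[str]]:
    """Group (node_ip, node_name, error_type) rows from the most recent run
    into one list of nodes per error type, ready to render as columns."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for node_ip, node_name, error_type in errors:
        # Show the DNS name when one was resolved, otherwise fall back to the IP.
        grouped[error_type].append(node_name or node_ip)
    return {etype: sorted(nodes) for etype, nodes in grouped.items()}
```

Each key in the returned dict would become a column heading on the overview page, with the sorted node list underneath it.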
@Orv does that sound like it's along the lines of what you're thinking? I can share a screenshot once I get some of the UI elements put together.
I WOULD like to see a screenshot, thanks.
@Orv here's my first pass at a summary of the node errors on the main page. It just lists the node name/IP grouped by the error type. I plan on adding another page with the details of the response.
Edit: the response details are most useful for HTTP errors and parse errors; the connection and timeout errors typically all look about the same.
Can the error counts be broken down into types of errors? If so that would be helpful. Even more helpful would be a list of nodes associated with that particular type of error.