
Create a table to "pull and display" OpenWRT table of hardware #112

Closed hunghvu closed 6 months ago

hunghvu commented 8 months ago

Should be a fun one to do. Essentially, the OpenWRT ToH page is too slow and often out of date. We can try implementing a more performant ToH with a better UX as a homelab side project. OpenWRT wiki content is licensed under CC BY-SA 4.0, so this should be a viable project.

We might start by working from the CSV export of the ToH.
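A minimal sketch of the pull step, assuming the wiki exposes a machine-readable dump. The URL and the tab separator are assumptions, and a real parser would also need to handle quoting:

```ts
// Minimal sketch: pull the ToH dump and turn each line into a keyed row
// object. TOH_DUMP_URL is a hypothetical placeholder, not a confirmed URL.
const TOH_DUMP_URL = 'https://openwrt.org/toh-dump.tsv';

type TohRow = Record<string, string>;

async function fetchToh(): Promise<TohRow[]> {
  const res = await fetch(TOH_DUMP_URL);
  if (!res.ok) throw new Error(`ToH fetch failed: ${res.status}`);
  const [header, ...lines] = (await res.text()).trim().split('\n');
  const columns = header.split('\t'); // tab-separated is an assumption
  // Map each data line onto the header so every row becomes { column: value }.
  return lines.map((line) => {
    const cells = line.split('\t');
    return Object.fromEntries(columns.map((col, i) => [col, cells[i] ?? '']));
  });
}
```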

This is a big ticket.


Note: next time, treat a big ticket like this as a milestone, not just an issue.

hunghvu commented 8 months ago

Thought flow:

hunghvu commented 8 months ago

This is a good thread to go over for ideas: https://forum.openwrt.org/t/improving-the-table-of-hardware-toh/139259

hunghvu commented 8 months ago

How do we call Payload?

Also, how do we implement exponential backoff for failed jobs within a 24-hour interval? A sketch follows.
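One way to cut it, as a sketch: retry with exponentially growing delays, and give up once the next attempt would spill past the 24-hour window, since the next scheduled run takes over anyway. The base delay is an assumption:

```ts
// Retry a failed sync job with exponential backoff, capping the total time
// so all attempts fit inside the 24-hour window between scheduled runs.
const DAY_MS = 24 * 60 * 60 * 1000;

async function withBackoff<T>(
  job: () => Promise<T>,
  baseDelayMs = 60_000, // first retry after 1 min, then 2, 4, 8, ... (assumption)
): Promise<T> {
  const deadline = Date.now() + DAY_MS;
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await job();
    } catch (err) {
      const delayMs = baseDelayMs * 2 ** attempt;
      // Stop once the next attempt would exceed the 24h window; the next
      // scheduled run picks up from there.
      if (Date.now() + delayMs >= deadline) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

The job itself could then be, for example, a call into Payload's Local API, wrapped as `withBackoff(() => payload.update(...))`; that wiring is left out here.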

hunghvu commented 8 months ago

When a table has 2000+ rows, what should be a preferable approach to perform an update?

  1. Should we just delete the whole collection and re-import every day? This is the simplest approach, but it is resource-intensive and can potentially crash the virtual machine, as seen in #79.
  2. Selective update? The current dataset is 2000+ rows and 70+ columns, so >140k cells.
    • If we iterate the dataset and fetch only one row per pid, the request/response size stays small, so the virtual machine is not at risk. However, this means making 2000+ requests. Is the network overhead negligible?
    • If we fetch the whole dataset for comparison, the request/response size can be unexpectedly large and potentially crash the VM. However, we reduce the network overhead.

Need to think more about these trade-offs; one possible middle ground is sketched below.
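A sketch of that middle ground: walk the fresh dump in fixed-size chunks and upsert each chunk with a single `bulkWrite`, keyed by pid. The database name, collection name, key field, and chunk size are all assumptions:

```ts
import { AnyBulkWriteOperation, MongoClient } from 'mongodb';

const CHUNK_SIZE = 500;

async function selectiveUpdate(rows: Record<string, string>[], client: MongoClient) {
  const toh = client.db('homelab').collection('toh');
  for (let i = 0; i < rows.length; i += CHUNK_SIZE) {
    // Each chunk becomes one bulkWrite, i.e. one round trip, and each row
    // is upserted by pid so unchanged rows are simply overwritten in place.
    const ops: AnyBulkWriteOperation[] = rows.slice(i, i + CHUNK_SIZE).map((row) => ({
      updateOne: {
        filter: { pid: row.pid },
        update: { $set: row },
        upsert: true,
      },
    }));
    await toh.bulkWrite(ops, { ordered: false });
  }
}
```

With ~2,000 rows this is about four round trips, instead of 2,000 single-row requests or one giant payload.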

hunghvu commented 8 months ago

The current implementation hits MongoDB (local test) pretty hard.

[screenshot: MongoDB operation graph under load during the import]

hunghvu commented 8 months ago

Actually, it was a bug: we did not await the patch requests to MongoDB, so a flood of requests hammered the database. The bug was discovered accidentally when we ran ESLint in a5e6516179e2b32fb658fce336a9ee17508bd6c4. The graph below shows database performance after we await our requests.

[screenshot: MongoDB performance graph after awaiting the requests]
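For reference, the shape of the bug, as a sketch with hypothetical names; this is the class of mistake that ESLint's @typescript-eslint/no-floating-promises rule flags:

```ts
declare const changedRows: Record<string, string>[];
declare function updateRow(row: Record<string, string>): Promise<void>;

// Before: every patch is a floating promise, so all 2,000+ requests hit
// MongoDB concurrently and nothing throttles the flood.
for (const row of changedRows) {
  updateRow(row);
}

// After: awaiting each request serializes them and keeps the load flat.
for (const row of changedRows) {
  await updateRow(row);
}
```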

hunghvu commented 7 months ago

For the front end:

For the back end:

hunghvu commented 6 months ago

When filtering, the current approach is as follows:

  1. Filter
  2. Fetch new data
  3. Show all filtered results

The problem is that when the new filtered dataset is returned, all filtering choices are reset, because the choices are derived from the dataset. How do we maintain the choices?

But if the choices are maintained, whatever new result is returned from the backend will not be reflected in them.

Or perhaps we should avoid filtering on click? We can let users define the filter set and then apply it afterward. To make a new query, users need to clear the existing filter. This behavior resembles Excel; see the sketch below.
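A sketch of that Excel-like flow, with hypothetical names: selections accumulate locally in a pending filter, and only one query is sent when the user applies it, so the refetch cannot wipe the choices:

```ts
declare function fetchFilteredRows(filter: Map<string, Set<string>>): Promise<void>;

interface FilterState {
  pending: Map<string, Set<string>>; // column -> values picked but not yet queried
  applied: Map<string, Set<string>>; // the filter the current dataset reflects
}

// Clicking a choice only mutates the pending filter; no request is made.
function toggleChoice(state: FilterState, column: string, value: string): void {
  const values = state.pending.get(column) ?? new Set<string>();
  if (values.has(value)) values.delete(value);
  else values.add(value);
  state.pending.set(column, values);
}

// "Apply" copies the pending filter and issues a single query. The choices
// survive the refetch because they are stored, not re-derived from the
// (now filtered) dataset.
async function applyFilters(state: FilterState): Promise<void> {
  state.applied = structuredClone(state.pending);
  await fetchFilteredRows(state.applied);
}
```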

Or, we simply separate the generation of filter choices from the filtered table data, meaning we always request a dedicated dataset for the choices on every table refresh.

hunghvu commented 6 months ago

Or should we skip fetching new data and just filter what is already on the current page? This does not seem like what users would expect, since filtering would be confined to the data on the current page only.

On the other hand, if we need a dedicated request to get the options for the multi-select, we could serve it from a dedicated endpoint; a sketch follows.
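A sketch of such an endpoint, with a hypothetical route and names: it returns the distinct values of one column, independent of whatever filter the table currently has applied:

```ts
import express from 'express';
import { MongoClient } from 'mongodb';

const app = express();
const client = new MongoClient(process.env.MONGODB_URI ?? 'mongodb://localhost:27017');

app.get('/api/toh/options/:column', async (req, res) => {
  const toh = client.db('homelab').collection('toh');
  // distinct() deduplicates server-side, so the response stays small even
  // though the collection has 2,000+ rows.
  const options = await toh.distinct(req.params.column);
  res.json(options.filter((value) => value !== null && value !== ''));
});

app.listen(3000);
```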

hunghvu commented 6 months ago

Done with b7089566fefcd1e75fa66e3f6cf5cb05b32fc688.