thoughtspile / hippotable

👩🏻‍🔬📊 Lightweight data analysis in your browser
https://thoughtspile.github.io/hippotable/
GNU General Public License v3.0

Nice project :+1: #1

Open leeoniya opened 8 months ago

leeoniya commented 8 months ago

hey @thoughtspile

great work on this!

found this project from your r/javascript post

Hippotable inspired me to assemble a few of my libs into something similar, to see if i could get an even smaller footprint and faster performance.

the result is https://github.com/leeoniya/uTable. it's about 250 LoC so far and supports multi-column sorting. filters are not implemented yet, but i plan to use https://github.com/leeoniya/uExpr and https://github.com/leeoniya/uFuzzy for those :)
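
for the curious, multi-column sorting boils down to a comparator that falls through sort keys in priority order. a rough sketch, not uTable's actual code (the row and spec shapes here are made up):

```ts
type Cell = string | number;
type SortSpec = { col: number; dir: 1 | -1 }; // hypothetical descriptor, not uTable's API

// compare two cells: numeric when both are numbers, string compare otherwise
function cmp(a: Cell, b: Cell): number {
  if (typeof a === 'number' && typeof b === 'number') return a - b;
  return String(a).localeCompare(String(b));
}

// sort rows by several columns: compare on the first spec,
// fall through to the next one only when the values tie
function multiSort(rows: Cell[][], specs: SortSpec[]): Cell[][] {
  return [...rows].sort((a, b) => {
    for (const { col, dir } of specs) {
      const d = cmp(a[col], b[col]);
      if (d !== 0) return d * dir;
    }
    return 0;
  });
}

// e.g. sort by column 2 ascending, then column 0 descending
// multiSort(rows, [{ col: 2, dir: 1 }, { col: 0, dir: -1 }]);
```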

anyways, happy to exchange/collab on good ideas, and happy almost-new-year :) :tada:

cheers!

thoughtspile commented 8 months ago

Hey @leeoniya — great to hear from you, I'm a big fan of your work on uFuzzy! Always up for a little friendly competition =) Make sure to also check out finos perspective, I've taken quite a few UX ideas from them, but the system design is quite complex and integrated.

Re: virtualization. Nitpick: I actually use bare tanstack-virtual, not tanstack-table, so it's around 14 kB, not 60. Regardless, I'll prob take some cues from your approach and hack something together, doesn't look too hard. I see you update on raw scroll event, won't it cause layout thrashing? If that's the case, a few intersection observers should do the trick.
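
roughly, the intersection-observer approach looks like this (a sketch only, not hippotable's or uTable's actual code; the sentinel elements and the renderWindow callback are made up): instead of recomputing on every scroll event, watch sentinel rows at the edges of the rendered slice and only re-render when one of them enters the viewport.

```ts
// Re-render the virtual window only when an edge sentinel becomes visible,
// instead of on every scroll event.
function observeEdges(
  scroller: HTMLElement,
  topSentinel: HTMLElement,
  bottomSentinel: HTMLElement,
  renderWindow: (direction: 'up' | 'down') => void,
) {
  const io = new IntersectionObserver(
    (entries) => {
      for (const entry of entries) {
        if (!entry.isIntersecting) continue;
        renderWindow(entry.target === topSentinel ? 'up' : 'down');
      }
    },
    // start re-rendering slightly before the edge is actually reached
    { root: scroller, rootMargin: '200px' },
  );
  io.observe(topSentinel);
  io.observe(bottomSentinel);
  return () => io.disconnect();
}
```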

Re: data engine. Would love to see what you come up with, the combo with uFuzzy should be 🔥. I'm toying with the idea of going in the opposite direction and using wasm build of DuckDB for data backend. That's a 2 MB wasm chunk, but it lets power users analyze data using widely known SQL, should perform really well once loaded, and, (speculation territory) support larger-than-memory datasets via streaming select / parquet files.
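
for reference, booting the wasm build in the browser looks roughly like this (a sketch following the published @duckdb/duckdb-wasm docs; not verified here):

```ts
import * as duckdb from '@duckdb/duckdb-wasm';

// Pick the wasm bundle that matches the current browser and boot it in a worker.
async function initDuckDB(): Promise<duckdb.AsyncDuckDB> {
  const bundles = duckdb.getJsDelivrBundles();
  const bundle = await duckdb.selectBundle(bundles);
  // same-origin worker wrapper, since the worker script lives on a CDN
  const workerUrl = URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker}");`], { type: 'text/javascript' }),
  );
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
  URL.revokeObjectURL(workerUrl);
  return db;
}
```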

There's also a small project-related thing you could help me with — dropped you an email, would be super cool if you could take a look.

Happy new year, now that it's official 🚀

leeoniya commented 8 months ago

> Make sure to also check out finos perspective, I've taken quite a few UX ideas from them, but the system design is quite complex and integrated.

just from quick testing, their sorting feels quite slow.

> Re: virtualization. Nitpick: I actually use bare tanstack-virtual, not tanstack-table, so it's around 14 kB, not 60.

yeah, i figured it was tree-shaken during build or something, cause the byte math did not work out otherwise :)

> Regardless, I'll prob take some cues from your approach and hack something together, doesn't look too hard.

i need to do a better job on the virtualization side. right now i always render 2 screens of rows; this reduces the frequency of dom updates, but i think the [bigger] updates cause enough re-layout to make the final experience worse, even if the JS profile is better. i need to see if it's better to render just 110% of the screen rows with more frequent, smaller updates.
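
the difference between the two strategies is mostly the overscan factor in the window math. a rough sketch with made-up names, not uTable's code:

```ts
// Compute which rows to render for a given scroll position. overscan = 1.0 means
// render ~2 screens of rows (one extra screen split above/below); overscan = 0.1
// means render ~110% of a screen, with more frequent but smaller updates.
function rowWindow(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan: number,
) {
  const visible = Math.ceil(viewportHeight / rowHeight);
  const extra = Math.ceil(visible * overscan);
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - Math.ceil(extra / 2));
  const last = Math.min(totalRows, first + visible + extra);
  return { first, last }; // render rows [first, last)
}
```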

> I see you update on raw scroll event, won't it cause layout thrashing? If that's the case, a few intersection observers should do the trick.

i've played with using IntersectionObservers before (with lists, not tables), so will probably try that again. my approach also relies on https://developer.mozilla.org/en-US/docs/Web/CSS/overflow-anchor to prevent auto-scroll during dom updates, but this isn't fully available in Safari or iOS Safari yet, though i don't see iOS being a big use case for this :rofl:
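
the Safari gap can at least be feature-detected up front (a small sketch):

```ts
// overflow-anchor (scroll anchoring) isn't implemented in Safari, so detect it first;
// without it, scrollTop has to be saved and restored by hand around row swaps.
const hasScrollAnchoring =
  typeof CSS !== 'undefined' && CSS.supports('overflow-anchor', 'none');
```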

> Re: data engine. Would love to see what you come up with, the combo with uFuzzy should be 🔥.

the main question here is what to do about the sorting. uFuzzy tries to order the most relevant matches first, but that order will be different for each column's filter. maybe i can do a global search over concatenated cell contents and use that order. will need exploration.
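
the concatenated-haystack idea in rough code, following the filter/info/sort flow from the uFuzzy readme (a sketch; the row shape is made up and the exact signatures aren't verified here):

```ts
import uFuzzy from '@leeoniya/ufuzzy';

// One haystack entry per row, built by concatenating its cells, so a single
// fuzzy search ranks whole rows regardless of which column matched.
function rankRows(rows: string[][], needle: string): number[] {
  const haystack = rows.map((cells) => cells.join(' '));
  const uf = new uFuzzy();
  const idxs = uf.filter(haystack, needle);
  if (idxs == null || idxs.length === 0) return [];
  const info = uf.info(idxs, haystack, needle);
  const order = uf.sort(info, haystack, needle);
  // row indexes in relevance order
  return order.map((i) => info.idx[i]);
}
```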

> I'm toying with the idea of going in the opposite direction and using wasm build of DuckDB for data backend. That's a 2 MB wasm chunk, but it lets power users analyze data using widely known SQL, should perform really well once loaded, and, (speculation territory) support larger-than-memory datasets via streaming select / parquet files.

funny story. we just had an internal hackathon at Grafana and my team did this (data transformations via DuckDB), and we got third place out of 66 teams :)

we explored the frontend wasm route (the DuckDB wasm is like 3.4MB; you can probably compile some stuff out, but i'm not sure it can get down to 2MB). we also hooked up PRQL, cause SQL can get pretty nasty for complex transforms. PRQL is another 3.5MB of wasm, so we kinda gave up on the frontend route and did most of the stuff in the backend/Go. the SQLite wasm blob is only 800kb, but then you miss out on all the OLAP stuff from DuckDB.

i think there might be a serialization bottleneck in doing the initial data insertion and then getting the data back out for rendering. at grafana we need to do this on each re-query, so it may not be as bad for a single-insert CSV scenario. we only had a week to explore this, so very curious what you end up with :)
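
the round trip in question looks roughly like this (a sketch based on the duckdb-wasm docs; option names aren't verified here, and the table/query/columns are made up):

```ts
import type { AsyncDuckDB } from '@duckdb/duckdb-wasm';

// The two serialization hops: pushing the CSV text into the wasm filesystem,
// then pulling query results back out as an Arrow table for rendering.
async function roundTrip(db: AsyncDuckDB, csvText: string) {
  const conn = await db.connect();
  await db.registerFileText('data.csv', csvText); // hop 1: JS string -> wasm FS
  await conn.insertCSVFromPath('data.csv', { name: 'rows', detect: true });
  const result = await conn.query('SELECT country, avg(price) FROM rows GROUP BY country');
  return result.toArray(); // hop 2: Arrow result -> JS objects for the table
}
```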

> There's also a small project-related thing you could help me with — dropped you an email, would be super cool if you could take a look.

:eyes: