paddymul / buckaroo

Buckaroo - the data wrangling assistant for pandas. Quickly explore dataframes, and run pandas commands via a GUI. Works inside the jupyter notebook.
https://buckaroo-data.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Lightweight JS dataframe library #75

Open paddymul opened 8 months ago

paddymul commented 8 months ago

I would like to have better frontend dataframe manipulation for simple tasks.

  1. A standard API for sending data to ag-grid that abstracts over the serialization format.
  2. An API similar to https://github.com/data-apis/dataframe-api (it's a good baseline).
  3. Some syntax for combining columns into arrays, so you can build [cleanedVal, origVal, annotation] from raw columns, with the combination done in JS. See https://github.com/paddymul/buckaroo/issues/74
  4. Lazy loading of additional data (a very advanced feature).
  5. Filtering.
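
To make the list above concrete, here is a rough TypeScript sketch of items 1, 3 and 5. It is only an illustration under assumptions: every name in it (`ColumnStore`, `ArrayColumnStore`, `toRows`, `combineColumns`, `filterRows`) is hypothetical and not part of buckaroo, ag-grid, or the dataframe-api standard. The only ag-grid fact it relies on is that the grid accepts row data as a list of plain objects.

```typescript
// All names here (ColumnStore, ArrayColumnStore, toRows, combineColumns,
// filterRows) are hypothetical; none of them come from buckaroo or ag-grid.
type Cell = number | string | boolean | null;
type Row = Record<string, Cell | Cell[]>;

// The application codes against this interface; the serialization format
// behind it (JSON today, something faster later) can change freely.
interface ColumnStore {
  columnNames(): string[];
  numRows(): number;
  getColumn(name: string): ArrayLike<Cell>;
}

// Simplest possible backing store: plain JS arrays keyed by column name.
class ArrayColumnStore implements ColumnStore {
  constructor(private cols: Record<string, Cell[]>) {}

  columnNames(): string[] {
    return Object.keys(this.cols);
  }

  numRows(): number {
    const first = this.columnNames()[0];
    return first === undefined ? 0 : this.getColumn(first).length;
  }

  getColumn(name: string): ArrayLike<Cell> {
    const col = this.cols[name];
    if (col === undefined) throw new Error("unknown column: " + name);
    return col;
  }
}

// Item 1: turn any ColumnStore into the list-of-objects shape a grid expects.
function toRows(store: ColumnStore): Row[] {
  const names = store.columnNames();
  const rows: Row[] = [];
  for (let i = 0; i < store.numRows(); i++) {
    const row: Row = {};
    for (const name of names) row[name] = store.getColumn(name)[i] ?? null;
    rows.push(row);
  }
  return rows;
}

// Item 3: add one array-valued column built from several raw columns.
function combineColumns(store: ColumnStore, target: string, sources: string[]): Row[] {
  const cols = sources.map((s) => store.getColumn(s));
  return toRows(store).map((row, i) => ({
    ...row,
    [target]: cols.map((c) => c[i] ?? null),
  }));
}

// Item 5: filtering is a plain predicate over rows.
function filterRows(rows: Row[], keep: (row: Row) => boolean): Row[] {
  return rows.filter(keep);
}

// Example: combine three raw columns, then drop rows that failed cleaning.
const exampleStore = new ArrayColumnStore({
  cleanedVal: [1, null],
  origVal: ["1", "n/a"],
  annotation: [null, "could not parse"],
});
const gridRows = filterRows(
  combineColumns(exampleStore, "display", ["cleanedVal", "origVal", "annotation"]),
  (row) => row["cleanedVal"] !== null
);
```

`gridRows` could then be handed to ag-grid as its `rowData`. Lazy loading (item 4) is left out; it would probably need `getColumn` to become asynchronous.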

I want to research the existing JS dataframe-like libraries. I don't think they offer this functionality, but I want to check.

Once the API is decided on and implemented, this library will enable performance increases through better serialization.

paddymul commented 8 months ago

For the record, here are the thoughts on this that I sent to @MarcoGorelli:


Having looked through the dataframe standard, I'm going to implement the dataframe_standard in JS the next time I improve the JS deserialization of dataframes.

This will allow me to decouple the application side from the serialization side. Then I can change the core serialization. I'm specifically thinking of higher-performance serializations based on TypedArrays and base64. I notice that the standard doesn't have a "get_row_by_id" or "get_as_list_of_dicts". I understand why those aren't important for numeric Python/C programming, but they are very common in JS.
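
As a rough illustration of the TypedArray and base64 idea, here is a TypeScript sketch. It rests on assumptions: the wire format (base64 of raw float64 bytes in the platform's byte order, little-endian on essentially all current hardware) is invented for this example, and `ColumnarFrame`, `getRowById` and `getAsListOfDicts` are hypothetical names modelled on the two accessors mentioned above, not part of the dataframe standard. `encodeFloat64Column` only stands in for whatever the Python side would produce.

```typescript
// Hypothetical sketch: none of these names exist in buckaroo or the standard.
type Value = number | string | null;

// Stand-in for the server side: pack float64 values into a base64 string.
function encodeFloat64Column(values: number[]): string {
  const bytes = new Uint8Array(new Float64Array(values).buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Decode base64 back into a Float64Array without parsing each number as text.
// Assumes the byte length is a multiple of 8 and matches platform byte order.
function decodeFloat64Column(b64: string): Float64Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new Float64Array(bytes.buffer);
}

// Column-oriented storage with the row-oriented accessors JS code tends to want.
class ColumnarFrame {
  private idToIndex: Record<string, number> = Object.create(null);
  private numRows: number;

  constructor(private columns: Record<string, ArrayLike<Value>>, idColumn: string) {
    const ids = columns[idColumn];
    if (ids === undefined) throw new Error("unknown id column: " + idColumn);
    this.numRows = ids.length;
    // Build the id -> row index lookup once, up front.
    for (let i = 0; i < ids.length; i++) this.idToIndex[String(ids[i])] = i;
  }

  private rowAt(i: number): Record<string, Value> {
    const row: Record<string, Value> = {};
    for (const name of Object.keys(this.columns)) row[name] = this.columns[name][i];
    return row;
  }

  // Returns undefined when no row has the given id.
  getRowById(id: string | number): Record<string, Value> | undefined {
    const i = this.idToIndex[String(id)];
    return i === undefined ? undefined : this.rowAt(i);
  }

  getAsListOfDicts(): Record<string, Value>[] {
    const rows: Record<string, Value>[] = [];
    for (let i = 0; i < this.numRows; i++) rows.push(this.rowAt(i));
    return rows;
  }
}

// Example: one column arrives as base64, the id column as a plain array.
const priceColumn = decodeFloat64Column(encodeFloat64Column([1.5, 2.5, -3]));
const exampleFrame = new ColumnarFrame({ id: ["a", "b", "c"], price: priceColumn }, "id");
const exampleRow = exampleFrame.getRowById("b");
```

The point of the split is that `ColumnarFrame` never sees how a column was serialized, so the decoding step can be swapped out without touching application code.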

FWIW, I have written multiple versions of DataFrame-to-JSON-to-JS serialization over the past decade. All of them were janky and slow. I want to build one properly, including the Python side.

No promises on when I will build this, but I will use your code as the basis for it next time I do.