okfn / opendataeditor

No-code application to explore and publish all kinds of data: datasets, tables, charts, maps, stories, and more. Forever free and open source project powered by open standards and generative AI.
http://opendataeditor.okfn.org
MIT License

Search engine for tables - Research #324

Open romicolman opened 2 months ago

romicolman commented 2 months ago

Problem description

Let's say you have a large table and you want to check whether a specific entity is listed there. Right now there is no way to do that: the user has to scroll down and read every row in the table.

Screenshot 2024-04-12 at 17:42:07

Steps to reproduce it

Suggested solution

Add a search engine for tables.

romicolman commented 2 months ago

More context on this: the search bar should be located at the top of the table (upper right corner).

See the Material UI option here (section: "App bar with search field").

Let me know if you need further info.

pdelboca commented 2 weeks ago

Here is an implementation example of Search: https://reactdatagrid.io/docs/miscellaneous#csv-export-+-custom-search-box

pdelboca commented 2 weeks ago

So I have been playing around with searching.

Here are some conclusions:

  1. Our datagrid uses remote pagination, which makes sense if we want to be able to work with files with millions of rows.
  2. Given point number one, we are forced to use a backend search rather than a simpler frontend solution like the one I mentioned in the previous comment.
  3. ReactDataGrid re-renders every time its dataSource changes.
  4. Our dataSource is tightly coupled with the history and save features, so our search feature should not affect it. (Otherwise, a user who hits Save while a search is applied would save only the filtered data.)

Some implementation notes:

  1. The dataSource logic is implemented in the loader function of the Table's store.
  2. The call to client.tableRead is the one responsible for fetching data from the backend and handling the offset and limit parameters for pagination.
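To make the backend-search idea concrete, here is a minimal sketch of a paginated read that accepts an optional search term. It is not the actual client.tableRead implementation: the `read_rows` function, its `query` parameter, and the `cities` table are all hypothetical, and it assumes the table lives in a SQLite database. The point is only that the search is a read-only filter applied on top of the same offset/limit pagination, so the underlying data (and whatever Save writes) stays untouched.

```python
import sqlite3


def read_rows(conn, table, columns, *, offset=0, limit=20, query=None):
    """Return one page of rows, optionally filtered by a search term.

    Hypothetical helper: `table` and `columns` must come from the trusted
    table schema, since they are interpolated into the SQL text.
    """
    sql = f'SELECT * FROM "{table}"'
    params = []
    if query:
        # Case-insensitive substring match against every column, cast to text.
        clauses = [
            f'instr(lower(CAST("{col}" AS TEXT)), lower(?)) > 0'
            for col in columns
        ]
        sql += " WHERE " + " OR ".join(clauses)
        params += [query] * len(columns)
    # Pagination is applied after filtering, so pages only contain matches.
    sql += " ORDER BY rowid LIMIT ? OFFSET ?"
    params += [limit, offset]
    return conn.execute(sql, params).fetchall()


# Made-up demo data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Buenos Aires", "Argentina", 3120612),
        ("Cordoba", "Argentina", 1505250),
        ("Berlin", "Germany", 3677472),
        ("Madrid", "Spain", 3305408),
    ],
)
COLUMNS = ["name", "country", "population"]

# Second page (one row per page) of the rows matching "argentina".
second_page = read_rows(conn, "cities", COLUMNS, query="argentina", offset=1, limit=1)
```

A scan like this checks every column of every row, so it will get slow on files with millions of rows; that is the main reason to look at an index-based option.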

Possible solution:

  1. I'm leaning towards the full-text search feature of SQLite: https://www.sqlitetutorial.net/sqlite-full-text-search/. @roll, the indexer seems to be at the core of frictionless-py; have you ever explored this option?
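A rough sketch of what the SQLite option could look like, assuming the table data is already in a SQLite database and that the SQLite build includes the FTS5 extension (most Python builds do, but it is not guaranteed). The `cities` table, the `cities_fts` index, and the `search` helper are hypothetical names for illustration. The index is a separate virtual table, so the original table used by history and save is never modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Made-up demo table standing in for an indexed tabular file.
conn.execute("CREATE TABLE cities (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [
        ("Buenos Aires", "Argentina"),
        ("Cordoba", "Argentina"),
        ("Berlin", "Germany"),
    ],
)

# Separate full-text index; rowid links each entry back to the original row.
conn.execute("CREATE VIRTUAL TABLE cities_fts USING fts5(name, country)")
conn.execute(
    "INSERT INTO cities_fts (rowid, name, country) "
    "SELECT rowid, name, country FROM cities"
)


def search(conn, term, *, offset=0, limit=20):
    """Return one page of original rows whose text matches `term`."""
    # Quote the term so it is read as a phrase, not as FTS5 query syntax.
    phrase = '"' + term.replace('"', '""') + '"'
    sql = (
        "SELECT rowid, name, country FROM cities "
        "WHERE rowid IN (SELECT rowid FROM cities_fts WHERE cities_fts MATCH ?) "
        "ORDER BY rowid LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (phrase, limit, offset)).fetchall()


matches = search(conn, "argentina")
```

One thing to keep in mind: FTS5 matches whole tokens (or prefixes, with a trailing `*` in the query), not arbitrary substrings. Searching for "aires" finds "Buenos Aires", but searching for "air" does not. If users expect substring matching, that needs a different tokenizer or a fallback.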

roll commented 1 week ago

@pdelboca I agree that FTS would make sense, but in frictionless-py indexing means e.g. CSV -> database with validation etc., so it's not related to full-text search.

romicolman commented 1 day ago

@pdelboca I want to add something here that I discussed with @guergana and @roll this week, in case it is relevant for the implementation of the search engine:

Currently, when opening a tabular file, the datagrid shows a fixed number of rows (5, 10, 20, 25, 40, 50, or 100) and the user has to click on the pagination icon at the bottom of the screen to keep checking the rest of the table:

Image

This way of exploring data is problematic, especially if you have a tabular file with a large number of rows. I checked how Flourish and Datawrapper show tables to the user, and both tools allow scrolling through the data while keeping the column headings fixed. I'll create a separate issue to remove pagination, but I wanted to mention this in case it is something you need to consider when implementing the search engine.