okfn / opendataeditor

No-code application to explore and publish all kinds of data: datasets, tables, charts, maps, stories, and more. Forever free and open source project powered by open standards and generative AI.
http://opendataeditor.okfn.org
MIT License

Improve indexing flow #398

Open. pdelboca opened this issue 1 month ago

pdelboca commented 1 month ago

Overview

This ticket is a continuation of #391 and #397. It aims to improve our indexing workflow to fix some current issues.

User Story

AS a User
WHEN I manually fix errors in the datagrid and save the changes,
I WANT the error report to be refreshed to reflect the latest changes.

Technical details

Currently, files are indexed at read time: we have only one method, fileIndex, which is called when a file is loaded. It is probably a good idea to migrate to a more traditional approach: run the report and store the results when the file is created or saved, and then read the stored results when the file is opened.

The current approach may work, but it is not clear. We should create new actions and methods that explicitly create and get the index, instead of a single fileIndex method that handles multiple scenarios.
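To make the create/get split concrete, here is a minimal sketch in Python. Every name in it (`Report`, `validate_rows`, `FileStore`, `create_index`, `get_index`, `save_file`) is hypothetical and is not the Open Data Editor API; the validation step is a stand-in that only flags empty cells. The point is the shape: saving runs the report and stores it, and opening only reads what was stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Report:
    # Hypothetical error report: (row number, column, message) tuples
    errors: List[Tuple[int, str, str]]

    @property
    def valid(self) -> bool:
        return not self.errors


def validate_rows(rows: List[Dict[str, str]]) -> Report:
    """Stand-in for the real validation step: flags empty cells only."""
    errors = []
    for i, row in enumerate(rows, start=1):
        for column, value in row.items():
            if value.strip() == "":
                errors.append((i, column, "missing value"))
    return Report(errors)


@dataclass
class FileStore:
    # Hypothetical in-memory storage for file contents and their indexes
    files: Dict[str, List[Dict[str, str]]] = field(default_factory=dict)
    indexes: Dict[str, Report] = field(default_factory=dict)

    def create_index(self, path: str) -> Report:
        """Run the report and persist it (called on create/save)."""
        report = validate_rows(self.files[path])
        self.indexes[path] = report
        return report

    def get_index(self, path: str) -> Report:
        """Read the stored report (called on open); never re-validates."""
        return self.indexes[path]

    def save_file(self, path: str, rows: List[Dict[str, str]]) -> None:
        """Saving always refreshes the stored report, per the user story."""
        self.files[path] = rows
        self.create_index(path)


store = FileStore()
store.save_file("table.csv", [{"id": "1", "name": ""}])
print(store.get_index("table.csv").valid)  # False: empty "name" cell
store.save_file("table.csv", [{"id": "1", "name": "Alice"}])
print(store.get_index("table.csv").valid)  # True: the fix was saved
```

Because `save_file` always calls `create_index`, a manual fix in the datagrid followed by a save is enough to refresh the report, and `get_index` stays a cheap read.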

Proposal

This way we can have specific actions that can be called independently to better manage the synchronization between a file and its index (file <-> index).
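One possible way to keep file and index in sync is to store a fingerprint of the file contents alongside the report and rebuild only when the fingerprint changes. The sketch below assumes a SHA-256 content hash as that fingerprint; this technique is not mentioned in the thread, and all names (`IndexEntry`, `count_errors`, `is_stale`, `sync_index`) are hypothetical. The error count is a stand-in for a real validation report.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    # Hypothetical stored index: hash of the bytes it was built from, plus a result
    content_hash: str
    error_count: int


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def count_errors(data: bytes) -> int:
    """Stand-in for validation: counts CSV data rows containing an empty cell."""
    rows = data.decode("utf-8").splitlines()[1:]  # skip the header row
    return sum(1 for row in rows if "" in row.split(","))


def is_stale(entry: Optional[IndexEntry], data: bytes) -> bool:
    """The index is stale if it is missing or was built from different bytes."""
    return entry is None or entry.content_hash != fingerprint(data)


def sync_index(entry: Optional[IndexEntry], data: bytes) -> IndexEntry:
    """Rebuild the index only when the file changed; otherwise reuse it."""
    if is_stale(entry, data):
        return IndexEntry(fingerprint(data), count_errors(data))
    return entry


original = b"id,name\n1,\n"
entry = sync_index(None, original)
print(entry.error_count)  # 1
print(sync_index(entry, original) is entry)  # True: unchanged file, no rebuild
fixed = sync_index(entry, b"id,name\n1,Alice\n")
print(fixed.error_count)  # 0
```

The hash check lets an explicit "get" action detect a file that was modified outside the app, while still avoiding a full re-validation on every open.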

pdelboca commented 1 month ago

@roll @romicolman @guergana let me know your thoughts!

guergana commented 1 month ago

I am with Romina on this... what do we mean by indexing? I don't get it either. Why do users need to index their files? Indexing is usually a way of optimizing a file for faster searches, but I see some error validation going on in the server code. 🙈 What is the original intention of this button? I agree with @pdelboca that this terminology is too technical for end users: if this is table optimization, it should be hidden from users, and if we are using this button to validate, then we should give it a clearer name.

roll commented 1 month ago

I think we discussed with Romina that for end users it needs to be called "Validate" instead of "Index". @guergana, it's a technical term from frictionless-py, and it's not about optimization: it's basically the whole process of ingesting the file into the system.

romicolman commented 1 month ago

Hi all! A couple of comments from my side:

Ideal workflow

Again, I understand that INDEX and VALIDATE are two different buttons right now. If we cannot make both functions work together, we need to rename INDEX to make it understandable to users. For me, INDEX is a kind of RELOAD/REPROCESS data.

One more thing to add to this discussion: I looked at the Data Curator documentation here to see how they addressed this issue, but maybe you will see something in the code that is useful.