pdelboca commented 6 months ago

Overview

This ticket is a continuation of #391 and #397. It aims to improve our index workflow to fix some current issues.

User Story

AS a User
WHEN I manually fix errors in the datagrid and save the changes,
I WANT the error report to be refreshed to reflect the latest changes.

Technical details

Currently, files are indexed at read time (we have only one method, fileIndex, which is called on load). It is probably a good idea to migrate to a more traditional approach: run the report and store the results when creating or saving the file, then read the stored results when opening the file.

This might already be working, but it is not quite clear. We should create new actions and methods that explicitly create and get the index, instead of a single fileIndex method that handles multiple scenarios.

Proposal

[ ] Separate fileIndex into two methods: setIndex and getIndex.
[ ] Call setIndex when creating the file or saving it.
[ ] Call getIndex when loading the file (and after saving it).
[ ] Rename index? This is quite a technical term and it's definitely confusing for end users. (This could be done in a separate ticket.)

This way we have specific actions that can be called independently to better manage the synchronization of file <-> index.
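To make the proposed split concrete, here is a minimal Python sketch of the idea, not the actual ODE code. `IndexStore`, `set_index`, `get_index` and `find_empty_cells` are hypothetical names, the in-memory dict stands in for whatever storage the server uses, and the empty-cell check is only a placeholder for the real validation report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, str]
Validator = Callable[[List[Row]], List[str]]


@dataclass
class IndexStore:
    """Hypothetical store that keeps one validation report per file path."""

    validate: Validator
    _reports: Dict[str, List[str]] = field(default_factory=dict)

    def set_index(self, path: str, rows: List[Row]) -> None:
        # Write side: run the report and store it (on create and on save).
        self._reports[path] = self.validate(rows)

    def get_index(self, path: str) -> List[str]:
        # Read side: return the stored report without re-running validation.
        if path not in self._reports:
            raise KeyError(f"no index stored for {path!r}")
        return list(self._reports[path])


def find_empty_cells(rows: List[Row]) -> List[str]:
    """Placeholder validator (an assumption): flag empty cells only."""
    errors = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if value == "":
                errors.append(f"row {row_number}: empty cell in '{column}'")
    return errors
```

With this shape, saving a file would call `set_index` and then `get_index`, while opening a file would only call `get_index`, so the stored report can only change when the file itself changes.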

pdelboca commented 6 months ago

@roll @romicolman @guergana let me know your thoughts!

guergana commented 6 months ago

I am with Romina on this... what do we mean by indexing? I also don't get it. Why do the users need to index the files? Indexing is usually a way of optimizing the file for faster searches, but I see some validation for errors going on in the server code. :see_no_evil: What is the original intention of this button? I agree with @pdelboca that this terminology is too technical for end users. If this is table optimization, it should be hidden from the users, and if we are using this button to validate, then we should give it a clearer name.

roll commented 5 months ago

I think we discussed with Romina that for end users it needs to be called "Validate" instead of "Index". @guergana, it's a technical term from frictionless-py, and it's not for optimizing; it's basically the whole process of ingesting the file into the system.

romicolman commented 5 months ago

Hi all! A couple of comments from my side:

The name of the errors button (Validate) will probably change. We ran a survey, got ideas for names and we are waiting for the UX consultant to make a final decision. As you know, right now, INDEX and VALIDATE are two different buttons. However, here I suggest the ideal workflow so we can discuss if this is technically possible.

Ideal workflow

The user opens a tabular file with errors in the ODE.
The user edits cells to correct errors. Cells can be edited one by one (error 1, error 2), and the user then clicks on SAVE to apply the changes.
As @pdelboca mentioned, validation currently runs ONLY ONCE: when the user opens the file in the ODE. It would be great if the validation could be re-run every time the user clicks on SAVE, to see the remaining errors. If this is not possible:
If validation cannot run automatically after clicking the SAVE button, the user edits one or more cells, clicks on SAVE, and then clicks the VALIDATE button (name to be confirmed).
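As a discussion aid, the ideal workflow above could be sketched in Python roughly as follows. This is an illustration only: `EditorSession`, `edit_cell`, `save` and the blank-cell `validate` function are hypothetical names and do not come from the ODE codebase.

```python
from typing import List, Tuple

Table = List[List[str]]


def validate(table: Table) -> List[Tuple[int, int]]:
    """Placeholder check (an assumption): report (row, column) of blank cells."""
    return [
        (r, c)
        for r, row in enumerate(table)
        for c, cell in enumerate(row)
        if cell.strip() == ""
    ]


class EditorSession:
    """Hypothetical session: validate on open, then again on every save."""

    def __init__(self, table: Table) -> None:
        self.saved = [list(row) for row in table]
        self.draft = [list(row) for row in table]
        self.errors = validate(self.saved)  # validation when the file is opened

    def edit_cell(self, row: int, column: int, value: str) -> None:
        # Edits only touch the draft; the error report is not refreshed yet.
        self.draft[row][column] = value

    def save(self) -> List[Tuple[int, int]]:
        # Persist the draft, then re-run validation to show remaining errors.
        self.saved = [list(row) for row in self.draft]
        self.errors = validate(self.saved)
        return self.errors
```

In this sketch the user never triggers validation explicitly: the report shown after SAVE always matches the saved file. The fallback with a separate VALIDATE button would move the `validate` call out of `save` into its own method.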

Again, I understand that INDEX and VALIDATE are two different buttons right now. If we cannot make both functions work together, we need to rename INDEX to make it understandable to users. For me, INDEX is a kind of RELOAD/REPROCESS data.

One more thing to add to this discussion. I checked the Data Curator documentation here to see how they addressed this issue, but maybe you can spot something useful in the code.

okfn / opendataeditor

Improve indexing flow #398
