qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0
1.1k stars 66 forks source link

bug(apply): need to calculate `ds.Structure.Entries` for the applied dataset #1907

Closed ramfox closed 2 years ago

ramfox commented 2 years ago

Fixes qri-io/frontend#254

First, this will add some time to the amount of time it takes for a dryrun to occur, since we would need to iterate over the entire dataset to learn the proper row count. However, my personal opinion is this is one of the main things a user would be looking to when doing a dry run as a sanity check.

To fix this, we need a very paired down equivalent of the dsfs.calculateComputedFields method, calculateEntries that uses a dsio.EntryReader to iterate over the rows of the dataset & track the row count.

We can call this in the transform.Apply function, after we have done the actual transform apply because we split Apply into Apply and Commit.

Commit is used exclusively in the "Save" path (which does it's own computed fields calculation), so keeping this work inside Apply means we will never "double calculate" the number of entries.

dustmop commented 2 years ago

I think that @b5's suggestion that DataFrames may help with this is actually correct. If we're talking about only needing the results of a dry run transform, then it should be doable. The function AssignBodyFromDataframe in transform/startf/ds/dataset.go is already iterating the DataFrame's rows in order to build the bodyFile, using df.NumRows() to know how many entries will be in the body file. This value could be saved to the structure.NumEntries, or returned to the caller, which ever is preferred. Two concerns: 1) If a transform does not modify the body, but only changes the meta or whatever, then AssignBodyFromDataframe returns early. It that case the NumEntires should match the previous version of the database. 2) It could become confusing if only NumEntries is assigned, but some code is written assuming the entire structure has been filled in. I think this could be solved by properly commenting and documenting how this works, but it's something to keep an eye on.

ramfox commented 2 years ago

Great! That's perfect. Thank you!!!

ramfox commented 2 years ago

but some code is written assuming the entire structure has been filled in.

Okay will do a dive on this, looking to see what happens if we don't change the body when we do a transform & what the expectations are for Structure