qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0
1.11k stars 66 forks source link

Diff doesn't work with csv under FSI due to different parsing methods #883

Closed dustmop closed 4 years ago

dustmop commented 5 years ago

With a saved version for an FSI linked dataset, using a csv body, and a clean working directory, running qri status gives:

for linked dataset [<dsref>]

working directory clean

But qri diff gives:

for linked dataset [<dsref>]

0 elements. 0 inserts. 0 deletes. 2 updates.

body:
  0:
    ~ 2: "3"
  1:
    ~ 2: "6"

The csv body in this case is:

one,two,3
four,five,6

The root of this issue is that status is using one method to read bodies: mkDec(f).Decode(cmp); (see here) while diff is using two different methods, the first is r.Get(getp, res) where r is DatasetRequests (see here) while the second is using data, err = allCSVRows(file) (see here) which uses the go package "enconding/csv". Get returns the first row as "one","two",3.0 since it is forcing the json format to read the body, while the "encoding/csv" packages returns the first row as "one","two","3".

Ideally, we'd prefer that lib/diff would read these bodies as EntryReaders, but deepdiff doesn't support passing EntryReaders as the sources to diff. What would be nice is if deepdiff could add it's own lazy iterator interface that would match or adapt what EntryReader is doing. For now, we can just slurp both bodies into arrays and then pass them into deepdiff. As long as this slurping uses the same schema, both bodies should produce an empty diff.

dustmop commented 4 years ago

This was fixed forever ago.