multiprocessio / datastation

App to easily query, script, and visualize data from every database, file, and API.
https://datastation.multiprocess.io
Other
2.9k stars 112 forks source link

Optionally convert numbers in CSV files #265

Closed fritzgrabo closed 2 years ago

fritzgrabo commented 2 years ago

Resolves #264

Here's what I got so far. I gave this a try from dsq (PR coming soon) and it works well!

$ cat ~/scores.csv
Name,Score
Fritz,90
Rainer,95
Fountainer,100

$ ./dsq --schema --pretty ~/scores.csv
Array of
  Object of
    Name of
      string
    Score of
      string

$ ./dsq --schema --pretty --convert-numbers ~/scores.csv
Array of
  Object of
    Name of
      string
    Score of
      number

At this point, I'm not sure if I should add a convertNumbers flag to the literal and http panels too, or just assume false there. Theoretically, it'd make sense to convert numbers from downloaded CSV files, too?

Let me know what you think. Thanks!

eatonphil commented 2 years ago

Nice work!

At this point, I'm not sure if I should add a convertNumbers flag to the literal and http panels too, or just assume false there. Theoretically, it'd make sense to convert numbers from downloaded CSV files, too?

Good point! The better place to put the flag would be on the ContentType struct https://github.com/multiprocessio/datastation/blob/main/runner/state.go#L67. Just add a new top-level flag to that struct. Then everyone can pass into the transformX functions the right setting.

fritzgrabo commented 2 years ago

[...] to put the flag would be on the ContentType struct

Good call; updated!

fritzgrabo commented 2 years ago

[...] can do the convertNumbersInRecord function inside of recordToMap

I gave this a try, and promptly ran into some typing issues that I resolved using reflect. That felt more heavy-weight than it should have. I assume there's a cleaner way to do this, but my go-fu isn't strong enough yet, haha.

That said, I'm not sure I prefer the changes that this last commit introduces to what we had before: The innocent sounding recordToMap helper function is now also responsible for number conversion, which seems like it's really doing 2 things at once? I felt like the number conversion thing was more related to the transformation task that transformCSV handles.

Let me know what you think. Thanks!

eatonphil commented 2 years ago

That said, I'm not sure I prefer the changes that this last commit introduces to what we had before: The innocent sounding recordToMap helper function is now also responsible for number conversion, which seems like it's really doing 2 things at once? I felt like the number conversion thing was more related to the transformation task that transformCSV handles.

I would agree with you in all normal programming situations but this code is the hot loop. Code that runs here runs NM times as the dataset imported into DataStation grows. So looping over all fields twice could be meaningful especially as not just the number of rows N grows but number of columns M grows too. Maybe 2N*M isn't so bad but just something I'm thinking about.

fritzgrabo commented 2 years ago

This is ready for review now.

Note that there's also https://github.com/multiprocessio/dsq/pull/65, which makes the new feature accessible in dsq by adding a --convert-numbers flag there.

eatonphil commented 2 years ago

Looks good! Merging.

fritzgrabo commented 2 years ago

Woohoo! Thanks!