Closed: philipce closed this issue 6 years ago
I didn't 100% understand the different formatting you describe, but here...
We lose the readability of a nice csv... but I don't think a 3+ dimensional tensor is readable anyway... plus, the description computed property is readable anyway.
I completely agree with you. I think you are spot on.
Also, regarding files, there is a new protocol that allows for nice and easy JSON serialization ("Codable"). When we do CSV, we could also implement that one. I've heard it's especially nice for types made out of stdlib structures and primitive types like Int and String... so it shouldn't take much time.
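For reference, here is a minimal sketch of what adopting Codable could look like. It assumes a hypothetical `Tensor` struct with a name, a shape, and flat row-major storage; the library's actual type may be laid out differently. Because every stored property is already Codable, the compiler synthesizes the conformance and no hand-written encoding code is needed.

```swift
import Foundation

// Hypothetical stand-in for the library's tensor type (an assumption, not the real API).
struct Tensor: Codable, Equatable {
    var name: String
    var shape: [Int]
    var values: [Double]   // flat storage, row-major order
}

// Encode a tensor to JSON data using the synthesized Codable conformance.
func jsonData(for tensor: Tensor) throws -> Data {
    return try JSONEncoder().encode(tensor)
}

// Rebuild a tensor from JSON data.
func tensor(fromJSON data: Data) throws -> Tensor {
    return try JSONDecoder().decode(Tensor.self, from: data)
}

let sample = Tensor(name: "weights", shape: [2, 3], values: [1, 2, 3, 4, 5, 6])

do {
    let data = try jsonData(for: sample)
    let restored = try tensor(fromJSON: data)
    print(restored == sample)
} catch {
    print("JSON round trip failed: \(error)")
}
```

If the real type stores anything that is not Codable out of the box, it would need a custom `encode(to:)` and `init(from:)` instead of relying on synthesis.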
Yeah, it wasn't very clear. It was me hastily capturing a thought before it was gone :) It'll be easy to add; I'll probably get to it after I finish this data series/frame stuff.
That sounds like a good idea! Thanks for pointing the new protocol out, I hadn't seen it yet.
Should we ditch CSV and use JSON? Or should we keep it? I don't usually deal with datasets, so I'm clueless in this regard.
Hm, interesting. I hadn't thought about adding JSON, but that's probably a good idea. We'll definitely want to keep CSV though. From a numerical computing standpoint, JSON isn't really used at all, and CSV is pretty standard.
Migrated issue to Pivotal Tracker
Not a pressing issue, but I was just thinking that my original thoughts on how to save these data structures to file are a little off. Instead of saving values to file such that the shape can be inferred (e.g. columns are separated by commas, rows by newlines, pages by semicolons, etc.), we should just be explicit about the shape, then store everything in row-major order. So some format like:
tensorName:3,4,6,2:0.456,234.4,234,567,143, ...
We can have the name (and any other metadata), the shape, and then the comma-separated values, which can easily be shoved into the tensor's array for easy creation.
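To make the row-major layout concrete, here is a small sketch of how a multi-dimensional index maps to a position in the flat value list. `rowMajorOffset` is a hypothetical helper written for illustration, not an existing function in the library.

```swift
// Hypothetical helper: flat offset of a multi-dimensional index in row-major order.
func rowMajorOffset(of index: [Int], in shape: [Int]) -> Int {
    precondition(index.count == shape.count, "index and shape must have the same rank")
    var offset = 0
    for (i, dim) in zip(index, shape) {
        precondition(i >= 0 && i < dim, "index out of bounds")
        // Accumulate one dimension at a time; the last dimension varies fastest.
        offset = offset * dim + i
    }
    return offset
}

let exampleShape = [3, 4, 6, 2]
// Number of values that should follow the shape field: 3 * 4 * 6 * 2 = 144.
let expectedCount = exampleShape.reduce(1, *)
// Position of element [1, 2, 3, 1] in the flat list: ((1 * 4 + 2) * 6 + 3) * 2 + 1 = 79.
let exampleOffset = rowMajorOffset(of: [1, 2, 3, 1], in: exampleShape)
print(expectedCount, exampleOffset)
```

Since the shape is stored explicitly, a reader can check the value count against the product of the dimensions before building the tensor.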
Then, perhaps each object is separated by semicolons, so the whole file may encode a bunch of variables. Importing a file would then put all the variables in a dict, for easy access.
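A rough sketch of the reading and writing side of this format, under a few assumptions: `TensorRecord`, `TensorFileError`, `encode`, and `decodeFile` are hypothetical names, records look like `name:shape:values`, and names never contain `:` or `;`. It is only meant to show the idea, not a finished file format.

```swift
import Foundation

// Hypothetical record type; the library's real tensor type may differ.
struct TensorRecord: Equatable {
    var shape: [Int]
    var values: [Double]   // row-major order
}

enum TensorFileError: Error {
    case malformedRecord(String)
}

// Encode one record as "name:d1,d2,...:v1,v2,...".
func encode(name: String, _ record: TensorRecord) -> String {
    let shape = record.shape.map { String($0) }.joined(separator: ",")
    let values = record.values.map { String($0) }.joined(separator: ",")
    return "\(name):\(shape):\(values)"
}

// Parse a whole file of ";"-separated records into a dictionary keyed by name.
func decodeFile(_ text: String) throws -> [String: TensorRecord] {
    var result: [String: TensorRecord] = [:]
    for raw in text.split(separator: ";") {
        let record = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        if record.isEmpty { continue }   // tolerate a trailing separator or newline
        let fields = record.split(separator: ":", omittingEmptySubsequences: false)
        guard fields.count == 3 else { throw TensorFileError.malformedRecord(record) }
        let shapeParts = fields[1].split(separator: ",")
        let valueParts = fields[2].split(separator: ",")
        let shape = shapeParts.compactMap { Int($0.trimmingCharacters(in: .whitespaces)) }
        let values = valueParts.compactMap { Double($0.trimmingCharacters(in: .whitespaces)) }
        // Reject unparsable numbers and value counts that do not match the declared shape.
        guard shape.count == shapeParts.count,
              values.count == valueParts.count,
              values.count == shape.reduce(1, *) else {
            throw TensorFileError.malformedRecord(record)
        }
        result[String(fields[0])] = TensorRecord(shape: shape, values: values)
    }
    return result
}

let a = TensorRecord(shape: [2, 2], values: [1, 2, 3, 4])
let b = TensorRecord(shape: [3], values: [0.5, 1.5, 2.5])
let fileText = [encode(name: "a", a), encode(name: "b", b)].joined(separator: ";")

if let variables = try? decodeFile(fileText) {
    print(variables["a"] == a, variables["b"] == b)
}
```

Checking the value count against the declared shape is the main benefit of being explicit: a truncated or corrupted record fails loudly instead of producing a tensor with the wrong dimensions.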
We lose the readability of a nice csv... but I don't think a 3+ dimensional tensor is readable anyway... plus, the description computed property is readable anyway.