Open cuducos opened 6 years ago
I'm implementing this on: https://github.com/turicas/serenata-toolbox/tree/feature/dataset-reader
All the datasets in Brasil.IO will use the datapackage specification (for more info, see this milestone) and I think it could be the default way to access data in Serenata also (there are libraries to deal with it automatically so we don't need to create converters, just the datapackage spec). What do you think?
What is the problem?
Dealing with the CSV generated by the toolbox is not trivial: before calling `pd.read_csv` we need to define a lot of `dtype` values, and in Jarbas we spent a bunch of lines of code deserializing data (converting strings to date objects, integers and floats).
How can this be addressed?
@turicas and I talked today and he suggested that the toolbox could offer an API not only to generate a CSV version of our datasets, but also a high-level iterator for them. Something like:
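A minimal sketch of what such an iterator could look like, using only the standard library. The function name `read_dataset`, the `SCHEMA` mapping and the column names are all hypothetical, chosen for illustration; none of them are part of the toolbox.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical schema: column name -> converter (illustrative names only).
SCHEMA = {
    "document_id": int,
    "issue_date": date.fromisoformat,
    "net_value": Decimal,
}


def read_dataset(csv_file, schema=SCHEMA):
    """Yield one dict per CSV row, converting each field per the schema.

    Empty fields become None; columns missing from the schema stay as str.
    """
    for row in csv.DictReader(csv_file):
        yield {
            name: schema.get(name, str)(value) if value != "" else None
            for name, value in row.items()
        }


# Usage with an in-memory CSV standing in for a real dataset file.
sample = io.StringIO(
    "document_id,issue_date,net_value\n"
    "5635048,2015-03-19,130.95\n"
)
rows = list(read_dataset(sample))
```

Keeping the converters in a single mapping means callers would not need to repeat `dtype` definitions or deserialization code; a datapackage descriptor could plausibly fill the same role as `SCHEMA` here.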
And the output would be an object with proper types (`int`, `Decimal`, `date`, etc.).
Who could help with this issue?
@turicas ; )