Closed alistaire47 closed 2 years ago
Are the test failures waiting for https://github.com/ursacomputing/arrowbench/pull/107 to be merged? Or is something else going on there?
There's something else; I'm debugging. I think types are getting changed differently in some places (for other datasets like `nyctaxi_sample`), and I haven't yet figured out why.
Oh, pretty sure this is because I switched from reading CSVs with pandas to pyarrow, and the type inference is different.
OK, our timestamp precision consistency is not great; when round-tripping we vary between `timestamp[ns]`, `timestamp[us]`, and sometimes `timestamp[s]`, even when I try to enforce a schema.
@jonkeane This is fixed and ready for review now
This PR adds a schema for the Fannie Mae dataset with improved variable names and types, and replaces the sample dataset in parallel with https://github.com/ursacomputing/arrowbench/pull/107. Comments alongside the schema document where to find more information about the data, should it be needed.