Closed chezou closed 4 years ago
The bulk import API defines a column's type from the msgpack type of randomly sampled rows. While this option provides explicit type conversion from CSV to msgpack, users must still call `update_schema` to ensure the schema type after bulk importing.
There could be another option, such as `as_is`, which keeps each value as CsvReader parses it with the dialect.
PR https://github.com/treasure-data/td-client-python/pull/85 proposes a solution: it supports `dtypes` and `converters` arguments similar to those pandas uses when reading CSV. The default behaviour is unchanged.
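To illustrate the idea of per-column type arguments, here is a minimal standard-library sketch. It is not the code from the PR: `read_rows`, `guess_value`, the `CASTS` table, and the rule that a converter wins over a dtype are all assumptions made for this example.

```python
import csv
import io


def guess_value(text):
    """Fallback used when a column has no explicit type: try int(), then float()."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


# Hypothetical mapping from dtype names to casting functions.
CASTS = {"str": str, "int": int, "float": float}


def read_rows(file_obj, dtypes=None, converters=None):
    """Yield one dict per CSV row, applying explicit column types where given.

    Hypothetical helper: a converter takes precedence over a dtype, and
    columns with neither fall back to guessing.
    """
    dtypes = dtypes or {}
    converters = converters or {}
    for record in csv.DictReader(file_obj):
        row = {}
        for col, text in record.items():
            if col in converters:
                row[col] = converters[col](text)
            elif col in dtypes:
                row[col] = CASTS[dtypes[col]](text)
            else:
                row[col] = guess_value(text)
        yield row


sample = io.StringIO("zip,count,price\n00011,7,1.50\n")
rows = list(
    read_rows(
        sample,
        dtypes={"zip": "str"},  # keep leading zeros
        converters={"price": lambda s: round(float(s) * 100)},  # price in cents
    )
)
```

With these arguments the `zip` column stays the string `"00011"`, while `count`, which has no explicit type, is still guessed as an integer, so the default behaviour is preserved for untouched columns.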
Resolved by #85
The current implementation of td-client-python's CSV reader reads all fields as strings and then converts each value by trying int() or float().
This logic converts strings with leading zeros, such as "00011", to 11. It would be nice to keep numerical values with leading zeros as strings, so we need to introduce an explicit type option in the `BulkImport.upload_file()` function, like pandas's `dtypes`.
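The problem can be reproduced with a few lines of plain Python. This is only a sketch of the guessing logic described above, assuming int() is tried before float(); `guess_value` is a hypothetical name, not a function from td-client-python.

```python
def guess_value(text):
    """Try int(), then float(); keep the original string if both fail."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


fields = ["00011", "3.5", "abc"]
guessed = [guess_value(v) for v in fields]
# "00011" parses as a valid integer, so its leading zeros are lost.
```

Because `int("00011")` succeeds, the guesser has no way to know the value was meant to be an identifier such as a postal code, which is why an explicit per-column type is needed.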