move-coop / parsons

A python library of connectors for the progressive community.
Other
255 stars 125 forks source link

[Feature/Addition] parsons.databases.database.DatabaseCreateStatement.detect_data_type() should detect more data types than str, float, int #942

Open austinweisgrau opened 7 months ago

austinweisgrau commented 7 months ago

parsons.databases.database.DatabaseCreateStatement.detect_data_type() is only able to currently detect data types str, float, and int. As a result, downstream methods (e.g. parsons.databases.redshift.rs_create_table.RedshiftCreateTable.data_type()) that generate DDL statements to create database tables are only able to infer data types for database tables for these 3 types.

It should be possible to infer types for more data types, including datetimes, dates, and booleans.

Detailed Description

Context

When using parsons.Redshift.copy() to upload a parsons Table into a Redshift database, parsons should be able to infer data types for most column types and pass those into Redshift accurately. However, the current implementation only accomplishes this for string, float, and integer types. All other types are coerced into strings. There should be enough information available in a parsons table to detect and accurately represent other types, including datetimes, dates, and booleans.

Possible Implementation

postgres, mysql, and redshift database connectors all use the underlying detect_data_type() method, so whatever change is made here will need to be compatible with all 3 downstream connectors and SQL generators.

Priority

Jason94 commented 7 months ago

Yes, I can confirm this is really annoying for boolean columns. I have to use bool_col = 'True' as a workaround.

austinweisgrau commented 7 months ago

Figured out that parsing BOOLs works if parsons.databases.database.constants.DO_PARSE_BOOLS is set to True. There doesn't seem to be any documentation about why this defaults to False or anything.

austinweisgrau commented 7 months ago

The feature was added 2 years ago with 766cfaedc5652af8c39fde890067a99b6fa58518 and it seems to have been turned off by default for "reverse compatibility," which doesn't really make sense to me 🤔

Jason94 commented 7 months ago

Oh dear, they couldn't have even made that an argument to the function or something? This would definitely be a breaking change, but a really good one... Maybe we can target flipping the default for the next major release? (I know Shauna is cutting one right now.)

austinweisgrau commented 7 months ago

Ya I made the PR #943 on the major-release branch

Jason94 commented 6 months ago

Are we ready to close this issue, @austinweisgrau ?

austinweisgrau commented 6 months ago

I think it would still be nice to have support for loading datetime and date type objects to databases where possible!