move-coop / parsons

A python library of connectors for the progressive community.
Other
255 stars 125 forks source link

Parse Boolean types by default #943

Closed austinweisgrau closed 7 months ago

austinweisgrau commented 7 months ago

Commit 766cfaedc5652af8c39fde890067a99b6fa58518 created a feature for parsing boolean types but turned it off by default. This commit turns that feature on by default and adds a comment about how to turn it off and what that does.

When this feature is turned on, parsons will detect boolean column types when copying a parsons Table to a database and create a boolean column in the database. The default behavior without this change is to coerce booleans to be strings when loading to database.

Partially addresses #942

austinweisgrau commented 7 months ago

This will change default behavior for loading booleans from parsons Table to a database, so implementations based on the prior default behavior will likely break. e.g.

tbl = Table([{"bool_col": True}])
Redshift.copy(tbl, "tablename")
Redshift().query("select * from tablename where bool_col = 'True'")

will no longer work, instead the more normal code will be needed:

Redshift().query("select * from tablename where bool_col ")
Jason94 commented 7 months ago

I think this is a great change. 100% for making things more straightforward and intuitive. A few questions, Austin.

  1. Will this affect database adapters besides Redshift? It looks like it will, based on the filepath.
  2. This global variable mechanism is bad practice. The usual way would be to configure this based on parameters from the call sites. (For example, in the Redshift.copy method.) And it's very unlikely to be discovered, because it's not documented in any of the places where a user would actually be uploading data. What do you think about just removing it entirely? If someone still wants to support the old behavior, they can just convert their boolean columns in the table:
table = table.convert(['bool', 'columns', 'here', ...], str)

We could include that code snippet in the release notes to help anyone migrate who wants to retain the old behavior.

austinweisgrau commented 7 months ago

Yeah I agree, I guess I was also being cautious about maintaining "backwards compatibility" with something I don't really understand, but 1) I agree this whole feature set and configuration don't make any sense and aren't necessary 2) since this is going into a major release, it's ok to break some minor level of compatibility

So yeah I'll make some updates to just turn this on and remove the configurabilith

austinweisgrau commented 7 months ago

Ok so digging deeper, it looks like the current implementation will accept any of the following case-insensitive python values as True: [True, "true", "t", 1, "1", "yes"] and any of the following as False: [False, "false", "f", 0, "0", "no"]

I think this is bad and the only values we should translate into boolean are python booleans, [True, False]

If someone wants a set of strings or integers to be coerced into booleans when they're loading a parsons Table to a database, they should do that type conversion explicitly themself. It seems sketchy to have strings like "t" or "yEs" getting coerced into booleans by default.

These changes are made with 1cc6bd1e44d855cc9853f954666ffad189db9ff4

What do you think?

IanRFerguson commented 7 months ago

Ok so digging deeper, it looks like the current implementation will accept any of the following case-insensitive python values as True: [True, "true", "t", 1, "1", "yes"] and any of the following as False: [False, "false", "f", 0, "0", "no"]

I think this is bad and the only values we should translate into boolean are python booleans, [True, False]

If someone wants a set of strings or integers to be coerced into booleans when they're loading a parsons Table to a database, they should do that type conversion explicitly themself. It seems sketchy to have strings like "t" or "yEs" getting coerced into booleans by default.

These changes are made with 1cc6bd1

What do you think?

I agree philosophically, we shouldn't be coercing data types implicitly so this is a valuable change. Going to take one more look at the diff but overall looks good