Open jurgispods opened 4 years ago
ret = conn.cursor(PandasCursor).execute("select * from (values (1), (NULL))").fetchall() ret [('bla',), ('',)] [pd.isna(x[0]) for x in ret] [False, False]
It seems that the query in this example is wrong. In this case, what kind of query are you executing? It may be the same issue as #118.
@laughingman7743 Apologies, there was a copy-paste error in my second query. I've edited the original post.
And regarding #118: I understand that I can use na_values=''
to force converting the respective date in the Athena-generated CSV to NaN. But this means losing the possibility to distinguish between NULL
and the empty String in the original Athena table.
I guess this cannot be circumvented, as the information is already lost in the Athena-generated CSV?
If you check the CSV file output from Athena, you'll see that empty characters are double-quoted, while null characters are not.
"_col0"
"blah"
""
If you can handle them well, you should be able to identify null and empty characters. But I don't know how to configure the read_csv method's options to better read this CSV.
I see. One approach would be to disable quoting and remove the quotes afterwards. With your example in mind:
import pandas as pd
import csv
df = pd.read_csv('myfile.csv', skip_blank_lines=False, quoting=csv.QUOTE_NONE)
# remove quotes from String columns manually
df.select_dtypes([object]).apply(lambda col: col.str[1:-1])
This is not very elegant, but if CSVs written by Athena are guaranteed to contain quotes Strings, this should always work. What do you think?
In #117, a fix was made for NULL results not being returned by PandasCursor. This now work as expected:
However, NULL values for String columns are secretly converted to empty Strings:
Is this the expected behaviour? I believe
NULL
should always be converted to NaN, regardless ofna_values
orkeep_default_na
.