Closed spoonerf closed 4 days ago
This is because the first thing that format does is underscore the column names (and change some symbols, e.g. ' ' I believe gets converted into an underscore). So in your example, you need to do tb.format(["country_or_area"]). Let me know if it works!
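To illustrate the ordering described above, here is a minimal plain-Python sketch. It is not the real format implementation: underscore_name and format_sketch are hypothetical stand-ins, the table is modelled as a list of dicts, and the original column name "Country or Area" is assumed for the example.

```python
import re


def underscore_name(name: str) -> str:
    """Hypothetical stand-in for the underscoring step:
    lowercase the name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def format_sketch(rows, keys):
    """Mimic the order of operations: underscore first, look up keys second."""
    # Step 1: rename every column before anything else happens.
    rows = [{underscore_name(k): v for k, v in row.items()} for row in rows]
    columns = list(rows[0]) if rows else []
    # Step 2: only now are the requested keys checked, so the original
    # (non-underscored) names no longer exist at this point.
    missing = [k for k in keys if k not in columns]
    if missing:
        raise KeyError(
            f"{missing} not found after underscoring; available columns: {columns}"
        )
    # Step 3: sort by the key columns (stand-in for set_index + sort_index).
    return sorted(rows, key=lambda row: tuple(row[k] for k in keys))


# Assumed example data; the real column names in the issue may differ.
rows = [
    {"Country or Area": "B", "Value": 2},
    {"Country or Area": "A", "Value": 1},
]

try:
    format_sketch(rows, ["Country or Area"])  # original name: fails
except KeyError as err:
    print("KeyError:", err)

formatted = format_sketch(rows, ["country_or_area"])  # underscored name: works
print(formatted)
```

With this ordering, passing the already-underscored name is the only way the lookup can succeed, which matches the suggestion above.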
@Marigold I wonder if we should change this, so that underscoring is done last? I fear this could affect other steps though.
I don't have an opinion. Both have pros and cons.
(If we wanted to change it, we'd create a PR, increment ETL_EPOCH, use data-diff to find problematic datasets, revert ETL_EPOCH and merge. It's a bit tedious.)
I'd say let's leave format with its current behaviour. This error is then expected: after underscoring, the original column names are no longer valid.
@spoonerf I'm adding a better error message to help users with this error.
Very minor issue, but I've encountered it a few times now so thought it was worth reporting.
Sometimes when using tb.format(["column_x"]), it can't find column_x, so I have to use tb.set_index("column_x").sort_index() instead, which works fine. An example of this issue can be found here
The error shown is: