ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.24k stars 591 forks source link

bug: sql() operation ignores manual cast of columns data type #9407

Closed pybokeh closed 4 months ago

pybokeh commented 4 months ago

What happened?

My code:

flat_rate = con.read_csv(
    'some_path/flat_rate.csv',
    table_name='flat_rate',
    all_varchar=True
)

flat_rate = flat_rate.mutate(
    MDL_YR = _["MDL_YR"].cast("int32"),
    USD_REMB_PART_AMT=_["USD_REMB_PART_AMT"].cast("FLOAT"),
    USD_REMB_LAB_AMT=_["USD_REMB_LAB_AMT"].cast("FLOAT"),
    USD_REMB_SUBL_AMT=_["USD_REMB_SUBL_AMT"].cast("FLOAT"),
    USD_REMB_HAND_AMT=_["USD_REMB_HAND_AMT"].cast("FLOAT"),
    USD_REMB_FRT_AMT=_["USD_REMB_FRT_AMT"].cast("FLOAT"),
    USD_REMB_TAX_AMT=_["USD_REMB_TAX_AMT"].cast("FLOAT"),
    USD_REMB_RNT_LNR_AMT=_["USD_REMB_RNT_LNR_AMT"].cast("FLOAT"),
    USD_REMB_CLM_AMT=_["USD_REMB_CLM_AMT"].cast("FLOAT"),
    DLR_LAB_RATE_AMT=_["DLR_LAB_RATE_AMT"].cast("FLOAT"),
    LAB_HR_QTY=_["LAB_HR_QTY"].cast("FLOAT"),
    LAB_AMT=_["LAB_AMT"].cast("FLOAT"),
    RO_DATE=_["RO_DATE"].cast("DATE"),
)

flat_rate.schema()
ibis.Schema {
  MDL_YR                int32
  FCTRY_CD              string
  MDL_NM                string
  GRADE_SHORT           string
  DOORS                 string
  ENGINE_SERIES         string
  ENGINE_CYLINDERS      string
  ENGINE_DISPLACEMENT   string
  ENGINE_HP             string
  TRANSMISSION_SERIES   string
  TRANSMISSION          string
  RO_DATE               date
  CITY_NM               string
  STATE_NM              string
  DLR_NM                string
  SHRT_PART_NO          string
  ENRICH_CUST_CNTN_TXT  string
  USD_REMB_PART_AMT     float64
  USD_REMB_LAB_AMT      float64
  USD_REMB_SUBL_AMT     float64
  USD_REMB_HAND_AMT     float64
  USD_REMB_FRT_AMT      float64
  USD_REMB_TAX_AMT      float64
  USD_REMB_RNT_LNR_AMT  float64
  USD_REMB_CLM_AMT      float64
  DLR_LAB_RATE_AMT      float64
  LAB_HR_QTY            float64
  LAB_AMT               float64
  VHCL_DEST_CD          string
  LAB_OPRTN_CD          string
  LAB_OPRTN_DESC        string
}

Then when doing:

(
    flat_rate.sql(
    """
    SELECT
        RO_DATE
        , DLR_LAB_RATE_AMT
    FROM
        flat_rate
    WHERE
        RO_DATE >= '2015-01-01'
    """
    )
)

The output table that renders in jupyter notebook indicate the 2 columns are both of type "string" when I expect RO_DATE column to be of type date and DLR_LAB_RATE_AMT to be of type float64.

What version of ibis are you using?

ibis version 9.0 from pip install.

What backend(s) are you using, if any?

DuckDB

Relevant log output

No response

Code of Conduct

cpcloud commented 4 months ago

@pybokeh Thanks for the issue!

This is expected behavior, because you haven't redefined the flat_rate table that's inside the database, you've only redefined the variable in Python.

.sql() allows you to refer to tables in the database, as well as any previous expression you've called .alias(name) on.

In your case, call the alias method on your redefinition of flat_rate and then use that new name in the SQL string, like so:

flat_rate = flat_rate.mutate(...).alias("typed_flat_rate")
flat_rate.sql("SELECT * FROM typed_flat_rate")

Going to close this as wontfix, again thanks for the report!