pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Fill_null() does not cast ALL Null values to other dtype(Nested Dataclass) #17268

Open starzar opened 1 week ago

starzar commented 1 week ago

Reproducible example

    cot_df_total = pl.DataFrame()
    # ... (loop over tickers not shown in this snippet)
        cot_df_cur = pl.DataFrame(data).fill_null("zero")
        print("cot_df_cur.schema")
        print(cot_df_cur.schema)

        cot_df_total = pl.concat([cot_df_total,cot_df_cur],how="vertical")
        ticker_counter += 1

    return cot_df_total.with_row_index()

Log output

C:\Users\User_0\AppData\Local\Programs\Python\Python312\python.exe C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py 
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Null, 'spread': Null})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Null, 'spread': Null})})})})
Traceback (most recent call last):
  File "C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py", line 300, in <module>
    cot_to_html()
  File "C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py", line 295, in cot_to_html
    cotTotal_df = txt_to_df(filepath)
                  ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py", line 184, in txt_to_df
    cot_df_total = pl.concat([cot_df_total,cot_df_cur],how="vertical")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User_0\AppData\Local\Programs\Python\Python312\Lib\site-packages\polars\functions\eager.py", line 184, in concat
    out = wrap_df(plr.concat_df(elems))
                  ^^^^^^^^^^^^^^^^^^^^
polars.exceptions.SchemaError: type Int64 is incompatible with expected type Null

Process finished with exit code 1

Issue description

https://drive.google.com/file/d/1gaLFuy6QyQNNE32eLFTz-HuAeX974wvw/view?usp=sharing

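fill_null() does not fill ALL null values in a nested structure (nested dataclass): as the schemas in the log output show, some nested struct fields still have dtype Null after fill_null("zero"), and the subsequent pl.concat fails. Unnesting and casting the nulls to another dtype loses the "key" column names, leaving only the value columns.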

Is there any way to get a complete fill_null() on nested dataframes without unnesting?

Expected behavior

All null values, including those nested inside structs, should be filled with "zero" for pl.DataFrame(data).fill_null("zero")

Installed versions

```
polars 0.20.31
```

cmdlineluser commented 1 week ago

Can you provide data to make your example reproducible?

As for the particular error:

polars.exceptions.SchemaError: type Int64 is incompatible with expected type Null

The vertical_relaxed strategy may be of help:

cot_df_total = pl.concat([cot_df_total,cot_df_cur],how="vertical_relaxed")