pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Expanded `df.apply` returns empty `DataFrame` if `df` has no columns #22102

Open madman-bob opened 6 years ago

madman-bob commented 6 years ago

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame([[],[],[]])
>>> df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')
Empty DataFrame
Columns: []
Index: [0, 1, 2]

Problem description

Applying a function that doesn't depend on the columns of the input should give the same output however many columns the DataFrame has, including zero.

Expected Output

>>> df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')
   0  1  2
0  1  2  3
1  1  2  3
2  1  2  3

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.3
pytest: None
pip: 18.0
setuptools: 28.8.0
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```

Workaround

For anyone else experiencing this issue, the following can be used as a temporary workaround:

>>> from pandas.core.apply import frame_apply
>>> frame_apply(df, lambda x: [1, 2, 3], axis=1, result_type='expand').apply_standard()
   0  1  2
0  1  2  3
1  1  2  3
2  1  2  3

WillAyd commented 6 years ago

Why would you need this? You'd be much better off just assigning those scalar values as individual columns; this seems rather hacky.
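
For reference, a minimal sketch of what assigning the scalar values as individual columns might look like. The frame is built with `index=range(3)` here, on the assumption that it has the same shape as the column-less frame in the report; the loop over `enumerate` is only an illustration, not a recommended pandas idiom.

```python
import pandas as pd

# A frame with three rows and no columns (assumed equivalent in shape
# to the frame from the original report).
df = pd.DataFrame(index=range(3))

# Assign each scalar as its own column; pandas broadcasts the scalar
# across the existing index.
for col, value in enumerate([1, 2, 3]):
    df[col] = value

print(df)
```

This sidesteps `apply` entirely, so it only helps when the values are known up front and do not depend on the row.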

madman-bob commented 6 years ago

I've got a DataFrame and a collection of functions, each expecting a Series of some length and returning a Series of potentially different length. I want to split the DataFrame into groups of columns, apply the appropriate function to each group, and then rejoin the results into a single DataFrame.

For the most part this works fine, but `apply` gets confused if the function expects a Series of length 0.
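
A rough sketch of the chunk, apply, and rejoin pattern described above, using only non-empty column groups. The column names, the two helper functions, and the use of `pd.concat` to rejoin are all assumptions made for illustration; the actual code behind this report is not shown in the thread.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})


def total_and_diff(row):
    # Hypothetical function: expects a Series of length 2, returns length 2.
    return pd.Series([row.iloc[0] + row.iloc[1], row.iloc[0] - row.iloc[1]])


def doubled(row):
    # Hypothetical function: expects a Series of length 1, returns length 1.
    return pd.Series([row.iloc[0] * 2])


# Each chunk pairs a group of columns with the function that handles it.
chunks = [(["a", "b"], total_and_diff), (["c"], doubled)]

# Apply each function row-wise to its column group, expanding the result.
parts = [
    df[cols].apply(func, axis=1, result_type="expand")
    for cols, func in chunks
]

# Rejoin the pieces side by side, renumbering the columns.
result = pd.concat(parts, axis=1, ignore_index=True)
print(result)
```

A chunk with an empty column list, such as `([], some_func)`, is the case this issue is about: the function would expect a Series of length 0, and the reported behaviour is that the expanded result comes back with no columns.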

That said, even without the above reason, it seems odd to me to treat the trivial case differently to all other cases. It's just bound to cause weird behaviour in edge cases.

madman-bob commented 6 years ago

The function is applied to every row, and is passed the columns.
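
To illustrate that point on a frame that does have columns, here is a small sketch that records what the function receives. The `record` helper and the `seen` list are hypothetical names introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

seen = []  # index labels of each Series passed to the function


def record(row):
    # With axis=1, each call receives one row as a Series
    # whose index is the frame's column labels.
    seen.append(list(row.index))
    return [1, 2, 3]


out = df.apply(record, axis=1, result_type="expand")
print(seen)
print(out)
```

The returned list does not depend on `row` at all, which is why the same three-column result would be expected however many columns the input has.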