h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures
https://datatable.readthedocs.io
Mozilla Public License 2.0

Created zero-column frame has wrong number of rows #3428

Closed (hallmeier closed this issue 1 year ago)

hallmeier commented 1 year ago
from datatable import dt

dt.Frame([()])
#    |
#    |
# -- +
# [0 rows x 0 columns]
dt.Frame([{}])
#    |
#    |
# -- +
# [0 rows x 0 columns]

Expected behavior:

dt.Frame([()])
#    |
#    |
# -- +
#  0 |
# [1 row x 0 columns]
dt.Frame([{}])
#    |
#    |
# -- +
#  0 |
# [1 row x 0 columns]
oleksiyskononenko commented 1 year ago

Well, you pass 1 column and 0 rows to dt.Frame(), so I'm not sure why you expect datatable to create 0 columns and 1 row. Note that a datatable frame is a column-oriented container of data: https://datatable.readthedocs.io/en/latest/api/frame.html

The result is still wrong, but I would say that the expected behavior in both cases should be

   |   C0
   | void
-- + ----
[0 rows x 1 column]

At least, this is what happens for the empty list, and it seems reasonable:

>>> dt.Frame([[]])
   |   C0
   | void
-- + ----
[0 rows x 1 column]
hallmeier commented 1 year ago

When creating a frame from a list of lists, each inner list marks a column, so this behavior is correct. But when creating a frame from a list of tuples or a list of dicts, each tuple/dict marks a row, so I am passing 0 columns and 1 row. See these examples:

>>> dt.Frame([(0,), (0,)])
   |   C0
   | int8
-- + ----
 0 |    0
 1 |    0
[2 rows x 1 column]
>>> dt.Frame([{"A": 0}, {"A": 0}])
   |    A
   | int8
-- + ----
 0 |    0
 1 |    0
[2 rows x 1 column]

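A frame with 1 row and 0 columns can already be produced by selecting no columns:

>>> from datatable import f
>>> dt.Frame([0])[:, f[[]]].to_tuples()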
[()]
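
The row/column reading argued for above can be sketched in plain Python. The helper `expected_shape` is hypothetical and is not part of datatable; it only encodes the interpretation described in this comment (inner lists are columns, tuples and dicts are rows). It assumes all inner lists, or all tuples, have equal length, and that an empty list gives an empty frame.

```python
def expected_shape(source):
    """Return (nrows, ncols) under the interpretation argued in this thread.

    Hypothetical helper for illustration only; not datatable's API
    and not its implementation.
    """
    if not isinstance(source, list):
        raise TypeError("this sketch only handles a list source")
    if not source:
        # Assumption: an empty list yields an empty frame.
        return (0, 0)

    if all(isinstance(item, list) for item in source):
        # List of lists: each inner list is a column.
        lengths = {len(col) for col in source}
        if len(lengths) != 1:
            raise ValueError("sketch assumes columns of equal length")
        return (lengths.pop(), len(source))

    if all(isinstance(item, tuple) for item in source):
        # List of tuples: each tuple is a row.
        widths = {len(row) for row in source}
        if len(widths) != 1:
            raise ValueError("sketch assumes rows of equal width")
        return (len(source), widths.pop())

    if all(isinstance(item, dict) for item in source):
        # List of dicts: each dict is a row; columns are the union of keys.
        keys = set()
        for row in source:
            keys.update(row)
        return (len(source), len(keys))

    # List of scalars: a single column.
    return (len(source), 1)


print(expected_shape([()]))   # one row, no columns
print(expected_shape([{}]))   # one row, no columns
print(expected_shape([[]]))   # no rows, one column
```

Under this reading, `[()]` and `[{}]` both describe one row with zero columns, while `[[]]` describes one column with zero rows.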
oleksiyskononenko commented 1 year ago

You are right, but at the same time the docs say:

When the source is a non-empty list containing other lists or compound objects, then each item will be interpreted as a column initializer, and the resulting frame will have as many columns as the number of items in the list.

My feeling is that we need to review this part of the functionality and docs to make them consistent and, obviously, fix the bug you have discovered.
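
To make the inconsistency concrete, here is a small plain-Python check. It compares a literal reading of the quoted sentence (one column per item) against the shapes printed earlier in this thread. The function `ncols_per_docs` is hypothetical, and treating tuples and dicts as "compound objects" is an assumption about how that sentence should be read.

```python
# (source, shape) pairs; each shape (nrows, ncols) is copied from the
# frame outputs shown earlier in this thread.
observed = {
    "[[]]": ([[]], (0, 1)),
    "[(0,), (0,)]": ([(0,), (0,)], (2, 1)),
    '[{"A": 0}, {"A": 0}]': ([{"A": 0}, {"A": 0}], (2, 1)),
}


def ncols_per_docs(source):
    """Literal reading of the quoted docs sentence: one column per item.

    Hypothetical helper; assumes tuples and dicts count as
    "compound objects".
    """
    return len(source)


# Collect the sources where the literal docs reading disagrees with
# the number of columns actually reported.
mismatches = [
    label
    for label, (source, shape) in observed.items()
    if ncols_per_docs(source) != shape[1]
]
print(mismatches)
```

With the shapes above, the list-of-lists case agrees with the docs sentence, while the list-of-tuples and list-of-dicts cases do not.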