h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures
https://datatable.readthedocs.io
Mozilla Public License 2.0
Consistency between `dt.cbind()` and `Frame.cbind()` #3408

Open oleksiyskononenko opened 1 year ago

Currently, dt.cbind() returns a new frame by cbinding all the input frames. At the same time, the Frame.cbind() method appends frames to the original one in place.

This behavior can be quite confusing for users, because they may expect the Frame.cbind() method to also return a new frame, whereas it returns None. While changing the API may not be a good idea from a compatibility point of view, we could at least consider issuing a warning when Frame.cbind() is called on a view.

For example, the following code doesn't modify the original DT frame, because cbind() is applied to the temporary frame produced by DT[:, :], while some users may think it does:

>>> from datatable import dt
>>> DT = dt.Frame([1,2,3])
>>> DT[:, :].cbind(DT)
DatatableWarning: Duplicate column name found, and was assigned a unique name: 'C0' -> 'C1'
>>> DT
   |    C0
   | int32
-- + -----
 0 |     1
 1 |     2
 2 |     3
[3 rows x 1 column]

At the same time, this code modifies DT as expected:

>>> DT.cbind(DT)
>>> DT
   |    C0     C1
   | int32  int32
-- + -----  -----
 0 |     1      1
 1 |     2      2
 2 |     3      3
[3 rows x 2 columns]

Since DT[:, :] and DT essentially represent the same thing, users may have a hard time understanding the code above.

It seems that even our documentation conflates in-place and out-of-place functionality when it says:

This function is exactly equivalent to: dt.Frame().cbind(*frames, force=force)

In reality, dt.cbind() produces a new frame, so it is not equivalent to dt.Frame().cbind(), which cbinds everything to an empty frame in place and returns None.