adamsar closed this issue 1 month ago.
Are you interested in a join or a simple merge? With the existing API, you can merge two or more data frames, provided that their rows are in the same order.
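To make the distinction concrete, here is a minimal sketch of what a "simple merge" amounts to: columns are placed side by side by position. It is written in plain Python, not against the library's DataFrame type. Tables are modeled as dicts mapping column names to lists, and `merge_columns` is a hypothetical name used only for illustration.

```python
# Hypothetical sketch of a positional ("simple") merge. Tables are modeled as
# dicts of column name -> list of values; this is NOT the library's DataFrame API.

def merge_columns(*frames):
    """Combine frames column-wise, assuming every frame lists its rows in the same order."""
    row_counts = {len(col) for frame in frames for col in frame.values()}
    if len(row_counts) > 1:
        raise ValueError("all frames must have the same number of rows")
    merged = {}
    for frame in frames:
        for name, values in frame.items():
            if name in merged:
                raise ValueError(f"duplicate column: {name}")
            merged[name] = list(values)
    return merged


left = {"id": [1, 2, 3], "x": [0.5, 1.5, 2.5]}
right = {"y": [10.0, 20.0, 30.0]}
print(merge_columns(left, right))
# {'id': [1, 2, 3], 'x': [0.5, 1.5, 2.5], 'y': [10.0, 20.0, 30.0]}
```

Nothing here checks that row i in one frame describes the same entity as row i in another. When the frames come from different sources in different orders, a positional merge silently pairs the wrong rows, which is why a key-based join is the more useful operation in that case.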
More of a join. I've got a lot of dataframes, including some I receive from other departments, and it's sometimes painful to combine them into a single, cohesive dataframe that contains the feature set I need.
Edit: this functionality is exactly what I'd like: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html
I've got a few different dataframes that I'd like to merge when calculating some regressions. Right now I do so by converting them to a matrix of doubles, aligning the rows by id, and then rebuilding a dataframe. Spark and pandas have utility methods that let you merge dataframes by specifying which column is used to match the data.

Describe the solution you'd like

Extend the merge method with either a simple `by` option specifying the key to merge on, a new `mergeWith` method, or a `MergeOptions` parameter that contains information such as `by` (the key to join on) and `mergeType` (inner vs. outer join, left vs. right join).

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html
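The requested behavior, a join on a key column with a selectable join type, can be sketched in plain Python as follows. This is only an illustration of the semantics, not the library's API. Tables are modeled as lists of row dicts, `join_by` is a hypothetical function, and its `by` and `how` parameters stand in for the proposed key and `mergeType` options (the `how` values borrow pandas' naming). Every row is assumed to contain the key column.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def join_by(left: List[Row], right: List[Row], by: str, how: str = "inner") -> List[Row]:
    """Hypothetical key-based join. Output rows follow the left input's order;
    for "right"/"outer", unmatched right rows are appended at the end."""
    if how not in {"inner", "left", "right", "outer"}:
        raise ValueError(f"unsupported join type: {how}")

    # Collect non-key column names, preserving first-seen order.
    left_cols = list(dict.fromkeys(c for row in left for c in row if c != by))
    right_cols = list(dict.fromkeys(c for row in right for c in row if c != by))
    overlap = set(left_cols) & set(right_cols)
    if overlap:
        raise ValueError(f"column names appear on both sides: {sorted(overlap)}")

    def combine(key: Any, lrow: Optional[Row], rrow: Optional[Row]) -> Row:
        # Missing sides are filled with None, like NULLs in an outer join.
        out: Row = {by: key}
        for c in left_cols:
            out[c] = lrow.get(c) if lrow is not None else None
        for c in right_cols:
            out[c] = rrow.get(c) if rrow is not None else None
        return out

    # Hash index on the right table: key -> all right rows with that key.
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[by], []).append(row)

    result: List[Row] = []
    matched_keys = set()
    for lrow in left:
        key = lrow[by]
        matches = index.get(key, [])
        if matches:
            matched_keys.add(key)
            for rrow in matches:
                result.append(combine(key, lrow, rrow))
        elif how in ("left", "outer"):
            result.append(combine(key, lrow, None))

    if how in ("right", "outer"):
        for rrow in right:
            if rrow[by] not in matched_keys:
                result.append(combine(rrow[by], None, rrow))
    return result


scores = [{"id": 1, "x": 0.5}, {"id": 2, "x": 1.5}, {"id": 3, "x": 2.5}]
labels = [{"id": 3, "y": 30.0}, {"id": 1, "y": 10.0}, {"id": 4, "y": 40.0}]
print(join_by(scores, labels, by="id", how="left"))
# [{'id': 1, 'x': 0.5, 'y': 10.0}, {'id': 2, 'x': 1.5, 'y': None}, {'id': 3, 'x': 2.5, 'y': 30.0}]
```

Building a hash index on one side keeps the cost roughly linear in the number of input rows plus matches, instead of comparing every pair of rows. This is essentially the manual "align the rows by id" step described in the issue, packaged so the caller only names the key column and the join type.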