Open mjgolebiewski opened 4 years ago
@mjgolebiewski What would you expect the automatic behavior to be? Do you think random characters should be added? Some other mechanism?
Note: Any columns in the RHS dataframe are going to be propagated to the joined data frame as lists.
If not random characters, then maybe something related to the names of the joined dataframes? I am still exploring raster_join and its outputs, so I'm not sure.
@mjgolebiewski What do you mean by "joined dataframes names"? If you mean the names of the variables referencing them, then there's no way to get that information from within raster_join. My suspicion is that this is typical Spark behavior, in that you have to take care of renaming columns before joins to keep them unique.
From a pandas user's perspective, and also from experience with R's data.frame, I would expect either:
1) All column names get a distinguishing suffix indicating the side of the join they came from: ('_left', '_right') or ('_x', '_y'). These strings could be an argument to the join method.
2) Only column names appearing in both DataFrames are disambiguated by appending a suffix in this fashion.
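To make the two options concrete, here is a minimal sketch in plain Python of how the rename mappings could be computed from the column lists of the two sides. The helper name disambiguate and the columns tile_a and tile_b are hypothetical; only spatial_index_agg comes from this thread. This is not how raster_join works today, just an illustration of the proposal. For comparison, pandas' merge applies its suffixes only to overlapping columns, which corresponds to option 2.

```python
def disambiguate(left_cols, right_cols, suffixes=("_left", "_right"),
                 only_overlapping=True):
    """Return (left_map, right_map), each mapping old column name -> new name.

    only_overlapping=False: option 1, every column gets its side's suffix.
    only_overlapping=True:  option 2, only names present on both sides do.
    Hypothetical helper for illustration; not part of any library.
    """
    left_suffix, right_suffix = suffixes
    overlap = set(left_cols) & set(right_cols)

    def rename(cols, suffix):
        # Suffix a column if we suffix everything, or if its name collides.
        return {
            c: (c + suffix) if (not only_overlapping or c in overlap) else c
            for c in cols
        }

    return rename(left_cols, left_suffix), rename(right_cols, right_suffix)


# Hypothetical column lists; both sides carry spatial_index_agg.
left_cols = ["tile_a", "spatial_index_agg"]
right_cols = ["tile_b", "spatial_index_agg"]

# Option 2: only the colliding name is changed.
left_map, right_map = disambiguate(left_cols, right_cols)

# Option 1: every column is tagged with the side it came from.
all_left, all_right = disambiguate(left_cols, right_cols, only_overlapping=False)
```

On the Spark side, a mapping like this could be applied before the join by calling DataFrame.withColumnRenamed(old, new) for each pair, which matches the "rename before joining" approach suggested above.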
Easy to reproduce: just try to raster_join 3 rasters. On the second join, the error above is shown. The current workaround is to df.drop('spatial_index_agg') before the join.