Closed Cokesalokes closed 1 year ago
Thanks for the request. Can you provide a small example of input / output.
But if there are more rows in DF2, that will make the number of rows in DF_result explode; whereas it should have the same number of rows as DF1.
This isn't quite true - the result could be empty if there are no matches regardless of the number of rows in each DataFrame. So it's not quite clear what you're asking for. In particular, when there are multiple matches for one row in the left DataFrame (DF1), what should pandas do?
I would just drop duplicates beforehand if you only want to match with the first entry on the right side. You could also use isin to do a filter, we need a reproducible example to help here
I cannot seem to reproduce the issue on small scale, but I can show the in/out (obviously cannot share confidential data)
Yeah your merge column has duplicates, you should remove them if you don't want them. Closing, please ping to reopen if you can provide a reproducer and drop_duplicates is not sufficient
Yeah, apparently somewhere in lines 1,050 something.....Apologies
facepalm emogi
Feature Type
[X] Adding new functionality to pandas
[X] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
I need to join 2 or more DFs (DF1 left, and DF2 right), but I only need to pull some of the information from DF2 whose column (let's say col_3) matches a column in DF1 (let's say col_B).
What we have now is to use a join or merge DF_result= pd.merge(left=DF1, right=DF2, how='left', left_on='col_B', right_on='col_3')
But if there are more rows in DF2, that will make the number of rows in DF_result explode; whereas it should have the same number of rows as DF1.
This is the vlookup functionality from excel.
Feature Description
If I knew how to do this, I would.
Alternative Solutions
put everything into excel, and do a vlookup, but that only works if you have less than 200,000 rows, I am using CSVs with almost amillion....so excel keeps breaking.
Additional Context
Thanks for all your help, and sorry if this is already in the works. I know a lot of people have been looking for workarounds and such, but such a great feature for many.