rebeccajohnson88 / PPOL564_slides_activities

Repo for Georgetown McCourt's School of Public Policy's Data Science I (PPOL 564)
Creative Commons Zero v1.0 Universal

2.4 - Part C #44

Closed sonali-sr closed 1 year ago

sonali-sr commented 1 year ago

2.4 More joins and more cleaning (5 points)

C. Conduct a left join using name_clean_2 as the join column, where the left-hand dataframe is jobs and the right-hand dataframe is debar_clean. Store the result as a dataframe, and print the rows where the merge indicator shows the row was found in both dataframes.
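A minimal sketch of what this step could look like in pandas. The toy rows and the extra columns (job_id, violation) are hypothetical stand-ins, not the assignment data, and the result name jobs_debar is an assumption; only jobs, debar_clean, and name_clean_2 come from the prompt.

```python
import pandas as pd

# Hypothetical toy data standing in for the real jobs and debar_clean dataframes
jobs = pd.DataFrame({
    "name_clean_2": ["slash ev ranch", "acme farms", "green valley orchards"],
    "job_id": [101, 102, 103],
})
debar_clean = pd.DataFrame({
    "name_clean_2": ["slash ev ranch", "other employer"],
    "violation": ["example violation", "example violation"],
})

# Left join: keep every row of jobs; indicator=True adds a _merge column
# with values left_only, right_only, or both
jobs_debar = pd.merge(
    jobs,
    debar_clean,
    on="name_clean_2",
    how="left",
    indicator=True,
)

# Rows whose join key was found in both dataframes
matched = jobs_debar[jobs_debar["_merge"] == "both"]
print(matched)
```

Because this is a left join, the _merge column can only contain left_only or both; rows that appear only in debar_clean are dropped.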

Note: this manual cleaning process is inefficient, which helps motivate the fuzzy matching we'll cover in a future week or an optional OH. Fuzzy matching would recognize that Slash EV ranch is a highly similar string to slash ev ranch llp and match them without us needing to use regex to make the strings identical.
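To illustrate the idea in the note, here is a rough sketch using the standard library's difflib. This is an assumption for illustration only; the course may use a different fuzzy matching package, and the similarity helper is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# The two names from the note differ only by case and the trailing " llp"
score = similarity("Slash EV ranch", "slash ev ranch llp")
print(score)  # 0.875

# An unrelated (hypothetical) employer name scores lower
other = similarity("Slash EV ranch", "acme farms")
print(score > other)  # True
```

A match would then be accepted when the score clears some threshold, which has to be chosen by inspecting the data.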
