3 - Part C and Part D - Optional Extra Credit 1

rebeccajohnson88 / PPOL564_slides_activities

Repo for Georgetown McCourt's School of Public Policy's Data Science I (PPOL 564)

Creative Commons Zero v1.0 Universal

Issue #46, closed by sonali-sr 1 year ago

sonali-sr commented 1 year ago

3. Optional extra credit 1: regex to separate companies from individuals

C. Iterate over the name_clean column in debar and use regex to create two new columns in debar:

co_name: A column for company (full name_clean string if no match; pattern before COMPANY if one extracted)
ind_name: A column for individual (full name_clean string if no match; pattern before INDIVIDUAL if one extracted)
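
One way to approach part C, as a minimal sketch rather than the official solution: it assumes names are tagged with the literal markers "(COMPANY)" and "(INDIVIDUAL)" joined by "AND", and it uses a plain list in place of the name_clean column. The pattern, the helper split_names, and both sample strings are illustrative assumptions, not taken from the assignment.

```python
import re

# Assumed layout: "<company> (COMPANY) AND <individual> (INDIVIDUAL)".
# The real data may use a different joiner or marker order.
PATTERN = re.compile(r"(.*)\s+\(COMPANY\)\s+AND\s+(.*)\s+\(INDIVIDUAL\)")


def split_names(name_clean):
    """Return (co_name, ind_name); both fall back to the full string if no match."""
    m = PATTERN.search(name_clean)
    if m is None:
        return name_clean, name_clean
    return m.group(1), m.group(2)


# Hypothetical stand-ins for the positive and negative examples.
names = [
    "COUNTY FAIR FARM (COMPANY) AND ANDREW WILLIAMSON (INDIVIDUAL)",
    "CISCO PRODUCE INC",
]

pairs = [split_names(n) for n in names]
co_names = [co for co, _ in pairs]
ind_names = [ind for _, ind in pairs]
```

If debar is a pandas DataFrame, the two resulting lists could then be assigned to debar["co_name"] and debar["ind_name"].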

D. Print the following columns for the rows in debar containing the negative example and positive example described above (county fair farm and cisco produce):

name_clean
co_name
ind_name
Violation

FanniVarhelyi commented 1 year ago

Hi Professor - I am having trouble extracting groups from re.search. I can do it for the single example, but for the full list I cannot extract the group 1 match (the company name). The list comprehension returns the match object itself: <re.Match object; span=(0, 61), match='COUNTY FAIR FARM (COMPANY) AND ANDREW WILLIAMSON >. What am I missing?

brad-wayne commented 1 year ago

This is because debar_3c1 is the entire list built by your list comprehension. Each element inside it is either an re.Match object or None (NoneType), so the groups have to be pulled from the individual elements, not from the list itself.

sonali-sr commented 1 year ago

@FanniVarhelyi: The answer provided by @brad-wayne is correct, so you will need a way to access the elements inside the list. You can also print debar_3c1 to check that it holds the result you expect, just to be sure.
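
As a rough illustration of accessing the elements: the sketch below builds a list of re.search results (standing in for debar_3c1) and then takes group 1 from each element, falling back to the original string when the element is None. The pattern and the sample names are assumptions made for the example, not the thread's actual code.

```python
import re

# Hypothetical stand-ins for values in the name_clean column.
names = [
    "COUNTY FAIR FARM (COMPANY) AND ANDREW WILLIAMSON (INDIVIDUAL)",
    "CISCO PRODUCE INC",
]

# Each element is an re.Match object or None, like the list described above.
matches = [re.search(r"(.*)\s+\(COMPANY\)", n) for n in names]

# The list itself has no .group(); call it on each non-None element instead.
co_names = [
    m.group(1) if m is not None else n
    for m, n in zip(matches, names)
]

print(co_names)
```

The `is not None` check matters because calling .group(1) on a None element raises an AttributeError.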