rebeccajohnson88 / PPOL564_slides_activities

Repo for Georgetown McCourt's School of Public Policy's Data Science I (PPOL 564)
Creative Commons Zero v1.0 Universal

3 - Part A and Part B - Optional Extra Credit 1 #45

Closed sonali-sr closed 1 year ago

sonali-sr commented 1 year ago

3. Optional extra credit 1: regex to separate companies from individuals (1 point)

You notice some employers in debar_clean have both the name of the company and the name of an individual, e.g.:

COUNTY FAIR FARM (COMPANY) AND ANDREW WILLIAMSON (INDIVIDUAL)*

Use the uppercase/cleaned name_clean in debar_clean

A. Write a regex pattern that does the following:
- Captures the pattern that occurs before COMPANY if (COMPANY) is in the string; so in the example above, extracts COUNTY FAIR FARM
- Captures the pattern that occurs before INDIVIDUAL if (INDIVIDUAL) is also in the string; so in the example above, extracts ANDREW WILLIAMSON (omitting the "AND")

B. Test the pattern on pos_example and neg_example. Make sure the former returns a list (if using re.findall) or a match object (if using re.search) with the company name and individual name separated out; make sure the latter returns an empty result.

Hints and resources: for step A, you can use re.search, re.match, or re.findall; don't worry about matching B&R Harvesting and Paul Cruz (Individual)
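
One possible shape for parts A and B is sketched below with re.search. It is a minimal sketch, not the course solution: the pattern is just one way to do it, pos_example reuses the string quoted above, and the value of neg_example is a hypothetical employer name made up for illustration, since the thread does not show the real one.

```python
import re

# Two lazy capture groups: text before "(COMPANY)" and text between
# "AND" and "(INDIVIDUAL)". The literal parentheses are escaped.
pattern = r"(.+?)\s+\(COMPANY\)\s+AND\s+(.+?)\s+\(INDIVIDUAL\)"

# Positive example taken from the assignment text above.
pos_example = "COUNTY FAIR FARM (COMPANY) AND ANDREW WILLIAMSON (INDIVIDUAL)*"
# Hypothetical negative example (assumed; not from the assignment).
neg_example = "SOME FARM LLC"

pos_match = re.search(pattern, pos_example)
neg_match = re.search(pattern, neg_example)

if pos_match is not None:
    print(pos_match.group(1))  # company name
    print(pos_match.group(2))  # individual name
print(neg_match)  # None when the string has no (COMPANY)/(INDIVIDUAL) tags
```

Because the pattern requires the uppercase tags, a mixed-case string such as the B&R Harvesting one mentioned in the hint would not match unless re.IGNORECASE were passed.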

qinip commented 1 year ago

Hi Professor, the question asks for the pattern "returns a list if using re.findall." My result is a list containing one tuple, in the form of [(co_name, ind_name)], rather than a flattened list [co_name, ind_name]. Is this OK? Thanks!

rebeccajohnson88 commented 1 year ago

yep that's fine
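
For context on why the result has that shape: when a pattern contains more than one capturing group, re.findall returns a list of tuples, one tuple per match. The sketch below illustrates this and shows one way to flatten the result if a plain list is preferred. The pattern is an illustrative assumption, not necessarily the one used in the question above.

```python
import re

# Illustrative two-group pattern (an assumption, not the course solution).
pattern = r"(.+?)\s+\(COMPANY\)\s+AND\s+(.+?)\s+\(INDIVIDUAL\)"
pos_example = "COUNTY FAIR FARM (COMPANY) AND ANDREW WILLIAMSON (INDIVIDUAL)*"

# With two capture groups, each match becomes a (group1, group2) tuple.
result = re.findall(pattern, pos_example)
print(result)

# Optional: flatten the list of tuples into a single list of names.
flat = [name for match in result for name in match]
print(flat)

# A string without the tags yields an empty list.
print(re.findall(pattern, "NO TAGS HERE"))
```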