rebeccajohnson88 / PPOL564_slides_activities

Repo for Georgetown McCourt's School of Public Policy's Data Science I (PPOL 564)
Creative Commons Zero v1.0 Universal

1.2 C #51

Open sonali-sr opened 1 year ago

sonali-sr commented 1 year ago

1.2 named entity recognition (3 points)

C. You want to extract the possible sentence lengths the CEO is facing; pull out the named entities that (1) have the label DATE and (2) contain the word year or years. Print these named entities.

Hint: You may want to use the re module for the second part.
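
One way the two conditions could be combined is sketched below. It is a minimal illustration, not the official solution: the `(text, label)` pairs in `sample_ents` are hypothetical stand-ins for `(ent.text, ent.label_)` taken from a spaCy `Doc`, and `year_dates` is a helper name invented for this example.

```python
import re

# Whole-word match for "year" or "years", ignoring case.
YEAR_PATTERN = re.compile(r"\byears?\b", re.IGNORECASE)


def year_dates(ents):
    """Return the texts of entities labeled DATE that mention year/years."""
    return [text for text, label in ents
            if label == "DATE" and YEAR_PATTERN.search(text)]


# Hypothetical (text, label) pairs standing in for (ent.text, ent.label_).
sample_ents = [
    ("20 years", "DATE"),
    ("last year", "DATE"),
    ("Tuesday", "DATE"),
    ("three years", "DATE"),
    ("five", "CARDINAL"),
]

print(year_dates(sample_ents))
```

With a real spaCy document, the same function could be fed `[(ent.text, ent.label_) for ent in doc.ents]`. The word boundaries in the pattern keep strings such as "yearly" from matching.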

1 2C
JiaqinWu commented 1 year ago

After subsetting with this code, I get only three entities. Here's my code:

pharma = str(doj[doj.id == '17-1204'].contents)
spacy_pharma = nlp(pharma)
for i in spacy_pharma.ents:
    print("Entity: " + i.text + "; NER tag: " + i.label_)

qinip commented 1 year ago

Hi, my result of nlp(pharma).ents doesn't contain "20 years" or "no greater than five years". What could I have done wrong? Thanks!

JiaqinWu commented 1 year ago

Hi, my result of nlp(pharma).ents doesn't contain "20 years" or "no greater than five years". What could I have done wrong? Thanks!

me too...

rebeccajohnson88 commented 1 year ago

@JiaqinWu @qinip my guess is that this is a spaCy versioning difference, in which case:

can you run these commands and screenshot the result so we can compare versions? i can then update mine (if yours is more current) and post the output from that

!pip install session_info
import session_info
session_info.show()
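
If a full session report is more than needed, the installed version of a single package can also be read with the standard library. This is a small sketch, assuming Python 3.8 or later; `installed_version` is a name made up for this example, and the package names in the loop are only the ones likely to matter here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the version of each package relevant to this problem.
for name in ("spacy", "pandas"):
    print(name, installed_version(name))
```

A result of `None` means the package is not installed in the environment the notebook is running in.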

image

JiaqinWu commented 1 year ago

image

It's mine. Thank you professor!

rebeccajohnson88 commented 1 year ago

got it! i'll upgrade or have a TA upgrade and post the printout from running with the newer version

for now, you're fine and i'd move onto the other problems

qinip commented 1 year ago
Screenshot 2022-10-25 at 4 40 08 PM
jswsean commented 1 year ago

Hi, my result of nlp(pharma).ents doesn't contain "20 years" or "no greater than five years". What could I have done wrong? Thanks!

I also got ['last year', 'three years', 'three years'] as my output. My package versions are as follows:

image

rebeccajohnson88 commented 1 year ago

@jswsean yep i think that's output from a newer version. @sonali-sr can share when she produces output from that newer version. in the meantime, fine to proceed and we'll give credit for both outputs since some OS may not be compatible with 3.9. thanks!

sonali-sr commented 1 year ago

Hello everyone, attached is the revised output for this question. Some of you may have the output produced earlier; as Prof. Johnson mentioned, both are fine.

Capture
bhollan commented 1 year ago

I'm on an intermediate version of spaCy and am getting something different from both of the outputs mentioned.

image

I'm getting this:

{'20 years', 'last year', 'three years'}
bhollan commented 1 year ago

Now I forced the install of 3.4.2.

image

But it's still giving me the same as before:

{'20 years', 'last year', 'three years'}
rebeccajohnson88 commented 1 year ago

@bhollan yea the versioning stuff on this is puzzling; as a result, TAs will be grading based on code rather than output, so yours is fine. thanks!