r4fun / hierplane

🌳 Hierplane for R
https://r4fun.github.io/hierplane/

Possible unexpected behavior #4

Closed tylerlittlefield closed 4 years ago

tylerlittlefield commented 4 years ago

For some reason, when the example "Sam likes boats" is written in lowercase, the root "likes" gets the attributes "PERSON" and "VERB". I would expect it to have only "VERB":

```r
library(hierplane)

hierplane("Sam likes boats") # what we should expect (maybe?)
hierplane("sam likes boats") # likes -> person + verb??
```

Also:

```r
hierplane:::build_tree("sam likes boats")
```

To double-check whether I broke something, I checked out an earlier commit with `git checkout 28c79c145b0f96ddfe3e2e5d5ae2a7389cad34d2`, but the behavior seems to be the same.

mathidachuk commented 4 years ago

I think the tags are as expected. The spaCy model does not recognize "sam likes boats" as a sentence because everything is lowercase.

[Image: spacy_df() output for the lowercase input]

As the part-of-speech tags from spacy_df() show, all three words are tagged as proper nouns, so I guess spaCy treats "sam likes boats" as a single name.

On the other hand, as long as "Sam" is capitalized, spaCy parses the whole sentence correctly.

[Image: output for the capitalized input]

Either way, build_tree parses the data correctly based on the spacy_df() output and the spacy_attributes() values. Unless we want to show users a reminder to reconsider the casing of their input, I am not sure there is a good way around this.
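For illustration only, the casing reminder mentioned above could look roughly like the sketch below. The helper name `warn_if_all_lowercase` is hypothetical and not part of hierplane. It uses base R only and assumes that "no uppercase letters at all" is a good enough signal that the input may be mis-tagged.

```r
# Hypothetical helper (not part of hierplane): warn when the input text
# contains letters but no uppercase characters.
warn_if_all_lowercase <- function(txt) {
  stopifnot(is.character(txt), length(txt) == 1)

  # Only warn if there is at least one letter to judge casing by.
  has_letters <- grepl("[[:alpha:]]", txt)

  # The text is "all lowercase" if lowercasing it changes nothing.
  is_lowercase <- identical(txt, tolower(txt))

  if (has_letters && is_lowercase) {
    warning(
      "Input contains no uppercase letters; the parser may mis-tag ",
      "names and parts of speech. Consider restoring standard casing.",
      call. = FALSE
    )
  }

  # Return the input unchanged so the check can sit in front of a parser call.
  invisible(txt)
}

warn_if_all_lowercase("Sam likes boats") # silent
warn_if_all_lowercase("sam likes boats") # emits the warning
```

A check like this only nudges the user. It does not change the text or the tags, so the parsing itself stays exactly as spaCy produces it.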

tylerlittlefield commented 4 years ago

Got it, thanks for checking. I’ll close this issue!