seancarmody / ngramr

R package to query the Google Ngram Viewer

Has the wildcard syntax changed? #30

Closed: pnewall closed this issue 3 years ago

pnewall commented 3 years ago

Hi,

In 1.5.x, the syntax below worked just fine for me

    ging <- paste0(ip, " *")
    ng <- ngramr::ngram(ging, year_start = 1950)

where ip is a string so that ging could be (say) "a bachelor's degree in *".

If I go to Google & the ngram viewer, this string will return results.

But from R, with 1.7.x of the package, I'm getting the warning

    The characters +, -, *, / require parentheses to be interpreted as a composition.

I have since tried "(a bachelor's degree in) *", "((a bachelor's degree in) *)" and others, but without success.

seancarmody commented 3 years ago

It looks as though there's a problem with the apostrophe ("a bachelor degree in *" works ok). I will look into it.

seancarmody commented 3 years ago

Actually, I can't get the string to work directly in the Google ngram viewer either: https://books.google.com/ngrams/graph?content=a+bachelor%27s+degree+in+%2A Can you please post the url of your successful query?

seancarmody commented 3 years ago

Actually, I think I know what's happening. The ngram viewer tokenises "bachelor's" to "bachelor 's" (two tokens), which takes the ngram length over 5. It must check the length before tokenising, so the error is confusing. Try "bachelor 's degree in *" (without the "a").
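A rough sketch of that token count in R may make the limit easier to see. The helpers `split_possessive` and `count_tokens` are hypothetical names, not part of ngramr, and the regular expression only approximates how the viewer appears to split possessives; it is not Google's actual tokeniser.

```r
# Hypothetical helper (not part of ngramr): put a space before a possessive 's,
# approximating the split the ngram viewer appears to apply.
split_possessive <- function(phrase) {
  gsub("(\\w)'s\\b", "\\1 's", phrase, perl = TRUE)
}

# Hypothetical helper: count whitespace-separated tokens.
# The trailing wildcard "*" counts as one token.
count_tokens <- function(phrase) {
  length(strsplit(trimws(phrase), "\\s+")[[1]])
}

q <- "a bachelor's degree in *"
count_tokens(q)                    # 5 tokens as typed
count_tokens(split_possessive(q))  # 6 tokens once "bachelor's" is split

# Dropping the leading "a" brings the query back to five tokens.
q2 <- split_possessive("bachelor's degree in *")
count_tokens(q2)                   # 5

# The query itself needs network access, so it is left commented out here:
# ng <- ngramr::ngram(q2, year_start = 1950)
```

Pre-splitting the possessive before calling `ngram()` means the phrase length can be checked locally, before the request is sent.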

pnewall commented 3 years ago

Thanks Sean - looks like you're right

    Ngram data table
    Phrases:        bachelor 's degree in a, bachelor 's degree in accounting, bachelor 's degree in any, bachelor 's degree in business, bachelor 's degree in economics, bachelor 's degree in education, bachelor 's degree in engineering, bachelor 's degree in English, bachelor 's degree in psychology, bachelor 's degree in the
    Case-sensitive: TRUE
    Corpuses:       eng_2019
    Smoothing:      3
    Years:          1950-2019

      Year                  Phrase    Frequency   Corpus                Parent
    1 1950 bachelor 's degree in a 5.060552e-09 eng_2019 bachelor 's degree in
    2 1951 bachelor 's degree in a 4.616514e-09 eng_2019 bachelor 's degree in
    3 1952 bachelor 's degree in a 4.642724e-09 eng_2019 bachelor 's degree in
    4 1953 bachelor 's degree in a 4.591766e-09 eng_2019 bachelor 's degree in
    5 1954 bachelor 's degree in a 5.609661e-09 eng_2019 bachelor 's degree in
    6 1955 bachelor 's degree in a 5.323854e-09 eng_2019 bachelor 's degree in

plus output from plot (attached image: Rplot01)

seancarmody commented 3 years ago

Excellent!