sirupsen / dotfiles

Personal UNIX toolbox

Popularity of a new English word by Google n-gram not Google search results #3

Closed NicolaiSchmid closed 10 months ago

NicolaiSchmid commented 10 months ago

I just finished reading your coverage on Every (https://every.to/superorganizers/how-to-make-yourself-into-a-learning-machine/).

It mentions that you use the number of Google results to rank the words in your English dictionary Airtable:

So I had to add another piece to the system to sort the words by how often they’re used. The most accurate proxy I’ve found is the number of Google results for a word.

I'm not sure whether you're aware, but to check how much a given word is used, you can use the Google N-gram viewer, which shows how often the word appears in literature between 1800 and 2019 and how that has changed over time.

Here are a few examples from the screenshot of your Airtable: https://books.google.com/ngrams/graph?content=tacitly%2Cdesultory%2Cgumption%2Cironclad&year_start=1800&year_end=2019&case_insensitive=on&corpus=en-2019&smoothing=3
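
A link like the one above can also be assembled programmatically. This is a minimal Python sketch that assumes nothing beyond the query parameters visible in that URL; the function name `ngram_url` and its default values are made up for illustration.

```python
from urllib.parse import urlencode

# Base address taken from the example link in this comment.
NGRAM_GRAPH = "https://books.google.com/ngrams/graph"


def ngram_url(words, year_start=1800, year_end=2019, corpus="en-2019", smoothing=3):
    """Build an N-gram viewer link comparing the given words (hypothetical helper)."""
    query = urlencode({
        "content": ",".join(words),  # comma-separated list; urlencode escapes "," as %2C
        "year_start": year_start,
        "year_end": year_end,
        "case_insensitive": "on",
        "corpus": corpus,
        "smoothing": smoothing,
    })
    return f"{NGRAM_GRAPH}?{query}"


if __name__ == "__main__":
    print(ngram_url(["tacitly", "desultory", "gumption", "ironclad"]))
```

Calling it with the four words from the screenshot should reproduce the link above, so a list of words exported from the Airtable could be turned into comparison links in bulk.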

I hope this might be a helpful addition to your workflow!

sirupsen commented 10 months ago

Hah, that's cool! I didn't know about this. I currently have something that uses the Bing API to get the number of search results, and it's working great. If I were to do it again, I'd probably just use the N-gram viewer, though!

I also suspect LLMs would be pretty good at this FWIW…

Simon https://sirupsen.com/

NicolaiSchmid commented 10 months ago

"I also suspect LLMs would be pretty good at this FWIW…"

But wouldn't that be difficult to quantify?

I'm not sure whether the n-gram viewer has an API, but it's better to keep it automated!
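
For what it's worth, here is a rough Python sketch of how the automation might look. It rests on an assumption: that the viewer serves its chart data as JSON at `https://books.google.com/ngrams/json`. That endpoint is not an officially documented API, so it may change, be rate-limited, or stop working. The function names, the response shape (`ngram` and `timeseries` fields), and the "mean of the last 20 years" scoring are all illustrative choices, not anything confirmed in this thread.

```python
import json
from statistics import mean
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: undocumented endpoint, not an official API; it may change or disappear.
NGRAM_JSON = "https://books.google.com/ngrams/json"


def fetch_series(words, year_start=1800, year_end=2019, corpus="en-2019", smoothing=3):
    """Fetch per-year relative frequencies for each word (requires network access)."""
    query = urlencode({
        "content": ",".join(words),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    })
    with urlopen(f"{NGRAM_JSON}?{query}", timeout=10) as resp:
        return json.load(resp)


def rank_by_recent_usage(series, last_n_years=20):
    """Return words sorted from most to least used.

    Assumes each entry looks like {"ngram": str, "timeseries": [float, ...]}
    and scores a word by its mean frequency over the last `last_n_years` values.
    """
    scores = {entry["ngram"]: mean(entry["timeseries"][-last_n_years:]) for entry in series}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    words = ["tacitly", "desultory", "gumption", "ironclad"]
    print(rank_by_recent_usage(fetch_series(words)))
```

Keeping the ranking separate from the fetch means the scoring can be checked offline, and the data source could be swapped for something else (such as the Bing result counts) without touching the sort.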