Open njtierney opened 6 years ago
@njtierney So you're thinking of something where you'd provide a word and the package would report the syn or ant based on some pre-specified dictionary?
So ant("good") would return [1] bad [2] wicked?
Yup! Exactly that! I think the trick is finding a good-quality open source thesaurus that can be downloaded or bundled with the package. That way we avoid internet API calls, so it would be fast and would not require an API key or an internet connection.
But yes, I imagine it would be something like this:
syn("good")
[1] great fantastic excellent happy
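A minimal sketch of what that offline lookup could look like, assuming the thesaurus is shipped with the package as a named list. The `toy_thesaurus` data and the `lookup_word()` helper are hypothetical and only there for illustration; they are not part of the syn package.

```r
# Hypothetical toy thesaurus; a real package would ship a much larger,
# openly licensed dataset. The words and structure here are assumptions.
toy_thesaurus <- list(
  good = list(
    syn = c("great", "fantastic", "excellent"),
    ant = c("bad", "wicked")
  ),
  happy = list(
    syn = c("cheerful", "glad"),
    ant = c("sad", "unhappy")
  )
)

# Look up one relation ("syn" or "ant") for a single word.
# Returns NA_character_ when the word is not in the dictionary.
lookup_word <- function(word, relation, dict = toy_thesaurus) {
  stopifnot(is.character(word), length(word) == 1)
  entry <- dict[[tolower(word)]]
  if (is.null(entry)) return(NA_character_)
  entry[[relation]]
}

syn <- function(word, dict = toy_thesaurus) lookup_word(word, "syn", dict)
ant <- function(word, dict = toy_thesaurus) lookup_word(word, "ant", dict)

syn("good")
#> [1] "great"     "fantastic" "excellent"
ant("good")
#> [1] "bad"    "wicked"
syn("hufflepuff") # not in the toy dictionary
#> [1] NA
```

Because everything is a plain in-memory lookup, there is no network call and no API key involved.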
Wouldn’t it be preferable to return a vector? That might make it easier on possible secondary arguments (e.g. return a variable number of values, and/or select return values based on default order/randomly). Just a thought.
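A rough sketch of how a character vector return value could support those secondary arguments. The helper name `pick_words()` and its `n` and `random` arguments are assumptions made up for this example, not an existing API.

```r
# Hypothetical helper: pick at most `n` values from a character vector,
# either in the stored order or at random.
pick_words <- function(words, n = length(words), random = FALSE) {
  stopifnot(is.character(words), is.numeric(n), length(n) == 1, n >= 0)
  n <- min(n, length(words))
  if (random) {
    # Sample indices rather than values, so a length-one vector is safe.
    words[sample.int(length(words), n)]
  } else {
    head(words, n)
  }
}

candidates <- c("great", "fantastic", "excellent", "happy")

pick_words(candidates)                        # all four, in stored order
pick_words(candidates, n = 2)                 # first two in stored order
pick_words(candidates, n = 2, random = TRUE)  # two words, varies per call
```

Since the result is an ordinary character vector, it can also be subset, sorted, or pasted together by the caller without any extra parsing.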
Really cool idea. In the long run, that could be useful for editing longer prose inside markdown, perhaps?
Are you aware of any publicly available data that could be used for that? Or an API?
Nice idea. Would something like the Wiktionary Thesaurus be suitable as a data source?
A while back I had some mixed success downloading quotes for word lists from the Quotations Wiktionary. I imagine this might be similar to accessing the thesaurus information.
FWIW, here's the old code I used for downloading quotes in case it's useful
####
# Wiktionary quotes
####
# Description: Obtain phrases from wiktionary for given words.
# References:
# https://en.wiktionary.org/wiki/Wiktionary:Quotations
# https://en.wiktionary.org/wiki/Wiktionary:Entry_layout#Example_sentences
library(httr)
library(stringr)
wiki_quote <- function(some_word) {
  # Request the raw wikitext of the Wiktionary entry for `some_word`
  some_url <- GET(paste0("https://en.wiktionary.org/w/index.php?title=", some_word, "&action=raw"))
  some_text <- content(some_url, "text") # e.g. text content of a wiktionary page
  # Nothing to search if the page body is empty
  if (nchar(some_text) == 0) return(NA)
  some_pattern <- paste0("#:[^\n]+?'''", some_word, "'''.+?\n") # e.g. a wiktionary quote "#: There was a dark storm brewing.\n"
  # regexpr() finds only the first match; -1 means no match
  raw_match <- regexpr(some_pattern, some_text)
  if (all(raw_match[[1]] == -1)) return(NA)
  matched_substrings <- regmatches(some_text, raw_match)
  lapply(matched_substrings, tidy_quote)
}
tidy_quote <- function(quote) {
  temp <- str_replace(quote, "\\{\\{ux\\|en\\|", "")
  temp <- str_replace(temp, "\\}\\}", "")
  temp <- str_replace(temp, "#:", "")
  trimws(temp)
}
wiki_quote("storm")
#> [[1]]
#> [1] "''The proposed reforms have led to a political '''storm'''.''"
wiki_quote("sunshine")
#> [[1]]
#> [1] "We were warmed by the bright '''sunshine'''."
wiki_quote("hufflepuff") # nonsense word - should return NA
#> [1] NA
Created on 2018-11-09 by the reprex package (v0.2.0).
What it says on the tin!
I've had this idea for a little while, mainly to stop me from going to Google to look for synonyms. I haven't made any progress, but a stub of a package is here: https://github.com/njtierney/syn
The goal of syn is to provide two main functions:
syn - generate synonyms
ant - generate antonyms

There are other packages that do this, but they usually do this in the context of other text-related work.
In terms of applications, I would use this all the time to output a set of (syn/ant)onyms for words in the terminal, but I imagine it could also be useful for the type of text analysis where you might want to search for similar words. I have 0 experience with text analysis, so perhaps there are better tools for that already.