Closed pakjiddat closed 3 years ago
Thank you for your submission, @pakjiddat. This is an interesting package, but I'm afraid it falls outside of our scope. The text-analysis category was always a pilot, one we are likely winding down, but in any case we did not include text generation within it, as the purpose was largely to help create standard data types, classes, and processing workflows for the text-analysis ecosystem.
It's possible that your package will fall under our new statistical peer-review pilot (https://stats-devguide.ropensci.org/). If you are interested in this, please drop a note in the issues here: https://github.com/ropenscilabs/statistical-software-review/issues
Submitting Author: Nadir Latif (@pakjiddat)
Repository: https://github.com/pakjiddat/word-predictor
Submission type: Pre-submission
Paste the full DESCRIPTION file inside a code block below:
Package: wordpredictor
Title: Develop Text Prediction Models Based on N-Grams
Version: 0.0.2
URL: https://github.com/pakjiddat/word-predictor,
    https://pakjiddat.github.io/word-predictor/
BugReports: https://github.com/pakjiddat/word-predictor/issues
Authors@R:
    person(given = "Nadir",
           family = "Latif",
           role = c("aut", "cre"),
           email = "pakjiddat@gmail.com",
           comment = c(ORCID = "0000-0002-7543-7405"))
Description: A framework for developing n-gram models for text prediction.
    It provides data cleaning, data sampling, extracting tokens from text,
    model generation, model evaluation and word prediction. For information
    on how n-gram models work we referred to: "Speech and Language
    Processing" https://web.stanford.edu/~jurafsky/slp3/3.pdf. For
    optimizing R code and using R6 classes we referred to "Advanced R"
    https://adv-r.hadley.nz/r6.html. For writing R extensions we referred to
    "R Packages", https://r-pkgs.org/index.html.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Imports: digest, ggplot2, patchwork, stringr, dplyr, SnowballC
Suggests: testthat, covr, knitr, rmarkdown, markdown
VignetteBuilder: knitr
Language: en-US
Scope
Please indicate which category or categories from our package fit policies this package falls under (please check an appropriate box below):
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
The package generates n-gram models from plain text files. It supports text file analysis, data cleaning, token generation, generation and evaluation of n-gram models, and word prediction. "Text analysis" seems to be the most suitable category for the package.
Who is the target audience and what are scientific applications of this package?
The target audience is users who need to analyse text using n-gram models. The package may be used in applications that require word prediction, spell checking, auto-completion, search, etc.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
The quanteda and tm packages also support text analysis. These packages are quite advanced and are widely used for word frequency analysis.
The wordpredictor package differs from tm and quanteda in that it allows generating self-contained n-gram models. It also allows evaluating model performance using both extrinsic and intrinsic evaluation.
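To make the distinction between intrinsic and extrinsic evaluation concrete, here is a minimal, language-agnostic sketch in Python. It is not the wordpredictor API: the function names (`train_bigram`, `bigram_prob`, `perplexity`, `predict_next`, `next_word_accuracy`) are hypothetical, and the choice of a bigram model with add-one smoothing is an assumption made only for illustration. Intrinsic evaluation is shown as perplexity on held-out tokens, and extrinsic evaluation as next-word prediction accuracy.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count unigrams and bigram continuations from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, nxt):
    """Add-one (Laplace) smoothed estimate of P(nxt | prev)."""
    vocab_size = len(unigrams)
    following = bigrams.get(prev, Counter())  # empty if prev was never a context
    return (following[nxt] + 1) / (sum(following.values()) + vocab_size)


def perplexity(unigrams, bigrams, tokens):
    """Intrinsic evaluation: perplexity of the model on held-out tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(bigram_prob(unigrams, bigrams, p, n)) for p, n in pairs)
    return math.exp(-log_sum / len(pairs))


def predict_next(bigrams, prev):
    """Return the most frequent word seen after prev, or None if prev is unseen."""
    following = bigrams.get(prev)
    if not following:
        return None
    return following.most_common(1)[0][0]


def next_word_accuracy(bigrams, tokens):
    """Extrinsic evaluation: fraction of held-out next words predicted correctly."""
    pairs = list(zip(tokens, tokens[1:]))
    hits = sum(1 for p, n in pairs if predict_next(bigrams, p) == n)
    return hits / len(pairs)
```

A typical use would be to call `train_bigram` on a training sample, then report `perplexity` and `next_word_accuracy` on a separate validation sample. Lower perplexity and higher accuracy both suggest a better model, but the two measures do not always agree.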
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
I think the package complies with the guidance around Ethics, Data Privacy and Human Subjects Research.
Any other questions or issues we should be aware of?:
I developed the wordpredictor package as part of the Data Science Capstone course, in order to fulfill the project requirements.
The main project requirement was to develop an application for predicting words. The application should function like the Microsoft SwiftKey application. See this online presentation for details of the project.
The main functionality provided by the wordpredictor package is word prediction. Here is an online demo showing a possible use for the package.
I would like to improve the wordpredictor package so others find it useful.