Paste the full DESCRIPTION file inside a code block below:
Package: kgrams
Title: Classical k-gram Language Models
Version: 0.1.0.9000
Authors@R:
    person(given = "Valerio",
           family = "Gherardi",
           role = c("aut", "cre"),
           email = "vgherard@sissa.it",
           comment = c(ORCID = "0000-0002-8215-3013"))
Description:
    Tools for training and evaluating k-gram language models in R,
    supporting several probability smoothing techniques,
    perplexity computations, random text generation and more.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
SystemRequirements: C++11
LinkingTo:
    Rcpp, RcppProgress
Imports:
    Rcpp, rlang, methods, utils, RcppProgress (>= 0.1), Rdpack
Depends:
    R (>= 3.5)
Suggests:
    testthat (>= 3.0.0),
    covr,
    knitr,
    rmarkdown
Config/testthat/edition: 3
RdMacros: Rdpack
VignetteBuilder: knitr
URL: https://vgherard.github.io/kgrams/,
    https://github.com/vgherard/kgrams
BugReports: https://github.com/vgherard/kgrams/issues
Scope
Please indicate which category or categories from our package fit policies this package falls under (please check an appropriate box below):
[ ] data retrieval
[ ] data extraction
[ ] database access
[ ] data munging
[ ] data deposition
[ ] workflow automation
[ ] version control
[ ] citation management and bibliometrics
[ ] scientific software wrappers
[ ] database software bindings
[ ] geospatial data
[x] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
This package implements classical k-gram language model algorithms, including utilities for training, evaluation and text prediction. Language models are a cornerstone of Natural Language Processing applications, and the conceptual simplicity of k-gram models makes them a good baseline as well as a useful teaching tool.
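To make "training, evaluation and smoothing" concrete, here is a minimal, language-agnostic sketch of a bigram (k = 2) model with add-k smoothing and a perplexity computation, written in Python. It is an illustration only and does not reproduce the kgrams API or its implementation; every function name below is hypothetical, tokenization is a plain whitespace split, and out-of-vocabulary words are not handled.

```python
import math
from collections import Counter

# Hypothetical sentence-boundary markers used by this sketch only.
BOS, EOS = "<BOS>", "<EOS>"


def train_bigram_counts(sentences):
    """Count bigrams and their one-word contexts over whitespace-tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        vocab.update(tokens[1:])  # BOS is a context only, never a predicted word
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def probability(word, prev, model, k=1.0):
    """Add-k smoothed estimate of P(word | prev) for an in-vocabulary word."""
    context_counts, bigram_counts, vocab = model
    numerator = bigram_counts[(prev, word)] + k
    denominator = context_counts[prev] + k * len(vocab)
    return numerator / denominator


def perplexity(sentences, model, k=1.0):
    """Exponential of the average negative log-probability per predicted token."""
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log(probability(word, prev, model, k))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


model = train_bigram_counts(["a b", "a c"])
print(probability("b", "a", model))  # smoothed P(b | a)
print(perplexity(["a b"], model))    # lower values mean a better fit
```

Add-k is used here only because it is the simplest smoother to write down; the more refined smoothing techniques follow the same train, estimate, evaluate pattern.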
Who is the target audience and what are scientific applications of this package?
The package can be useful to students and researchers for small-scale experiments in Natural Language Processing. It may also serve as a quick baseline when building more complex language models.
I am not aware of any R package with the same purpose and functionality as kgrams. The CRAN package ngram overlaps somewhat in scope, in that it provides k-gram tokenization algorithms, but it offers no support for language model algorithms.
Any other questions or issues we should be aware of?:
The package was accepted by CRAN some months ago.
Despite the "lifecycle:experimental" badge and the development version number, I am not currently planning any major API changes or additional features for this package (with the exception of feedback or suggestions that might come out of an rOpenSci review, of course).
Submitting Author: Valerio Gherardi (@vgherard)
Repository: https://github.com/vgherard/kgrams
Submission type: Pre-submission