ropensci / software-review

rOpenSci Software Peer Review.

Pre-submission inquiry for {kgrams}: Classical k-gram Language Models #450

vgherard closed this issue 3 years ago

vgherard commented 3 years ago

Submitting Author: Valerio Gherardi (@vgherard)
Repository: https://github.com/vgherard/kgrams
Submission type: Pre-submission


Package: kgrams
Title: Classical k-gram Language Models
Version: 0.1.0.9000
Authors@R: 
    person(given = "Valerio",
           family = "Gherardi",
           role = c("aut", "cre"),
           email = "vgherard@sissa.it",
           comment = c(ORCID = "0000-0002-8215-3013"))
Description: 
        Tools for training and evaluating k-gram language models in R, 
        supporting several probability smoothing techniques, 
        perplexity computations, random text generation and more.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
SystemRequirements: C++11
LinkingTo: 
    Rcpp, RcppProgress
Imports: 
    Rcpp, rlang, methods, utils,  RcppProgress (>= 0.1), Rdpack
Depends: 
    R (>= 3.5)
Suggests: 
    testthat (>= 3.0.0),
    covr,
    knitr,
    rmarkdown
Config/testthat/edition: 3
RdMacros: Rdpack
VignetteBuilder: knitr
URL: https://vgherard.github.io/kgrams/,
    https://github.com/vgherard/kgrams
BugReports: https://github.com/vgherard/kgrams/issues

Scope

This package implements classical k-gram language model algorithms, including utilities for training, evaluation and text prediction. Language models are a cornerstone of Natural Language Processing applications, and the conceptual simplicity of k-gram models makes them a good baseline as well as a useful teaching tool.

The package can be useful to students and researchers running small-scale Natural Language Processing experiments. It may also help when building more complex language models, by providing a quick baseline.

I am not aware of any R package with the same purpose and functionality as kgrams. The CRAN package ngram overlaps partially in scope, in that it provides k-gram tokenization algorithms, but it offers no support for language model algorithms.

Not applicable

  1. The package was accepted by CRAN some months ago.
  2. Despite the "lifecycle:experimental" badge and the development version number, I am not currently planning any major API changes or additional features for this package (with the exception, of course, of changes prompted by feedback or suggestions from an rOpenSci review).

vgherard commented 3 years ago

I opened a new presubmission inquiry in #452