Thank you for our first statistical package pre-submission, @vgherard! I believe this clearly falls in scope, and I look forward to a full submission once you have incorporated the srr standards component. I am querying the editorial board for an opinion on whether this package should also apply standards from the Supervised or Unsupervised Learning categories.
Thanks @noamross, great :) I will begin looking into the srr standards, then. It may take me some time, but I'm up for it. Earlier I did a quick check with autotest, and it seems there's some trouble parsing some of my examples; let's see if I can get it to work quickly.
Please ping me and @mpadge here with any questions, we know we are working out the kinks in the new system and are eager to help with the process to make it better!
Thanks @noamross (@mpadge), I've filed an issue at ropensci-review-tools/autotest#49
Hello @vgherard! We're going back to some in-progress submissions that got stuck in an ambiguous state. Sorry that we haven't reached out in a while. I just wanted to see if ropensci peer review is something you were still interested in pursuing.
Dear @noamross thanks for checking in and sorry for the long silence, I totally forgot about this process being open.
Sadly, right now I'm too short on time for a relatively demanding submission like this. Apart from that, over time I have become somewhat unsatisfied with certain aspects of this package, which I would at least try to improve before submitting.
I'll close this, with the hope of coming back to it in the not-too-distant future :-)
Thanks!
@vgherard Any updates on the status of your package? We'd still be very interested in receiving a full submission :+1:
Dear all, thanks for keeping in touch.
I had a look at the requirements I would need to cover in order to submit {kgrams}, and again, I'm sorry, but this is too much for me.
The output of pkgcheck() alone looks intimidating: function names, usage of <<-, usage of sapply(), and so on. Also, I imagine that passing autotest and srr would be considerably more demanding still.
These are in general quick fixes, but for a package the size of {kgrams} it takes a good amount of effort to finally get the green light, an effort I'm not really interested in, since the only thing I'm doing with that package at the moment is keeping it alive on CRAN :')
Of course, when I say "too much" I am referring only to my own situation; I think the work you're doing in putting up this review process is awesome.
For future package ideas I will definitely consider implementing the rOpenSci standards from the outset!
Thanks @vgherard, I definitely understand. It's a shame, but you are probably right that it wouldn't be a trivial amount of work to prepare it. Thanks for considering, and for the kind words, and we look forward to future submissions at any time.
Submitting Author: Valerio Gherardi (@vgherard)
Repository: https://github.com/vgherard/kgrams
Submission type: Pre-submission
Scope
Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):
Data Lifecycle Packages
[ ] data retrieval
[ ] data extraction
[ ] database access
[ ] data munging
[ ] data deposition
[ ] workflow automation
[ ] version control
[ ] citation management and bibliometrics
[ ] scientific software wrappers
[ ] database software bindings
[ ] geospatial data
[ ] text data
Statistical Packages
[ ] Bayesian and Monte Carlo Routines
[ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
[x] Machine Learning
[ ] Regression and Supervised Learning
[ ] Exploratory Data Analysis (EDA) and Summary Statistics
[ ] Spatial Analyses
[ ] Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
This package implements classical k-gram language model algorithms, including utilities for training, evaluation, and text prediction. Language models are a cornerstone of Natural Language Processing applications, and the conceptual simplicity of k-gram models makes them a good baseline, with pedagogical value as well.
k-gram models are a simple form of machine learning applied to text data; as such, Machine Learning is definitely the most appropriate category among those listed above. I would be inclined to call this an "unsupervised" learning problem, since the target function being learned (the language's probability distribution over sentences) is not explicit in the training data, but I have never seen this particular qualification in the literature.
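To make the idea concrete, here is a minimal sketch (in Python, not the package's R API) of what a k-gram model learns: count k-grams in a token sequence, then estimate next-word probabilities by maximum likelihood. All names here are illustrative and are not taken from {kgrams}.

```python
from collections import Counter

def kgram_counts(tokens, k):
    """Count all k-grams (tuples of k consecutive tokens) in a sequence."""
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))

def mle_probability(word, context, counts_k, counts_km1):
    """Maximum-likelihood estimate of P(word | context) from k-gram counts."""
    denom = counts_km1[context]
    return counts_k[context + (word,)] / denom if denom else 0.0

tokens = "the cat sat on the mat the cat ran".split()
bigrams = kgram_counts(tokens, 2)   # k = 2
unigrams = kgram_counts(tokens, 1)  # contexts for bigrams

# "the" occurs 3 times, "the cat" occurs twice, so P(cat | the) = 2/3
p = mle_probability("cat", ("the",), bigrams, unigrams)
```

Real k-gram packages add smoothing on top of these raw counts, so that unseen k-grams receive nonzero probability.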
Not yet (NB: this is a presubmission inquiry).
The package can be useful for students and/or researchers, for performing small-scale experiments with Natural Language Processing. In addition, it might be helpful in the building of more complex language models, for quick baseline modeling.
I am not aware of any R package with the same purpose and functionality as kgrams. The CRAN package ngram has some overlap in scope, in that it provides k-gram tokenization algorithms and random text generation, but it offers no support for language model algorithms.
Not applicable.