ropenscilabs / statistical-software-review

Experiments for rOpenSci's project for peer review of statistical software

wordpredictor pre-submission inquiry #8

Closed pakjiddat closed 3 years ago

pakjiddat commented 3 years ago

I had submitted a pre-submission inquiry for my R package at https://github.com/ropensci/software-review/issues/448, but it was considered out of scope. The reviewer suggested posting my submission request to the statistical peer-review pilot project. Does my project meet the requirements for "Statistical Software"? Should I submit a review request?

mpadge commented 3 years ago

Thanks @pakjiddat for asking here. I think your package may be suitable for submission as a statistical package, but you would have to be the ultimate judge of that. The processes for preparing packages for submission are described in our book on Statistical Software Peer Review, primarily in the Guide for Authors chapter. You'll note there the important statement that,

Any software which can be aligned with one or more sets of category-specific standards will by definition be considered in scope.

Your word-predictor package seems like it might fit in the Unsupervised Learning category, and/or possibly Exploratory Data Analysis. Your first task would be to examine those sets of category-specific standards to assess how many of them would be likely to apply to your package. If at least 50% or so of the standards from any category would apply, then you may consider your package to fit within that category, and you would then be welcome to submit it. Please consider both categories, as your package may potentially fit within both, in which case it would be assessed under both.

Feel free to ask any further questions here. If your package does indeed fit within one or both of these categories, and if you decide to begin the work of aligning your package with our statistical standards (using the procedures described in the "Guide for Authors"), then we would revisit your initial issue in the main software-review repository and continue from there.


@noamross Any additional thoughts you'd care to add here?

pakjiddat commented 3 years ago

@noamross, I think my package falls under the "Machine Learning Software" category. It seems to satisfy most of the standards listed in the ml-demos.

The wordpredictor package generates n-gram language models. It follows the common ML workflow of pre-processing, model generation, model evaluation and model prediction. Please refer to the Overview and Features vignettes for details.
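As a side note for readers unfamiliar with n-gram prediction, the workflow described above (pre-process → build model → predict) can be sketched generically. This toy Python bigram example is purely illustrative; it does not use the wordpredictor API, and all function names here are hypothetical:

```python
from collections import Counter, defaultdict

def preprocess(text):
    """Pre-processing step: lowercase and tokenize on whitespace (toy version)."""
    return text.lower().split()

def train_bigram(tokens):
    """Model generation: count, for each word, which words follow it."""
    model = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        model[w1][w2] += 1
    return model

def predict(model, word, k=3):
    """Prediction: return the k most frequent continuations of `word`."""
    return [w for w, _ in model[word].most_common(k)]

tokens = preprocess("the cat sat on the mat and the cat slept")
model = train_bigram(tokens)
print(predict(model, "the"))  # → ['cat', 'mat']
```

A real package such as wordpredictor adds the remaining workflow stages (smoothing, held-out evaluation such as perplexity, and serialization of the fitted model), which is what the ML standards are designed to probe.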

mpadge commented 3 years ago

@pakjiddat In that case, then as stated, you would be very welcome to start preparing your package for submission. Note that the ml-demos you link to are somewhat out of date, and the definitive reference for standards is always the actual standards themselves in Chapter 6 of the Statistical Software Peer Review Book.

This package would then be the first enquiry that we have had in the Machine Learning category, and I like that it already confounds a few of our expectations as expressed in the standards - particularly that there is no clear distinction in wordpredictor between test and training data, for reasons that indeed seem justifiable. You are welcome to discuss any mismatches you perceive along the way between your package and our standards, including by opening issues in the corresponding GitHub repo, or even pull requests suggesting potential modifications to the standards. The standards are intended to evolve through the direct input of package authors themselves. As always, please ask any questions here in the meantime. Looking forward to hearing from you as you prepare the package for submission.