ropensci / software-review

rOpenSci Software Peer Review.

cat2cat: Handling an Inconsistently Coded Categorical Variable in a Panel Dataset #562

Status: Open. Polkas opened this issue 1 year ago.

Polkas commented 1 year ago

Submitting Author Name: Maciej Nasinski
Submitting Author Github Handle: @polkas
Repository: https://github.com/Polkas/cat2cat
Submission type: Pre-submission
Language: en


Package: cat2cat
Title: Handling an Inconsistently Coded Categorical Variable in a Panel Dataset
Version: 0.4.5.9000
Authors@R: person("Maciej", "Nasinski", email = "nasinski.maciej@gmail.com", role = c("aut", "cre"))
Maintainer: Maciej Nasinski <nasinski.maciej@gmail.com>
Description: 
  Unifies an inconsistently coded categorical variable between two different time points in accordance with a mapping table.
  The main rule is to replicate an observation if it could be assigned to more than one category.
  Simple frequencies or modern statistical methods are then used to approximate the probability of assignment to each of them.
  This novel procedure was invented and implemented in Nasinski, Majchrowska and Broniatowska (2020) <doi:10.24425/cejeme.2020.134747>.
Depends: R (>= 3.6)
License: GPL (>= 2)
URL: https://github.com/Polkas/cat2cat, https://polkas.github.io/cat2cat/
BugReports: https://github.com/Polkas/cat2cat/issues
Encoding: UTF-8
Imports:
    MASS
Suggests:
    caret,
    randomForest,
    knitr,
    rmarkdown,
    pacman,
    testthat (>= 3.0.0),
    magrittr,
    dplyr
LazyData: true
VignetteBuilder: knitr
RoxygenNote: 7.2.1
Config/testthat/edition: 3
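The replication rule in the Description can be illustrated with a small, language-agnostic sketch. It is written in Python rather than R and is not the cat2cat API: the function name `replicate_with_freq_weights`, the dict-based mapping table and the example codes are hypothetical. Each old-period observation is copied once per candidate new category, and each copy is weighted by that category's share among the candidates in the new period, so the weights of one observation sum to 1.

```python
from collections import Counter


def replicate_with_freq_weights(old_obs, mapping, new_codes):
    """Replicate each old-period observation once per candidate new category.

    Hypothetical illustration, not the cat2cat API.
    old_obs   -- list of dicts with keys "id" and "code" (old classification)
    mapping   -- dict mapping an old code to a list of candidate new codes
    new_codes -- new-period category codes, used to compute frequencies
    """
    freq = Counter(new_codes)
    out = []
    for obs in old_obs:
        candidates = mapping.get(obs["code"], [])
        # Observations whose old code is missing from the mapping are skipped.
        total = sum(freq[c] for c in candidates)
        for c in candidates:
            # Share of this candidate among all candidates in the new period;
            # fall back to equal weights if no candidate occurs there.
            weight = freq[c] / total if total > 0 else 1 / len(candidates)
            out.append({"id": obs["id"], "new_code": c, "weight": weight})
    return out


# Hypothetical example: old code "A" was split into "A1" and "A2".
old_obs = [{"id": 1, "code": "A"}, {"id": 2, "code": "B"}]
mapping = {"A": ["A1", "A2"], "B": ["B1"]}
new_codes = ["A1", "A1", "A1", "A2", "B1"]

for row in replicate_with_freq_weights(old_obs, mapping, new_codes):
    print(row)
```

Here observation 1 becomes an A1 copy with weight 0.75 and an A2 copy with weight 0.25, while observation 2 maps to a single category and keeps weight 1.0. Frequencies are the simplest choice; the Description notes that statistical methods can replace them, for example by using a classifier's predicted probabilities as the weights.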

Scope

The main objective is to unify inconsistently coded categorical variables in a panel/longitudinal dataset. Supervised learning methods can be used within the cat2cat procedure to estimate the assignment probabilities. The output of the cat2cat function can then be used, e.g., in a weighted linear regression or to assess category counts over time.
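Using the output to assess counts over time can be sketched in a few lines. This is again Python rather than R and not the cat2cat API; the function name `weighted_counts`, the keys `period`, `code` and `weight`, and the data are hypothetical. The point it shows is that summing weights, rather than counting replicated rows, keeps each period's total equal to the number of original observations.

```python
from collections import defaultdict


def weighted_counts(rows):
    """Sum the weights of replicated rows per (period, category).

    Hypothetical illustration: each row is a dict with "period", "code" and
    "weight"; unreplicated observations carry weight 1.0.
    """
    counts = defaultdict(float)
    for r in rows:
        counts[(r["period"], r["code"])] += r["weight"]
    return dict(counts)


# Hypothetical data: two 2020 observations, each split between A1 and A2,
# and three 2021 observations already coded in the new classification.
rows = [
    {"period": 2020, "code": "A1", "weight": 0.75},
    {"period": 2020, "code": "A2", "weight": 0.25},
    {"period": 2020, "code": "A1", "weight": 0.75},
    {"period": 2020, "code": "A2", "weight": 0.25},
    {"period": 2021, "code": "A1", "weight": 1.0},
    {"period": 2021, "code": "A2", "weight": 1.0},
    {"period": 2021, "code": "A1", "weight": 1.0},
]

counts = weighted_counts(rows)
for key in sorted(counts):
    print(key, counts[key])
```

The 2020 weights sum to 2.0, the number of original 2020 observations, so replication does not inflate the counts. The same weights can serve as case weights in a weighted linear regression, for example through the `weights` argument of R's `lm()`.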

I plan to apply once I know whether I can submit the package.

Any scientific field that uses panel/longitudinal datasets. Examples of panel datasets with such inconsistently coded categorical variables are those linked to the International Standard Classification of Occupations (ISCO) and the International Classification of Diseases (ICD), whose codes change between classification revisions.

To the best of my knowledge, there is no alternative to my solution other than aggregating the datasets (with some simplifications) or removing the variable.

annakrystalli commented 1 year ago

Thanks for the pre-submission enquiry, @Polkas!

The editorial team is discussing and we'll get back to you shortly.

annakrystalli commented 1 year ago

Dear @Polkas,

The editorial team has concluded that the package definitely fits in our "stats" scope.

Before proceeding and closing this pre-submission enquiry, we also need to clarify which category the package would fit. The stats-devguide states that a category is appropriate when at least half of its standards can be applied. We suggest you try to narrow it down to a single category.

We feel it does not fit the "time series" category best; our initial impression is that "Machine Learning" is the most likely fit. We suggest you spend a little time reading through the standards and consider which category you think is most appropriate.

Following that, the best way to confirm would be to go through the formal process of documenting compliance with the stats standards, which needs to be done prior to submission anyway. You can call @ropensci-review-bot check srr in this issue to confirm that the documentation has been completed successfully. You can find more details in our documentation.

Just ping me here to confirm that it's done and which category you have narrowed it down to, or if you need any help.

Thanks again for your enquiry!

maurolepore commented 1 year ago

Dear @Polkas,

Today I start my rotation as Editor-in-Chief (EiC), taking over the role from @annakrystalli. Did you have a chance to follow up on the comment above?

maelle commented 1 year ago

:wave: @Polkas! I'm now the editor-in-chief. Any update? :smile_cat:

maelle commented 1 year ago

@Polkas friendly reminder, did you get a chance to work on the comments from https://github.com/ropensci/software-review/issues/562#issuecomment-1339049291?

Polkas commented 1 year ago

Hey, thank you for the update. I have already assessed which category and scope are possible for my package, and I found that the base requirements can be followed. However, I cannot commit to updating my package right now, as I have submitted my paper to the SoftwareX journal and am waiting for their decision and comments.

maelle commented 11 months ago

@Polkas any update? :smile_cat:

Polkas commented 9 months ago

Hey, my paper has just been published. I will start working on a new feature branch for a possible rOpenSci submission and will follow up here. Have a great day.

jhollist commented 6 months ago

@Polkas I am currently serving as the EIC and am checking in on some older submissions. First, congrats on the publication! You mentioned that you might pursue another submission to rOpenSci. Have you decided to move forward with that?

Polkas commented 6 months ago

Hey @jhollist, thank you for your response. I have put effort into aligning with the expected standards. However, it appears that the current focus of rOpenSci may have shifted away from packages similar to mine.

I understand that rOpenSci is now prioritizing support for packages that facilitate reproducible research and manage the data lifecycle for scientists. I have thoroughly reviewed the current package categories and, unfortunately, it seems my package may not align with any of these categories.

If my understanding is correct and my package indeed falls outside the scope of rOpenSci's current focus, please feel free to close this issue.

jhollist commented 6 months ago

@Polkas, your package is a better fit for our Statistical Software peer review. Based on the conversation above (https://github.com/ropensci/software-review/issues/562#issuecomment-1339049291), take a close look at https://stats-devguide.ropensci.org/pkgdev.html#scope and see whether you think any of those categories fit. The prior conversations here and among the editors suggested that Machine Learning might be the best fit. If you would like to proceed, take a close look at the Stats devguide. If you have specific questions after that, you can ping me again here. Thanks!

ldecicco-USGS commented 4 months ago

Hi @Polkas! I'm checking in on submissions that have been sitting for a while. It sounds like the feedback has been that this package would be better suited to rOpenSci's Statistical Software submission process. The process is similar, but there are a few differences. I'll once again plug the statistical submission guide:

https://stats-devguide.ropensci.org/

Let me know if you have any questions.