ropensci / software-review

rOpenSci Software Peer Review.

redatam #672

Open pachadotdev opened 4 days ago

pachadotdev commented 4 days ago

Submitting Author Name: Mauricio Pacha Vargas Sepulveda
Submitting Author Github Handle: @pachadotdev
Other Package Authors Github handles: @litalbarkai
Repository: https://github.com/pachadotdev/open-redatam/tree/main/rpkg
Submission type: Pre-submission
Language: en


Package: redatam
Type: Package
Title: Import 'REDATAM' Files 
Version: 2.0.3
Authors@R: c(
    person(
        given = "Mauricio",
        family = "Vargas Sepulveda",
        role = c("aut", "cre"),
        email = "m.sepulveda@mail.utoronto.ca",
        comment = c(ORCID = "0000-0003-1017-7574")),
    person(
        given = "Lital",
        family = "Barkai",
        role = "aut"),
    person(
        given = "Arseny",
        family = "Kapoulkine",
        role = "ctb",
        comment = "'pugixml' C++ library"),
    person(
        family = "Republic of Ecuador",
        role = "dtc",
        comment = "Galapagos census data")
    )
Imports:
    data.table,
    janitor,
    stringi
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
Depends: R(>= 3.5.0)
Description: Import 'REDATAM' formats into R via the 'Open REDATAM' C++ library
    <https://github.com/litalbarkai/open-redatam> based on De Grande (2016)
    <https://www.jstor.org/stable/24890658>.
License: Apache License (>= 2)
URL: https://github.com/litalbarkai/open-redatam
BugReports: https://github.com/litalbarkai/open-redatam/issues
RoxygenNote: 7.3.2
Encoding: UTF-8
NeedsCompilation: yes
VignetteBuilder: knitr
LinkingTo: cpp11
Config/testthat/edition: 3

Scope

REDATAM is a closed-source format for census and survey data. This package is an "archeological" version of the haven package: it reads this specific format, which is widely used in Latin America by government statistical offices. With this package, I have been able to convert census data from the 1990s, which is no longer possible with the Redatam software on Windows because of multiple hardware changes over the last 30 years. The package also reads recent census data (2017-2020) correctly.

The target audience is sociologists, political scientists, and economists who need census data and an easy way to read it in R (or Python) to fit regression models or run other kinds of analysis.

No. There is a "redatamx" that reads a newer format.

Yes.

This package was removed from CRAN after I asked about a specific clang-ASAN error that took me a long time to replicate. I also asked about the error here: https://stackoverflow.com/questions/79171799/addresssanitizer-error-alloc-dealloc-mismatch-operator-new-vs-free-in-r-packa

emilyriederer commented 2 days ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 days ago

Thanks, about to send the query.

ropensci-review-bot commented 2 days ago

Error (500). The editorcheck service is currently unavailable

emilyriederer commented 2 days ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 days ago

Thanks, about to send the query.

ropensci-review-bot commented 2 days ago

Error (500). The editorcheck service is currently unavailable

mpadge commented 1 day ago

@pachadotdev and @emilyriederer Sorry for any inconvenience caused by these errors. Our check system hasn't yet been properly configured to handle packages in sub-directories. I'll let you know here when we've updated, and you can call checks again.

emilyriederer commented 1 day ago

Hey @pachadotdev ! This seems like some very cool software and an important goal. Could you please elaborate on how you see this package interacting with the redatamx package you mention and the litalbarkai/open-redatam package it is forked from? As a general principle, we're unable to consider forks for submission. Would it be possible to contribute this code to the original source repo or restructure it?

pachadotdev commented 1 day ago

redatamx is a new package made by ECLAC. It focuses on the new "Redatam X" format, and I have no part in it.

redatam (retired from CRAN, I hope to get it back there soon) is more focused on data "archeology," and I already have a group of users from Latin America who need demographic data for the period 1990-2020. That is the span of years when the DIC (Redatam versions 1 to 5) and DICX (Redatam 6 and ongoing) formats were in use.

@litalbarkai wrote the C++ parts, then I focused on the R and Python code and made some refactors so that it works with C++11 and very minimal dependencies (i.e., pugixml instead of building/installing Apache Xerces), but it is a collaborative project and Lital is a co-author. I also wrote the article that we sent to the journal, where I was 100% focused on the "human writing" and not the "code writing", and Lital is the lead singer for the C++ parts.

We have two repos and send each other PRs to keep it neat. Could we use branches? Yes, but I am a boomer.

The alternative to this package is to use old hardware and a point-and-click tool on Windows 98/XP, which is why I keep my old ThinkPad X200 and an external DVD reader. It is not feasible to read old census data with modern hardware, which is a problem derived from the data being in a closed-source format. Even worse, some recent census data comes with an installer that does not work on Windows 10+, and I was able to extract the data by using Wine on my main modern laptop.