pachadotdev opened 4 days ago
@ropensci-review-bot check package
Thanks, about to send the query.
Error (500). The editorcheck service is currently unavailable
@ropensci-review-bot check package
Thanks, about to send the query.
Error (500). The editorcheck service is currently unavailable
@pachadotdev and @emilyriederer Sorry for any inconvenience caused by these errors. Our check system hasn't yet been properly configured to handle packages in sub-directories. I'll let you know here when we've updated, and you can call checks again.
Hey @pachadotdev! This seems like some very cool software and an important goal. Could you please elaborate on how you see this package interacting with the redatamx package you mention and the litalbarkai/open-redatam package it is forked from? As a general principle, we're unable to consider forks for submission. Would it be possible to contribute this code to the original source repo or restructure it?
redatamx is a new package made by ECLAC. It focuses on the new "Redatam X" format, and I have no part in it.
redatam (retired from CRAN; I hope to get it back there soon) is more focused on data "archeology". I already have a group of users from Latin America who need demographic data for the period 1990-2020, which is the span of years in which the DIC (Redatam versions 1 to 5) and DICX (Redatam 6 and later) formats were in use.
@litalbarkai wrote the C++ parts. I then focused on the R and Python code and refactored the C++ code to work with C++11 and very minimal dependencies (e.g., pugixml instead of building/installing Apache Xerces). It is a collaborative project and Lital is a co-author. I also wrote the article that we sent to the journal, where I focused 100% on the "human writing" rather than the "code writing", while Lital is the lead singer for the C++ parts.
We have two repos and send each other PRs to keep things neat. Could we use branches? Yes, but I am a boomer.
The alternative to this package is to use old hardware and a point-and-click tool on Windows 98/XP, which is why I keep my old ThinkPad X200 and an external DVD reader. It is not feasible to read old census data with modern hardware, a problem that stems from the data being in a closed-source format. Even worse, some recent census data comes with an installer that does not work on Windows 10+, and I was able to extract that data only by using Wine on my main, modern laptop.
Submitting Author Name: Mauricio Pacha Vargas Sepulveda
Submitting Author Github Handle: @pachadotdev
Other Package Authors Github handles: @litalbarkai
Repository: https://github.com/pachadotdev/open-redatam/tree/main/rpkg
Submission type: Pre-submission
Language: en
Scope
Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check one or more appropriate boxes below):
Data Lifecycle Packages
[ ] data retrieval
[x] data extraction
[ ] data munging
[ ] data deposition
[ ] workflow automation
[ ] version control
[ ] citation management and bibliometrics
[ ] scientific software wrappers
[ ] field and lab reproducibility tools
[x] database software bindings
[ ] geospatial data
[ ] text analysis
Statistical Packages
[ ] Bayesian and Monte Carlo Routines
[ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
[ ] Machine Learning
[ ] Regression and Supervised Learning
[ ] Exploratory Data Analysis (EDA) and Summary Statistics
[ ] Spatial Analyses
[ ] Time Series Analyses
[ ] Probability Distributions
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
REDATAM is a closed-source format for census and survey data, widely used in Latin America by different government statistical offices. This package is an "archeological" counterpart to the haven package and allows users to read this specific format. With this package, I have been able to convert census data from the 1990s that the Redatam software on Windows can no longer read because of multiple hardware changes over the last 30 years, and the package also reads recent census data (2017-2020) correctly.
If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?
Who is the target audience and what are scientific applications of this package?
Sociologists, political scientists, and economists who need census data and an easy way to read it in R (or Python) to fit regression models or carry out other kinds of analysis.
No. There is a "redatamx" package, but it reads a newer format.
Yes.
This package was removed from CRAN after CRAN asked about a specific clang-ASAN error that took me a long time to replicate. I also asked about the error on Stack Overflow: https://stackoverflow.com/questions/79171799/addresssanitizer-error-alloc-dealloc-mismatch-operator-new-vs-free-in-r-packa
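For reviewers unfamiliar with this AddressSanitizer report: "alloc-dealloc-mismatch (operator new vs free)" means that memory obtained with scalar `new` was released with `free()` instead of `delete`. The sketch below is hypothetical and is not the package's code; `Record`, `make_record_new`, and `make_record_malloc` are illustrative names. It only shows the class of bug (commented out, since running it is undefined behavior) and two correct ways to pair allocation and deallocation while staying within C++11.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical record type standing in for whatever object a reader allocates.
struct Record {
  int id;
  double value;
};

// BUG pattern that ASan reports as "alloc-dealloc-mismatch (operator new vs free)":
//   Record* r = new Record{1, 2.0};
//   std::free(r);  // memory from `new` must be released with `delete`

// Fix 1: let std::unique_ptr call `delete` automatically.
// (std::make_unique is C++14, so this constructs the unique_ptr directly to stay in C++11.)
std::unique_ptr<Record> make_record_new(int id, double value) {
  return std::unique_ptr<Record>(new Record{id, value});
}

// Fix 2: if some C code will later call free() on the pointer,
// allocate the storage with malloc() so the pair matches.
Record* make_record_malloc(int id, double value) {
  void* mem = std::malloc(sizeof(Record));
  if (mem == nullptr) {
    return nullptr;
  }
  // Construct the object in the malloc'd storage. Record is trivially
  // destructible, so releasing it later with std::free is safe.
  return new (mem) Record{id, value};  // caller must release with std::free
}
```

Usage: `auto a = make_record_new(1, 2.0);` needs no manual cleanup, while a pointer from `make_record_malloc(3, 4.5)` must be released with `std::free`. Choosing one allocation family per object and never mixing them is what keeps ASan quiet in this class of error.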