worldbank / repkit

A Stata package with tools related to computational reproducibility
https://worldbank.github.io/repkit/

`reprun`: Improve memory management for Sort RNG check #39

Closed: bbdaniels closed this issue 6 months ago

bbdaniels commented 7 months ago

Need to reduce the number of data saves as much as possible, since saving a copy of the data after every line will quickly exceed the memory available on an ordinary machine.

See the prior attempted implementation in commit 4f8f78c4ac746ed4ac158009094352991805f16c.
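One way to avoid keeping a copy of the data after every line is to keep only a short fingerprint of each data state and compare fingerprints instead. The Python sketch below illustrates that idea under stated assumptions: `dataset_digest` and `track_states` are hypothetical helpers (not `reprun` code), each "step" is a function standing in for one line of a do-file, and SHA-256 is a stand-in for whatever checksum `reprun` ends up using.

```python
import csv
import hashlib
import io


def dataset_digest(rows):
    """Serialize rows to CSV text in memory and return a SHA-256 hex digest.

    SHA-256 is an assumption here, not necessarily what Stata's
    `checksum` computes.
    """
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return hashlib.sha256(buffer.getvalue().encode("utf-8")).hexdigest()


def track_states(rows, steps):
    """Apply each step in turn and record only a digest of the result.

    Each stored entry is a 64-character string rather than a full copy
    of the data, so memory use no longer grows with the dataset size.
    """
    digests = []
    for step in steps:
        rows = step(rows)
        digests.append(dataset_digest(rows))
    return digests


data = [["country", "year"], ["B", 2001], ["A", 2000]]
steps = [
    lambda r: r,                       # a line that leaves the data alone
    lambda r: [r[0]] + sorted(r[1:]),  # a line that reorders observations
]
digests = track_states(data, steps)
print(digests[0] != digests[1])  # True: the reorder changed the digest
```

Only the digests need to be kept between lines; a mismatch between two runs at the same line flags a change without storing either dataset.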

bbdaniels commented 7 months ago

Solution to implement: take the checksum from an exported CSV file. This does not work with .dta files:

. save "/users/bbdaniels/desktop/test1.dta"
file /users/bbdaniels/desktop/test1.dta saved

. save "/users/bbdaniels/desktop/test2.dta"
file /users/bbdaniels/desktop/test2.dta saved

. checksum "/users/bbdaniels/desktop/test1.dta"
Checksum for /users/bbdaniels/desktop/test1.dta = 245106026, size = 21446361

. checksum "/users/bbdaniels/desktop/test2.dta"
Checksum for /users/bbdaniels/desktop/test2.dta = 1861168015, size = 21446361

Unless the file name is causing that; let me check:

. checksum "/users/bbdaniels/desktop/test2.dta"
Checksum for /users/bbdaniels/desktop/test2.dta = 1861168015, size = 21446361

. save "/users/bbdaniels/desktop/test2.dta" , replace
file /users/bbdaniels/desktop/test2.dta saved

. checksum "/users/bbdaniels/desktop/test2.dta"
Checksum for /users/bbdaniels/desktop/test2.dta = 805665321, size = 21446361

Nope, it fails even with the same name. However, it works with CSV:

.  export delimited using "/users/bbdaniels/desktop/test1.csv"
file /users/bbdaniels/desktop/test1.csv saved

.  export delimited using "/users/bbdaniels/desktop/test2.csv"
file /users/bbdaniels/desktop/test2.csv saved

. checksum "/users/bbdaniels/desktop/test1.csv"
Checksum for /users/bbdaniels/desktop/test1.csv = 3687382390, size = 33228932

. checksum "/users/bbdaniels/desktop/test2.csv"
Checksum for /users/bbdaniels/desktop/test2.csv = 3687382390, size = 33228932

. checksum "/users/bbdaniels/desktop/test2.csv"
Checksum for /users/bbdaniels/desktop/test2.csv = 3687382390, size = 33228932

.  export delimited using "/users/bbdaniels/desktop/test2.csv" , replace
file /users/bbdaniels/desktop/test2.csv saved

. checksum "/users/bbdaniels/desktop/test2.csv"
Checksum for /users/bbdaniels/desktop/test2.csv = 3687382390, size = 33228932
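One plausible explanation for the .dta result is that the binary file stores metadata (for example, when it was saved) alongside the data, so identical data can produce different bytes, while a CSV export contains only the values. The Python sketch below mimics that contrast with files on disk. `save_binary` is a hypothetical toy format, not the real .dta layout; `export_csv` stands in for `export delimited`; and `file_checksum` uses CRC-32, which is an assumption and not necessarily the algorithm behind Stata's `checksum`.

```python
import csv
import os
import tempfile
import zlib


def export_csv(rows, path):
    """Write only the data values to CSV (stand-in for `export delimited`)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


def save_binary(rows, path, saved_at):
    """Hypothetical binary save that also records when it was saved.

    Not the real .dta layout: it just prefixes the data with a header.
    """
    header = f"<timestamp>{saved_at}</timestamp>\n".encode("ascii")
    body = "".join(",".join(map(str, row)) + "\n" for row in rows)
    with open(path, "wb") as f:
        f.write(header + body.encode("utf-8"))


def file_checksum(path):
    """Return (crc32, size), reading in 64 KiB chunks so memory stays flat."""
    crc, size = 0, 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return crc, size


rows = [["country", "year"], ["A", 2000], ["B", 2001]]
with tempfile.TemporaryDirectory() as tmp:
    paths = {name: os.path.join(tmp, name) for name in
             ("test1.bin", "test2.bin", "test1.csv", "test2.csv")}
    # Same data saved twice; only the recorded save time differs.
    save_binary(rows, paths["test1.bin"], "13:05")
    save_binary(rows, paths["test2.bin"], "13:06")
    export_csv(rows, paths["test1.csv"])
    export_csv(rows, paths["test2.csv"])
    same_bin = file_checksum(paths["test1.bin"]) == file_checksum(paths["test2.bin"])
    same_csv = file_checksum(paths["test1.csv"]) == file_checksum(paths["test2.csv"])
print(same_bin, same_csv)  # False True
```

Reading in fixed-size chunks means the checksum step itself does not need to hold an export as large as the 33,228,932-byte CSV in this thread in memory at once.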

. sort country

.  export delimited using "/users/bbdaniels/desktop/test2.csv" , replace
file /users/bbdaniels/desktop/test2.csv saved

. checksum "/users/bbdaniels/desktop/test2.csv"
Checksum for /users/bbdaniels/desktop/test2.csv = 3687382390, size = 33228932

. sort year

.  export delimited using "/users/bbdaniels/desktop/test2.csv" , replace
file /users/bbdaniels/desktop/test2.csv saved

. checksum "/users/bbdaniels/desktop/test2.csv"
Checksum for /users/bbdaniels/desktop/test2.csv = 2265248663, size = 33228932
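Here `sort year` changed the CSV checksum, so the exported file reflects row order, which is what a sort RNG check needs to detect. Stata's `sort` puts tied observations in random order unless the `stable` option is given (the tie order depends on the sort RNG, set with `set sortseed`). The Python sketch below imitates that with a seeded shuffle followed by a stable sort; `sort_random_ties` and `csv_checksum` are hypothetical helpers, not `reprun` code, and the shuffle is only an approximation of an unstable sort, not Stata's algorithm.

```python
import csv
import io
import random
import zlib


def csv_checksum(rows):
    """CRC-32 of the rows exported as CSV text (stand-in for
    `export delimited` followed by `checksum`)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return zlib.crc32(buffer.getvalue().encode("utf-8"))


def sort_random_ties(rows, key_index, seed):
    """Sort on one column, with ties broken in a seed-dependent order.

    Shuffling first and then applying Python's stable sort leaves tied
    rows in the shuffled (seed-dependent) order.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return sorted(shuffled, key=lambda row: row[key_index])


# Columns: country, year, unique id (three observations per country).
rows = [("A" if i < 3 else "B", 2000 if i < 3 else 2001, i) for i in range(6)]

# Key with ties: different seeds can leave rows in different orders.
by_country = {csv_checksum(sort_random_ties(rows, 0, s)) for s in range(20)}
# Unique key: every seed yields the same order, hence one checksum.
by_id = {csv_checksum(sort_random_ties(rows, 2, s)) for s in range(20)}
print(len(by_id))  # 1
```

A check along these lines would flag a sort as RNG-dependent when the post-sort checksum differs between runs, which only happens when the sort key has ties.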