gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

Inconsistent results from xtreg #100

Closed: lycheepie closed this issue 2 years ago

lycheepie commented 2 years ago

Expected behavior and actual behavior

I tried to parallelize a loop of panel-data regressions that use the Stata command xtreg. The regression results vary with the number of processes I initialize. The test results are below. The estimated coefficients are saved in variables "b_1" through "b_8", where the number in each variable name is the number of processes used. The results obtained without parallel are saved in variable "b_benchmark". As the table shows, "b_1" matches "b_benchmark", while all other results differ.

        b_1          b_2          b_3          b_4          b_5          b_6          b_7          b_8  b_benchmark
 -0.0006171   -0.0012357   -0.0008273   -0.0009933   -0.0005501    0.0001222    0.0001298   -0.0002277   -0.0006171
 -0.0006174   -0.0012359   -0.0008282   -0.0009947   -0.0005512    0.0001212    0.0001286   -0.0002292   -0.0006174
 -0.0006168   -0.0012355   -0.0008272   -0.0009932   -0.0005505    0.0001214    0.0001297   -0.0002279   -0.0006168
 -0.0006169   -0.0012354   -0.0008267   -0.0009928   -0.0005491     0.000123    0.0001316   -0.0002276   -0.0006169
 -0.0006169   -0.0012358   -0.0008277   -0.0009931   -0.0005501    0.0001215     0.000129   -0.0002282   -0.0006169
 -0.0006168   -0.0012356   -0.0008267   -0.0009929   -0.0005495    0.0001236    0.0001321   -0.0002255   -0.0006168
 -0.0006169   -0.0012356   -0.0008276   -0.0009936   -0.0005505    0.0001217     0.000129    -0.000229   -0.0006169
 -0.0006169    -0.001236   -0.0008274   -0.0009934   -0.0005511     0.000122    0.0001299   -0.0002273   -0.0006169
 -0.0006169   -0.0012357   -0.0008276   -0.0009933   -0.0005502    0.0001211    0.0001288   -0.0002276   -0.0006169
 -0.0006169   -0.0012356   -0.0008272   -0.0009934   -0.0005506    0.0001217    0.0001293   -0.0002272   -0.0006169

Steps to reproduce the problem

The Stata code below produced the results above.

clear all
timer clear
set seed 123
local n = 5e6
set obs `n'

forval i = 1/10 {
    gen x`i' = rnormal()
}
gen y = rnormal()
gen z = rnormal()
gen id = ceil(_n/500)
bys id: gen time = _n
xtset id time

cap prog drop parfor
prog def parfor
        args var
        gen b_`var' = .
        forval i = 1/10 {
            xtreg y x`i' z, fe
            replace b_`var' = _b[z] if _n == `i'
        }
end

// Running the algorithm in parallel fashion
forval j = 1/8 {
    timer on `j'
    parallel initialize `j', f
    parallel, prog(parfor): parfor `j'
    timer off `j'
}

System information

Some relevant information

Output from creturn list:

System values

----------------------------------------------------------------------------
    c(current_date) = "13 Sep 2022"
    c(current_time) = "14:06:24"
       c(rmsg_time) = 0                          (seconds, from set rmsg)
----------------------------------------------------------------------------
   c(stata_version) = 17
         c(version) = 17                         (version)
     c(userversion) = 17                         (version)
  c(dyndoc_version) = 2                          (dyndoc)
----------------------------------------------------------------------------
       c(born_date) = "20 Apr 2021"

         c(edition) = "BE"
    c(edition_real) = "MP"
             c(bit) = 64
              c(SE) = 1
              c(MP) = 1
      c(processors) = 8                          (Stata/MP, set processors)
  c(processors_lic) = 8
 c(processors_mach) = 32
  c(processors_max) = 8
            c(mode) = ""
         c(console) = ""
----------------------------------------------------------------------------
              c(os) = "Windows"
           c(osdtl) = "64-bit"
        c(hostname) = "ecnswn34p"
    c(machine_type) = "PC (64-bit x86-64)"
       c(byteorder) = "lohi"
        c(username) = "ltang"
----------------------------------------------------------------------------

Directories and paths

----------------------------------------------------------------------------
    c(sysdir_stata) = "D:\apps\Stata17_mp/"      (sysdir)
     c(sysdir_base) = "\\ecnswn34p\apps\.."      (sysdir)
     c(sysdir_site) = "\\ecnswn34p\apps\.."      (sysdir)
     c(sysdir_plus) = "\\ecnswn34p\apps\.."      (sysdir)
 c(sysdir_personal) = "\\ecnswn34p\apps\.."      (sysdir)
 c(sysdir_oldplace) = "\\ecnswn34p\apps\.."      (sysdir)
          c(tmpdir) = "d:\usr-profiles\l.."
----------------------------------------------------------------------------
         c(adopath) = "UPDATES;BASE;.;PE.."      (adopath)
             c(pwd) = "\\data4\users10\l.."      (cd)
          c(dirsep) = "/"
----------------------------------------------------------------------------

System limits

----------------------------------------------------------------------------
    c(max_N_theory) = 1099511627775
    c(max_k_theory) = 5000                       (set maxvar)
c(max_width_theory) = 1048576                    (set maxvar)
----------------------------------------------------------------------------
      c(max_matdim) = 65534
----------------------------------------------------------------------------
    c(max_it_cvars) = 64
    c(max_it_fvars) = 8
----------------------------------------------------------------------------
    c(max_macrolen) = 15480200
        c(macrolen) = 645200                     (set maxvar)
         c(charlen) = 67783
      c(max_cmdlen) = 15480216
          c(cmdlen) = 645216                     (set maxvar)
     c(namelenbyte) = 128
     c(namelenchar) = 32
           c(eqlen) = 1337
----------------------------------------------------------------------------

Numerical and string limits

----------------------------------------------------------------------------
       c(mindouble) = -8.9884656743e+307
       c(maxdouble) = 8.9884656743e+307
       c(epsdouble) = 2.22044604925e-16
  c(smallestdouble) = 2.2250738585e-308
----------------------------------------------------------------------------
        c(minfloat) = -1.70141173319e+38
        c(maxfloat) = 1.70141173319e+38
        c(epsfloat) = 1.19209289551e-07
----------------------------------------------------------------------------
         c(minlong) = -2147483647
         c(maxlong) = 2147483620
----------------------------------------------------------------------------
          c(minint) = -32767
          c(maxint) = 32740
----------------------------------------------------------------------------
         c(minbyte) = -127
         c(maxbyte) = 100
----------------------------------------------------------------------------
    c(maxstrvarlen) = 2045
   c(maxstrlvarlen) = 2000000000
    c(maxvlabellen) = 32000
----------------------------------------------------------------------------

Current dataset

----------------------------------------------------------------------------
           c(frame) = "default"
               c(N) = 5000000
               c(k) = 15
           c(width) = 60
         c(changed) = 1
        c(filename) = ""
        c(filedate) = "13 Sep 2022 14:06"
----------------------------------------------------------------------------

Memory settings

----------------------------------------------------------------------------
          c(memory) = 402653184
          c(maxvar) = 5000                       (set maxvar)
        c(niceness) = 5                          (set min_memory)
      c(min_memory) = 0                          (set min_memory)
      c(max_memory) = .                          (set max_memory)
     c(segmentsize) = 33554432                   (set segmentsize)
         c(adosize) = 1000                       (set adosize)
 c(max_preservemem) = 1073741824                 (set max_preservemem)
----------------------------------------------------------------------------

Output settings

----------------------------------------------------------------------------
            c(more) = "off"                      (set more)
            c(rmsg) = "off"                      (set rmsg)
              c(dp) = "period"                   (set dp)
        c(linesize) = 80                         (set linesize)
        c(pagesize) = 24                         (set pagesize)
         c(logtype) = "smcl"                     (set logtype)
         c(noisily) = 1
----------------------------------------------------------------------------
         c(iterlog) = "on"                       (set iterlog)
----------------------------------------------------------------------------
           c(level) = 95                         (set level)
          c(clevel) = 95                         (set clevel)
----------------------------------------------------------------------------
  c(showbaselevels) = ""                         (set showbaselevels)
  c(showemptycells) = ""                         (set showemptycells)
     c(showomitted) = ""                         (set showomitted)
         c(fvlabel) = "on"                       (set fvlabel)
          c(fvwrap) = 1                          (set fvwrap)
        c(fvwrapon) = "word"                     (set fvwrapon)
        c(lstretch) = ""                         (set lstretch)
----------------------------------------------------------------------------
         c(cformat) = ""                         (set cformat)
         c(sformat) = ""                         (set sformat)
         c(pformat) = ""                         (set pformat)
----------------------------------------------------------------------------
  c(coeftabresults) = "on"                       (set coeftabresults)
            c(dots) = "on"                       (set dots)
----------------------------------------------------------------------------
   c(collect_label) = "default"                  (set collect_label)
   c(collect_style) = "default"                  (set collect_style)
     c(table_style) = "table"                    (set table_style)
    c(collect_warn) = "on"                       (set collect_warn)

Interface settings

----------------------------------------------------------------------------
        c(dockable) = "on"                       (set dockable)
   c(locksplitters) = "off"                      (set locksplitters)
        c(pinnable) = "on"                       (set pinnable)
    c(doublebuffer) = "on"                       (set doublebuffer)
      c(reventries) = 5000                       (set reventries)
      c(fastscroll) = "on"                       (set fastscroll)
         c(linegap) = 1                          (set linegap)
   c(scrollbufsize) = 204800                     (set scrollbufsize)
           c(maxdb) = 50                         (set maxdb)
----------------------------------------------------------------------------

Graphics settings

----------------------------------------------------------------------------
        c(graphics) = "on"                       (set graphics)
   c(autotabgraphs) = "on"                       (set autotabgraphs)
          c(scheme) = "s2color"                  (set scheme)
      c(printcolor) = "asis"                     (set printcolor)
       c(copycolor) = "asis"                     (set copycolor)
   c(min_graphsize) = 1                          (region_options)
   c(max_graphsize) = 100                        (region_options)
----------------------------------------------------------------------------

Network settings

----------------------------------------------------------------------------
       c(httpproxy) = "on"                       (set httpproxy)
   c(httpproxyhost) = "wsproxy.imf.org"          (set httpproxyhost)
   c(httpproxyport) = 8080                       (set httpproxyport)
----------------------------------------------------------------------------
   c(httpproxyauth) = "off"                      (set httpproxyauth)
   c(httpproxyuser) = ""                         (set httpproxyuser)
     c(httpproxypw) = ""                         (set httpproxypw)
----------------------------------------------------------------------------

Update settings

----------------------------------------------------------------------------
    c(update_query) = "off"                      (set update_query)
 c(update_interval) = 7                          (set update_interval)
   c(update_prompt) = "off"                      (set update_prompt)
----------------------------------------------------------------------------

Trace (program debugging) settings

----------------------------------------------------------------------------
           c(trace) = "off"                      (set trace)
      c(tracedepth) = 32000                      (set tracedepth)
        c(tracesep) = "on"                       (set tracesep)
     c(traceindent) = "on"                       (set traceindent)
     c(traceexpand) = "on"                       (set traceexpand)
     c(tracenumber) = "off"                      (set tracenumber)
     c(tracehilite) = ""                         (set tracehilite)
----------------------------------------------------------------------------

Mata settings

----------------------------------------------------------------------------
      c(matastrict) = "off"                      (set matastrict)
        c(matalnum) = "off"                      (set matalnum)
    c(mataoptimize) = "on"                       (set mataoptimize)
       c(matafavor) = "space"                    (set matafavor)
       c(matacache) = 2000                       (set matacache)
        c(matalibs) = "lmatabase;lmataad.."      (set matalibs)
     c(matamofirst) = "off"                      (set matamofirst)
----------------------------------------------------------------------------

Java settings

----------------------------------------------------------------------------
    c(java_heapmax) = "4096m"                    (set java_heapmax)
       c(java_home) = "D:\apps\Stata17_m.."      (set java_home)
----------------------------------------------------------------------------

LAPACK settings

----------------------------------------------------------------------------
      c(lapack_mkl) = "on"                       (set lapack_mkl)
  c(lapack_mkl_cnr) = "default"                  (set lapack_mkl_cnr)
----------------------------------------------------------------------------

putdocx settings

----------------------------------------------------------------------------
  c(docx_hardbreak) = "off"                      (set docx_hardbreak)
   c(docx_paramode) = "off"                      (set docx_paramode)
----------------------------------------------------------------------------

Python settings

----------------------------------------------------------------------------
     c(python_exec) = ""                         (set python_exec)
 c(python_userpath) = ""                         (set python_userpath)
----------------------------------------------------------------------------

RNG settings

----------------------------------------------------------------------------
             c(rng) = "default"                  (set rng)
     c(rng_current) = "mt64"
        c(rngstate) = "XAAd24d75309de697.."      (set rngstate)
   c(rngseed_mt64s) = 123456789
       c(rngstream) = 1                          (set rngstream)
----------------------------------------------------------------------------

sort settings

----------------------------------------------------------------------------
      c(sortmethod) = "default"                  (set sortmethod)
    c(sort_current) = "fsort"
    c(sortrngstate) = "654269321XZA11221.."      (set sortrngstate)
----------------------------------------------------------------------------

Unicode settings

----------------------------------------------------------------------------
       c(locale_ui) = "en_US"                    (set locale_ui)
c(locale_functions) = "en_US"                    (set locale_functions)
  c(locale_icudflt) = "en_US"                    (unicode locale)
----------------------------------------------------------------------------

Other settings

----------------------------------------------------------------------------
            c(type) = "float"                    (set type)
         c(maxiter) = 300                        (set maxiter)
   c(searchdefault) = "all"                      (set searchdefault)
       c(varabbrev) = "on"                       (set varabbrev)
      c(emptycells) = "keep"                     (set emptycells)
         c(fvtrack) = "term"                     (set fvtrack)
          c(fvbase) = "on"                       (set fvbase)
      c(odbcdriver) = "unicode"                  (set odbcdriver)
        c(haverdir) = ""                         (set haverdir)
         c(fredkey) = ""                         (set fredkey)
  c(collect_double) = "on"                       (set collect_double)
----------------------------------------------------------------------------

Other

----------------------------------------------------------------------------
              c(pi) = 3.141592653589793
           c(alpha) = "a b c d e f g h i.."
           c(ALPHA) = "A B C D E F G H I.."
            c(Mons) = "Jan Feb Mar Apr M.."
          c(Months) = "January February .."
           c(Wdays) = "Sun Mon Tue Wed T.."
        c(Weekdays) = "Sunday Monday Tue.."
              c(rc) = 0                          (capture)
----------------------------------------------------------------------------

bquistorff commented 2 years ago

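By default, parallel splits the data among the clusters, so each cluster runs xtreg on only its share of the observations. When you change the number of clusters, the estimations run on different amounts of data, and the coefficients change. With a single cluster, the one child process receives the whole dataset, which is why b_1 matches b_benchmark.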

An alternative is to save your data to a file, call parallel with the nodata option, and then have your user program load the data explicitly.
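
A minimal sketch of that approach follows, not a tested implementation. The names fullpanel.dta, parfor_full, and results_*.dta are hypothetical. It also assumes that each child process can read the globals defined in the parent session and that parallel sets $pll_instance to the child's index (1, 2, ...); check both against the parallel help file before relying on them.

```stata
* Sketch only: file and program names here are hypothetical.

* Save the full panel once; xtset settings are stored with the dataset.
save fullpanel, replace
global datadir "`c(pwd)'"   // assumed to be passed on to the child processes

cap prog drop parfor_full
prog def parfor_full
        args nproc
        * Each child loads the complete dataset instead of receiving a chunk.
        use "$datadir/fullpanel.dta", clear
        * Each child writes its coefficients to its own results file.
        tempname h
        postfile `h' int i double b_z using "$datadir/results_${pll_instance}.dta", replace
        forval i = 1/10 {
            * Child k handles regressions k, k + nproc, k + 2*nproc, ...
            * ($pll_instance is assumed to be the child's index.)
            if mod(`i' - 1, `nproc') + 1 == $pll_instance {
                xtreg y x`i' z, fe
                post `h' (`i') (_b[z])
            }
        }
        postclose `h'
end

local j = 4
parallel initialize `j', f
parallel, nodata prog(parfor_full): parfor_full `j'
```

Because every child estimates on all observations, the coefficients should no longer depend on the number of processes. The cost is that each child holds a full copy of the data in memory. The per-child results files can then be combined in the parent session, for example with append.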