Closed: lycheepie closed this issue 2 years ago.
By default, `parallel` splits the data among the clusters. If you change the number of clusters, each estimation runs on a different subset (and amount) of the data, so your coefficients change.

An alternative is to save your data to a file, call `parallel` with the `nodata` option, and then have your user program load the data explicitly.
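The toy sketch below shows why the number of clusters matters. It is written in Python rather than Stata and does not use or reproduce the `parallel` package. It assumes the rows are cut into contiguous, near-equal chunks, one per worker, and it uses a plain least-squares slope in place of `xtreg`. The data, the function names, and the round-robin assignment of loop iterations in `nodata_mode` are all hypothetical. With one cluster the "slice" is the whole dataset, so the estimate equals the full-data benchmark. With two clusters each worker fits a different half and gets a different slope. When every worker has the full data and only the loop is divided, the results do not depend on the number of workers.

```python
"""Toy illustration (Python, not Stata): splitting the data across workers
changes regression estimates, while giving every worker the full data and
splitting only the loop does not."""

# Hypothetical toy data: one regressor and two outcomes for a "loop" of regressions.
XS = list(range(12))
OUTCOMES = {
    "y1": [x * x for x in XS],            # curved, so sub-sample slopes differ
    "y2": [3 * x + (x % 3) for x in XS],
}


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)  # zero (division error) if a chunk has one row
    return sxy / sxx


def split_indices(n_rows, n_clusters):
    """Contiguous, near-equal chunks of row indices (assumed split rule)."""
    size, extra = divmod(n_rows, n_clusters)
    chunks, start = [], 0
    for i in range(n_clusters):
        end = start + size + (1 if i < extra else 0)
        chunks.append(range(start, end))
        start = end
    return chunks


def split_mode(outcome, n_clusters):
    """Default-style run (hypothetical): each worker fits the model on its slice only."""
    ys = OUTCOMES[outcome]
    return [
        ols_slope([XS[i] for i in chunk], [ys[i] for i in chunk])
        for chunk in split_indices(len(XS), n_clusters)
    ]


def nodata_mode(n_clusters):
    """nodata-style run (hypothetical): every worker sees all rows and takes
    every n_clusters-th iteration of the loop (round-robin assignment)."""
    names = sorted(OUTCOMES)
    results = {}
    for worker in range(n_clusters):
        for name in names[worker::n_clusters]:
            results[name] = ols_slope(XS, OUTCOMES[name])
    return results


if __name__ == "__main__":
    print([round(b, 6) for b in split_mode("y1", 1)])  # [11.0], the full-data slope
    print([round(b, 6) for b in split_mode("y1", 2)])  # [5.0, 17.0]
    print(nodata_mode(1) == nodata_mode(4))             # True
```

Because `split_mode` fits one model per slice, it returns one slope per worker. Only with a single cluster does that slope coincide with the full-sample benchmark, which mirrors “b_1” matching “b_benchmark” in the results below.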
Preliminaries
Before submitting an issue, please check (with `x` in brackets) that you:

Expected behavior and actual behavior
I tried to parallelize a loop of panel-data regressions that use the Stata command `xtreg`, and found that the regression results vary depending on how many processes I start. Below are the test results. The estimated coefficients are saved in variables “b_1” through “b_8”, where the number in each variable name is the number of processes used. The results obtained without `parallel` are saved in variable “b_benchmark”. As can be seen, “b_1” is identical to “b_benchmark”, while all the other columns differ.
| b_1 | b_2 | b_3 | b_4 | b_5 | b_6 | b_7 | b_8 | b_benchmark |
|---|---|---|---|---|---|---|---|---|
| -0.0006171 | -0.0012357 | -0.0008273 | -0.0009933 | -0.0005501 | 0.0001222 | 0.0001298 | -0.0002277 | -0.0006171 |
| -0.0006174 | -0.0012359 | -0.0008282 | -0.0009947 | -0.0005512 | 0.0001212 | 0.0001286 | -0.0002292 | -0.0006174 |
| -0.0006168 | -0.0012355 | -0.0008272 | -0.0009932 | -0.0005505 | 0.0001214 | 0.0001297 | -0.0002279 | -0.0006168 |
| -0.0006169 | -0.0012354 | -0.0008267 | -0.0009928 | -0.0005491 | 0.000123 | 0.0001316 | -0.0002276 | -0.0006169 |
| -0.0006169 | -0.0012358 | -0.0008277 | -0.0009931 | -0.0005501 | 0.0001215 | 0.000129 | -0.0002282 | -0.0006169 |
| -0.0006168 | -0.0012356 | -0.0008267 | -0.0009929 | -0.0005495 | 0.0001236 | 0.0001321 | -0.0002255 | -0.0006168 |
| -0.0006169 | -0.0012356 | -0.0008276 | -0.0009936 | -0.0005505 | 0.0001217 | 0.000129 | -0.000229 | -0.0006169 |
| -0.0006169 | -0.001236 | -0.0008274 | -0.0009934 | -0.0005511 | 0.000122 | 0.0001299 | -0.0002273 | -0.0006169 |
| -0.0006169 | -0.0012357 | -0.0008276 | -0.0009933 | -0.0005502 | 0.0001211 | 0.0001288 | -0.0002276 | -0.0006169 |
| -0.0006169 | -0.0012356 | -0.0008272 | -0.0009934 | -0.0005506 | 0.0001217 | 0.0001293 | -0.0002272 | -0.0006169 |
Steps to reproduce the problem
Below is the Stata code that produced the above results.
System information

Some relevant information

Attachment: parallel_test_results.xlsx

Output from `creturn list`:

System values