Closed RobertoLiebscher closed 8 years ago
Were you able to solve this?
Dear George,
Thanks for asking and many thanks for uploading this nice package.
I still get an error message with my 15 GB sample, though; after downsizing it to 3 GB it works fine. Here is the Statalist thread for this: http://www.statalist.org/forums/forum/general-stata-discussion/general/1352049-parallel-computing-with-stata-se-13-1-and-parallel-package-error-198-while-setting-memory
Best regards, Roberto
Roberto Liebscher Catholic University of Eichstaett-Ingolstadt Department of Business Administration Chair of Banking and Finance Auf der Schanz 49 D-85049 Ingolstadt Germany Phone: (+49)-841-937-21929 FAX: (+49)-841-937-22883 E-mail: roberto.liebscher@ku.de Internet: http://www.ku.de/wwf/lfb/
It is odd. Maybe there's something happening with mata underneath when you use big data... although, in my experience that hasn't been a problem. Perhaps the fact that you are using windows? Again, on linux machines this hasn't been a problem as I've used parallel with datasets of around ~20gigs or more (if I recall correctly). Anyway, I'm glad you worked it out and thanks for the question!
George G. Vega Yon +1 (626) 381 8171 http://www.its.caltech.edu/~gvegayon/
Dear George,
It seems to me that these problems arise once the dataset exceeds a certain size. For example, the following code expands the dataset to roughly 12 GB. This triggers the memory error discussed above, although my computer has 32 GB of RAM. If I write expand 1000000 instead, the error does not occur.
Best regards, Roberto
expand 2000000
encode manager, gen(managerdum)

//Find principal amount with same lead
capture program drop myloop
program define myloop
    forvalues i = 1/12 {
        gen leadamt`i' = .
        levelsof lead`i', local(leads)
        foreach l of local leads {
            bysort managerdum obsq: egen hlpvar = total(principalUSD) if ///
                (lead1 == "`l'" | lead2 == "`l'" | lead3 == "`l'" | lead4 == "`l'" | ///
                lead5 == "`l'" | lead6 == "`l'" | lead7 == "`l'" | lead8 == "`l'" | ///
                lead9 == "`l'" | lead10 == "`l'" | lead11 == "`l'" | lead12 == "`l'") ///
                & obsdate == mindate
            bysort managerdum obsq: egen totamt = min(hlpvar)
            replace leadamt`i' = totamt if lead`i' == "`l'" & leadamt`i' == .
            drop totamt hlpvar
        }
    }
end

capture parallel clean
cd "C:\Users\wwa594\Documents\"
parallel setclusters 8
sort managerdum
parallel, by(managerdum) programs(myloop): myloop
parallel clean
This output catches my attention:
Parallel Computing with Stata (by GVY)
Clusters : 4
pll_id : 1mdx8agr10
Running at : C:\Users\wwa594\Documents
Randtype : datetime
Waiting for the clusters to finish...
0
cluster 0001 has exited without error...
0
cluster 0004 has exited without error...
-3621
cluster 0003 has exited without error...
-3621
cluster 0002 has exited without error...
Ping to @bquistorff: may this have something to do with the new implementation of the parallel_run command? Since it is Windows, it might be the case that it closes the sessions before saving the file (see https://github.com/gvegayon/parallel/blob/c6bb22fbcad84f5901dc6d1b904da86f3a017af5/ado/parallel_write_do.mata#L275-L277), losing the data.
I'll check into this. Might be a work or
Using StataSE-64 v13, I was not able to reproduce the bug from your posted setups (with the auto dataset or the dataex-type setup). (I had less RAM, so I could only expand to a tenth of the size.) We have fixed some bugs since, so you can try the latest code.
With your real data it seems the bug happens reliably once the data is too large. To me the likeliest explanation is a lack of RAM, in which case an alternative approach is needed. With parallel, the parent instance of Stata loads the whole dataset in memory, and the child processes together hold the same amount again. The children also process the whole dataset, so they need additional working memory on top of that (though I'm not sure how much). So even though you have more RAM than the size of your dataset, you can quickly run out of RAM in practice. When a parallel run hits this limit, you should consider processing your data in chunks serially instead.
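The memory accounting above can be made concrete with a back-of-the-envelope sketch (Python is used here purely for the arithmetic; the 1.5× child-overhead multiplier is an assumed illustration, not a measured value for the parallel package):

```python
def peak_ram_gb(dataset_gb, n_clusters, child_overhead=1.5):
    """Rough peak-RAM estimate for a parallel-style split.

    The parent Stata instance holds the full dataset, and the n_clusters
    child processes together hold another full copy (each child gets
    dataset_gb / n_clusters), inflated by an assumed working-memory
    multiplier (child_overhead).
    """
    per_child = dataset_gb / n_clusters
    children = n_clusters * per_child * child_overhead
    return dataset_gb + children

# A 15 GB dataset can exceed 32 GB of RAM under these assumptions:
print(peak_ram_gb(15, 4))  # 37.5
# ...while the downsized 3 GB sample fits comfortably:
print(peak_ram_gb(3, 4))   # 7.5
```

This is consistent with the symptoms reported in the thread: the 15 GB dataset fails on a 32 GB machine while the 3 GB version runs fine.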
I'll mark this as closed for now, but feel free to re-open if we can find a way to reproduce the bug.
Hi there,
I am working with Stata SE 13.1 on a large dataset (15 GB) on a 64-bit machine with 32 GB of RAM. When I tried to run parallel over the four cores of my computer, I received an error message.
The code I tried looks like this:
I discussed this issue on the Statalist before but the problem remains unsolved: http://www.statalist.org/forums/forum/general-stata-discussion/general/1352049-parallel-computing-with-stata-se-13-1-and-parallel-package-error-198-while-setting-memory
I am unable to replicate the error message with a publicly available dataset. But expanding the auto dataset and running a simple task on it suggests that the size of the dataset might be causing the problem:
rep78 now has only four categories instead of the five in the original dataset, and the resulting dataset is 60 million observations short: 2 million × 74 observations = 148 million expected, minus the 88 million returned = 60 million missing.
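The observation bookkeeping above can be checked directly (Python used here just for the arithmetic; the figures are the ones from this post):

```python
# Bookkeeping check for the expanded auto dataset.
expand_factor = 2_000_000   # expand 2000000
auto_obs = 74               # observations in Stata's auto dataset
observed = 88_000_000       # observations that actually came back

expected = expand_factor * auto_obs
missing = expected - observed

print(f"expected: {expected:,}")  # expected: 148,000,000
print(f"missing:  {missing:,}")   # missing:  60,000,000
```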
Do you have an idea what might have gone wrong here?