gvegayon / parallel

PARALLEL: Stata module for parallel computing
https://rawgit.com/gvegayon/parallel/master/ado/parallel.html
MIT License

Convergence club package... error... ? #102

Open mrubP opened 1 year ago

mrubP commented 1 year ago

Hi, I am trying to use this package with the convergence club package, specifically its `psecta` command. When I run the examples that come with that package through `parallel`, they fail, but they work fine if I don't use `parallel`. You can install the package and read the article here (https://journals.sagepub.com/doi/pdf/10.1177/1536867X1801700407). Please bear in mind that I am new to Stata. Could you please help me? Thank you very much.

gvegayon commented 1 year ago

@mrubP, I would need to see an example with the code. After a quick glance at the paper, it seems that the issue is that regression-type commands cannot be parallelized. The principle behind the parallel package is that either (i) whatever you are parallelizing can handle the data being split, or (ii) you are using parallel for simulations, in which case parallelization is possible because simulations are usually independent.
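To illustrate case (ii), here is a small Python analogue. It is not Stata and not how parallel works internally; `one_replicate` and `run_split` are hypothetical names, and the "workers" run one after another for simplicity. If each simulation replicate owns a fixed seed, dealing the replicates out to several workers and recombining them gives exactly the same results as a serial run.

```python
from __future__ import annotations

import random
import statistics


def one_replicate(seed: int, n: int = 50) -> float:
    """Hypothetical simulation: the mean of n standard-normal draws."""
    rng = random.Random(seed)  # each replicate owns its own generator
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))


def run_split(seeds: list[int], n_workers: int) -> list[float]:
    """Hypothetical helper: deal replicates out to n_workers chunks,
    run each chunk, then recombine in the original order."""
    chunks = [seeds[i::n_workers] for i in range(n_workers)]
    results: dict[int, float] = {}
    for chunk in chunks:  # in a real setup each chunk would be its own process
        for seed in chunk:
            results[seed] = one_replicate(seed)
    return [results[seed] for seed in seeds]


seeds = list(range(100))
serial = [one_replicate(s) for s in seeds]
print(run_split(seeds, 4) == serial)  # True
```

Because no replicate reads another replicate's output, the way they are split cannot change the answer.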

mrubP commented 1 year ago

Thanks for the reply. I am using the basic code from the package example, and I am only parallelizing the part that, as I understand it, is an iterative process. The code without parallel is `psecta lnpgdp2, name(country) kq(0.333) gen(club) noprt`. This command applies the convergence test by groups: it adds one country at a time (in the example) to form clubs that pass the test, that is, clubs that are convergent. If you want, you can run it without the parallel package to see how it works; it is fast. The problem is that it becomes very slow when I work with a lot of data, which is why I want to use your package. I hope this is clear and that you can help me.
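A toy Python sketch of why a procedure like this resists splitting, assuming a greedy loop that grows clubs one unit at a time as described above. `toy_convergence_test` is a made-up stand-in (a club passes if its values lie within 1.0 of each other), not the test psecta actually runs, and `greedy_clubs` is a hypothetical simplification of its clustering. Because each unit's club depends on which units came before it, running the loop on separate subsets of countries and stacking the results gives different clubs than running it on the full sample.

```python
from __future__ import annotations


def toy_convergence_test(club_values: list[float]) -> bool:
    """Made-up stand-in for a convergence test: the club passes if its
    members' values lie within 1.0 of each other."""
    return max(club_values) - min(club_values) <= 1.0


def greedy_clubs(values: dict[str, float]) -> list[list[str]]:
    """Hypothetical sequential clustering: visit units from highest to lowest
    value, add each to the current club while the test still passes,
    otherwise start a new club."""
    clubs: list[list[str]] = []
    for unit in sorted(values, key=values.get, reverse=True):
        if clubs and toy_convergence_test(
            [values[u] for u in clubs[-1]] + [values[unit]]
        ):
            clubs[-1].append(unit)
        else:
            clubs.append([unit])
    return clubs


values = {"A": 5.0, "B": 4.6, "C": 4.2, "D": 3.9, "E": 1.0, "F": 0.8}
full = greedy_clubs(values)
print(full)  # [['A', 'B', 'C'], ['D'], ['E', 'F']]

# Split the units between two hypothetical workers and stack their clubs.
halves = [("A", "C", "E"), ("B", "D", "F")]
split = [club for half in halves
         for club in greedy_clubs({u: values[u] for u in half})]
print(split)  # [['A', 'C'], ['E'], ['B', 'D'], ['F']]
```

In the split run, B and D form a club that never exists in the full-sample run, because A (which anchored the top club) was sent to the other worker.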

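```stata
* Load the example dataset used in the psecta examples
use ps2009
* Build a numeric panel id from country and declare the panel
egen id = group(country)
xtset id year
* Take logs of pgdp and extract its trend with the HP filter (smoothing 400)
generate lnpgdp = ln(pgdp)
pfilter lnpgdp, method(hp) trend(lnpgdp2) smooth(400)
* Log t convergence test on the full panel
logtreg lnpgdp2, kq(0.333)
* Set up parallel and run the club clustering through it (the step that fails)
parallel initialize
parallel version
parallel : psecta lnpgdp2, name(country) kq(0.333) gen(club) noprt
```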


gvegayon commented 1 year ago

Does psecta generate a variable as its result? If so, what you need is the `by` option:

[Screenshot illustrating the `by` option]
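Roughly, a `by`-style split keeps every observation of a group in the same chunk, so a command whose result for each group depends only on that group's rows returns the same answer as the serial run. The Python sketch below uses hypothetical helper names (`group_means`, `split_rows`, `split_by_group`, `run_and_merge`) and runs the chunks sequentially; it is not parallel's implementation. It contrasts the group-preserving split with a naive split that cuts a group in half. This only helps when each group's result is self-contained; a procedure that compares units across groups would still be affected by splitting.

```python
from __future__ import annotations

from collections import defaultdict


def group_means(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Hypothetical per-group command: mean of the value for each group."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for group, value in rows:
        sums[group] += value
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}


def split_rows(rows, n_workers):
    """Naive split: contiguous blocks of rows, ignoring group boundaries."""
    size = -(-len(rows) // n_workers)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def split_by_group(rows, n_workers):
    """by()-style split: all rows of a group go to the same chunk."""
    chunks = [[] for _ in range(n_workers)]
    for i, group in enumerate(sorted({g for g, _ in rows})):
        chunks[i % n_workers].extend(r for r in rows if r[0] == group)
    return chunks


def run_and_merge(chunks):
    """Run group_means on each chunk and merge; if a group appears in
    several chunks, the last chunk's partial mean overwrites the others."""
    merged = {}
    for chunk in chunks:
        merged.update(group_means(chunk))
    return merged


rows = [("a", 1.0), ("a", 3.0), ("a", 5.0), ("b", 2.0), ("b", 4.0), ("b", 6.0)]
print(group_means(rows))                       # {'a': 3.0, 'b': 4.0}
print(run_and_merge(split_by_group(rows, 3)))  # {'a': 3.0, 'b': 4.0}
print(run_and_merge(split_rows(rows, 3)))      # {'a': 5.0, 'b': 5.0}
```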

More info here. Also keep in mind that `parallel :` carries an overhead cost: the computation needed to (a) sort the data, (b) save the splits, and (c) load the splits into each process.
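A back-of-the-envelope Python model of that overhead, assuming the sort/save/load cost is a fixed number of seconds and the remaining work divides evenly across processes. Both are simplifications, the helper names are hypothetical, and the timings are made up for illustration.

```python
def parallel_time(serial_s: float, n_procs: int, overhead_s: float) -> float:
    """Hypothetical wall-clock model: a fixed overhead (sorting, saving and
    loading the splits) plus the serial work divided evenly across processes."""
    return overhead_s + serial_s / n_procs


def speedup(serial_s: float, n_procs: int, overhead_s: float) -> float:
    """Serial time divided by modelled parallel time (> 1 means faster)."""
    return serial_s / parallel_time(serial_s, n_procs, overhead_s)


def break_even_overhead(serial_s: float, n_procs: int) -> float:
    """Largest overhead for which the parallel run is no slower than serial:
    overhead + S/n <= S  <=>  overhead <= S * (1 - 1/n)."""
    return serial_s * (1 - 1 / n_procs)


# Hypothetical timings, in seconds
print(parallel_time(2.0, 4, 3.0))    # 3.5   -> slower than the 2 s serial run
print(parallel_time(600.0, 4, 3.0))  # 153.0 -> roughly 3.9x faster
print(break_even_overhead(2.0, 4))   # 1.5
```

Under this model, a command that already finishes in a couple of seconds gains nothing from `parallel :`, while a long-running, splittable job can come close to an n-fold speedup.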

HIH

mrubP commented 1 year ago

Thanks again for your answer. To your question: yes, a new variable is generated, namely `club`, which holds the result of the classification. I have tried, but I can't get it to work and I don't understand why. I don't have enough knowledge of Stata, and I don't understand how to implement the `by` option. I followed the example, but it doesn't run. Thank you very much for your time.