Zeromq example - Githubissues

gnudo commented 2 years ago

Hi Konstantin and Roman, could you give an example of how ZMQ is supposed to be used, in particular on the server side? As I understand it, an outgoin_dict is composed (and sent via a socket), but it is unclear to me how the server should then interpret it. Thx and best regards, Goran

yxrmz commented 2 years ago

Hi Goran, we use ZMQ as an OpenCL gate: for instance, you can run a simple ray-tracing task on your laptop while the heavy calculations are done on the server. Our goal was to create a single entry point for multiple clients and to provide basic load balancing in case of multiple servers. So far it's just a proof of concept and only works for synchrotron source calculations, but if we see interest in the community, we can add support for other FLOP-hungry functions, like wave propagation.

So, if you want to try it: first, your server should support OpenCL (devices, drivers, pyopencl). Run the two scripts from xrt/tests/raycing/RemoteOpenCLCalculation in the background and note the frontend port number in the code. Then, on your client computer, provide "server:port" as targetOpenCL in your synchrotron source init. The dict with the input data will be generated and sent to the server as if it were a local OpenCL device.

Does this answer your question, or do you want more low-level implementation details?
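For readers who want a picture of the pattern described above, here is a minimal sketch of a request/reply setup with a queue in the middle, written with pyzmq. It is not the code from xrt/tests/raycing/RemoteOpenCLCalculation: the function names (handle_request, serve, run_queue, request), the endpoints and the stand-in fake_run_parallel are all hypothetical. The only detail taken from this thread is that the server unpacks the received dict as keyword arguments, as in the `run_parallel(**message)` call visible in the traceback further down.

```python
import pickle


def handle_request(message, compute):
    """Unpack the received dict as keyword arguments for the compute call."""
    if not isinstance(message, dict):
        raise TypeError("expected a dict of keyword arguments")
    return compute(**message)


def serve(backend_endpoint, compute):
    """Worker loop: receive a pickled dict, compute, send the result back."""
    import zmq  # pyzmq; imported lazily so the demo below runs without it
    sock = zmq.Context.instance().socket(zmq.REP)
    sock.connect(backend_endpoint)  # workers connect to the queue's backend
    while True:
        message = sock.recv_pyobj()  # unpickles the dict sent by a client
        sock.send_pyobj(handle_request(message, compute))


def run_queue(frontend_endpoint, backend_endpoint):
    """Single entry point: clients on the frontend, workers on the backend."""
    import zmq
    ctx = zmq.Context.instance()
    frontend = ctx.socket(zmq.ROUTER)
    backend = ctx.socket(zmq.DEALER)
    frontend.bind(frontend_endpoint)
    backend.bind(backend_endpoint)
    zmq.proxy(frontend, backend)  # blocks; spreads requests over the workers


def request(target, payload):
    """Client side: send one dict to 'host:port' and wait for the reply."""
    import zmq
    sock = zmq.Context.instance().socket(zmq.REQ)
    sock.connect("tcp://" + target)
    sock.send_pyobj(payload)
    reply = sock.recv_pyobj()
    sock.close()
    return reply


# Transport-free demo: a pickle round trip stands in for
# send_pyobj()/recv_pyobj(), which pickle the object under the hood.
def fake_run_parallel(scale, values):
    """Hypothetical stand-in for the heavy OpenCL calculation."""
    return [scale * v for v in values]


wire = pickle.dumps({"scale": 2, "values": [1, 2, 3]})
reply = handle_request(pickle.loads(wire), fake_run_parallel)
print(reply)
```

In this sketch the queue process is the only address a client needs to know; any number of workers can connect to its backend, and the DEALER socket hands incoming requests to them in turn, which is one common way to get the basic load balancing mentioned above.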

gnudo commented 2 years ago

Hi Roman, thanks a lot for your quick and detailed answer. That (example) was exactly what I was looking for. And my motivation was just as you describe, i.e. to outsource heavy calculations (in my case, an undulator source) to a dedicated GPU machine which otherwise would take too long on the local one. So, from my point of view, some sort of MSG-ing definitely makes sense, also for eventual future wave propagation.

Now, following your suggestion I tried to test this on my local machine first:

I ran both zmq_server.py and queue_device.py (both from the latest master branch) in two different terminals.
I then set targetOpenCL='localhost:15559' in examples/withRaycing/01_SynchrotronSources/undulatorTapering_zmq.py.
After that, I ran undulatorTapering_zmq.py, which then gave me the following error (note that the "Traceback..." part only appears after running undulatorTapering_zmq.py):

[lovric_g@pc12221 RemoteOpenCLCalculation]$ python zmq_server.py 
Beam horz. size dx = 0.09486832980505137 mm
Beam vert. size dz = 0.00447213595499958 mm
Beam horz. diverg. dxprime = 1.0540925533894598e-05 rad
Beam vert. diverg. dzprime = 2.2360679774997895e-06 rad
OpenCL: bulding undulator.cl ...
OpenCL: found none CPU
OpenCL: found 1 GPU
OpenCL: found none other accelerator
Traceback (most recent call last):
  File "<...>/xrt/tests/raycing/RemoteOpenCLCalculation/zmq_server.py", line 48, in <module>
    reply = CustomSource.ucl.run_parallel(**message)
  File "<...>/xrt/backends/raycing/myopencl.py", line 299, in run_parallel
    for ictx, ctx in enumerate(self.cl_ctx):
AttributeError: 'XRT_CL' object has no attribute 'cl_ctx'

Can you reproduce this on your side or am I doing something wrong here?

kklmn commented 2 years ago

Please pay attention to: "OpenCL: found none CPU" and do not use targetOpenCL='CPU'
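The hint above amounts to checking that the requested device type was actually found before using it. Below is a small sketch of such a pre-flight check. Both helpers (count_opencl_devices, check_target) are hypothetical and not part of xrt; the check only understands plain type strings such as 'CPU' or 'GPU' and passes any other target value through unchanged. The device counts in the demo mirror the log in this thread (no CPU, one GPU).

```python
def count_opencl_devices():
    """Count the OpenCL devices of each type visible on this machine."""
    import pyopencl as cl  # imported lazily; needs an OpenCL runtime
    counts = {"CPU": 0, "GPU": 0, "ACCELERATOR": 0}
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            # device.type is a bit field, so test each flag separately
            if device.type & cl.device_type.CPU:
                counts["CPU"] += 1
            if device.type & cl.device_type.GPU:
                counts["GPU"] += 1
            if device.type & cl.device_type.ACCELERATOR:
                counts["ACCELERATOR"] += 1
    return counts


def check_target(target, counts):
    """Fail early if a plain device-type target was requested but not found."""
    if isinstance(target, str) and target.upper() in counts:
        if counts[target.upper()] == 0:
            raise ValueError(
                "targetOpenCL=%r requested, but no such device was found"
                % target)
    return target


# Demo with the counts reported in the log above (no pyopencl needed here).
found = {"CPU": 0, "GPU": 1, "ACCELERATOR": 0}
print(check_target("GPU", found))
try:
    check_target("CPU", found)
except ValueError as err:
    print("refused:", err)
```

On a machine with pyopencl installed, `check_target('CPU', count_opencl_devices())` would raise a readable error in this situation instead of failing later with a missing attribute.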

gnudo commented 2 years ago

Hi Konstantin, thanks for the hint - removing the line indeed solved the problem and the test was successful. I'll still need to check why I get "found none CPU" (obviously my PC does have a CPU :-), but I suppose I will do this after the Christmas break.

One last question then: are you planning to include ZMQ support in the GUI (xrtQook) at some point as well, or do you generally prefer to receive PRs for such small CRs?

kklmn commented 2 years ago

If your CPU is from Intel, you need "Intel CPU only OpenCL runtime", as mentioned in TFM.

yxrmz commented 2 years ago

Hi @gnudo . Did you make it work? Any comments/suggestions?

gnudo commented 2 years ago

Hi Roman,

thanks for the follow-up! Indeed, ZMQ works nicely as discussed above. For instance, the script examples/withRaycing/01_SynchrotronSources/undulatorTapering_zmq.py achieves a 10x speed increase when calculated on a Tesla V100 (5100 cores) compared to my Quadro T600 (600 cores). However, at the moment I'm unable to achieve this speed boost with my other scripts; I'll need to investigate that further. In case you have any suggestions regarding "sampling" (of the parameter space) and/or the conditions under which one should expect the best performance increase, please let me know. In any case, that's no longer a ZMQ issue, which was the initial question here.

So, the only remaining suggestions/comments would then be:

to enable setting the ZMQ-address also in the GUI (xrtQook)
have a dedicated Chapter on ZMQ in the docs (a quicksearch on the website doesn't give any results on 'ZMQ', 'zeromq' etc., or at least I didn't find it)
and of course to have more FLOP-hungry functions (as you wrote above) support ZMQ (but I'm not even that far myself :-) )

Thanks for the support and all the best, Goran

PS: I didn't pursue the installation of the Intel OpenCL drivers any further, since all my calculations run on the GPU.

kklmn / xrt

Zeromq example #92