kristinemlarson / gnssrefl

GNSS Interferometric Reflectometry Software (GNSS-IR)
GNU General Public License v3.0

Parallelizing gnssir #259

Closed. aaryan-rampal closed this 7 months ago

aaryan-rampal commented 7 months ago

So far, this adds force flags to gzip.

Follow-up to https://github.com/kristinemlarson/gnssrefl/pull/245#issue-2150257231

kristinemlarson commented 7 months ago

@aaryan-rampal

i don't use docker - but i thought i should check to see how the new gnssir code performs with -par 10. it doesn't help nearly as much there as it does on my mac, so that is something that should be looked into. presumably there is some setting in the Dockerfile to tell it to use more processors. it isn't a matter of the python code - because that is working. @timdittmann may want to enter the discussion.

kristinemlarson commented 7 months ago

there are problems when several processes try to create the same files/directories at the same time. these are common files - not date specific files. i will see if i can fix it. i am not seeing this on my mac - only in docker. as for the cpu count, it appears that you can set that when you run the docker container.

timdittmann commented 7 months ago

Hi @kristinemlarson @aaryan-rampal -- it is so cool to see this collaborative open source development. 🚀

As for performance in Docker: depending on how your Docker Desktop is configured, the container might not have access to the same number of cores at runtime as your host. This is configurable: https://docs.docker.com/desktop/settings/mac/#advanced

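Either in addition to or in place of explicit parallelism (par, https://github.com/kristinemlarson/gnssrefl/blob/bc11b4f8318576cbd53c78c81dd645254f5a0232/gnssrefl/gnssir_cl.py#L174), what do you think about experimenting with gnssir performance when letting the OS determine the number of processes? E.g. the default for multiprocessing (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) is to spawn os.cpu_count() (https://docs.python.org/3/library/os.html#os.cpu_count) processes.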
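To make the suggestion concrete, here is a minimal sketch of sizing a worker pool from the CPU count the process sees instead of a fixed number. It is not taken from gnssrefl: `available_cpus`, `choose_workers`, `process_day` and `run_days` are hypothetical names, and `process_day` is only a stand-in for the per-day work. Capping a requested value (such as `-par 10`) at the usable count is an assumption about what might be desirable, not something the thread decided.

```python
import os
from multiprocessing import Pool


def available_cpus():
    """Number of CPUs this process may use (falls back to os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux, not on macOS
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def choose_workers(requested=None):
    """Hypothetical helper: cap a -par style request at the usable CPU count."""
    usable = available_cpus()
    if requested is None:
        return usable
    return max(1, min(requested, usable))


def process_day(mjd):
    """Stand-in for one day of CPU-bound work; not the real gnssir routine."""
    return mjd, sum(i * i for i in range(200_000))


def run_days(mjds, requested=None):
    """Process each day in a pool sized by choose_workers()."""
    with Pool(processes=choose_workers(requested)) as pool:
        return pool.map(process_day, mjds)


if __name__ == "__main__":
    print("os.cpu_count():", os.cpu_count())
    print("usable CPUs:   ", available_cpus())
    results = run_days(range(60385, 60395), requested=10)
    print("processed", len(results), "days")
```

Running the script on the host and again inside the container is one quick way to see whether the container is being given fewer cores than the host.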

kristinemlarson commented 7 months ago

hi @timdittmann and @aaryan-rampal

i did a little testing yesterday - and i did find that configurable docker cpu setting under Resources, as well as where you set it when you issue the run command.

i was able to fix up the parts where multiple processes were trying to install the same file - so now that all happens before the processes are spawned. i will need to think about whether i inadvertently have the same problem in rinex2snr.
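The pattern described here (do the shared setup once, then spawn) can be sketched as follows. This is an illustration under assumptions, not the actual gnssrefl change: the function names and the `results` directory are hypothetical, and the per-day work is reduced to writing one date-specific file.

```python
import os
import tempfile
from multiprocessing import Pool


def prepare_shared_dirs(root, subdirs):
    """Create the common directories once, in the parent, before any worker starts."""
    paths = []
    for name in subdirs:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # safe to call again if it already exists
        paths.append(path)
    return paths


def process_day(args):
    """Worker: writes only its own date-specific file into an existing directory."""
    out_dir, mjd = args
    out_file = os.path.join(out_dir, f"{mjd}.txt")
    with open(out_file, "w") as f:
        f.write(f"results for MJD {mjd}\n")
    return out_file


def run(root, mjds, processes=2):
    """Shared setup first, then fan the per-day work out to the pool."""
    (results_dir,) = prepare_shared_dirs(root, ["results"])
    with Pool(processes) as pool:
        return pool.map(process_day, [(results_dir, m) for m in mjds])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path in run(root, range(60385, 60389)):
            print(path)
```

Because the workers never create common files or directories themselves, there is nothing left for them to race on; each one touches only the file for its own day.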

another bonus of this upgrade: i am going to completely eradicate all the ugly code in various modules (nmea2snr comes to mind) that loops over multiple years and start years and whatever, and use MJD (modified julian date) for those loops instead.
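As an illustration of why an MJD loop is simpler than nested year and day-of-year loops, here is a small sketch using only the Python standard library. The helper names are hypothetical and this is not gnssrefl's own MJD code; it only shows that a date range crossing a year boundary becomes a single `range` over integers.

```python
from datetime import date, timedelta

# MJD 0 corresponds to 1858-11-17
MJD_EPOCH = date(1858, 11, 17)


def date_to_mjd(d):
    """Integer modified Julian date for a calendar date."""
    return (d - MJD_EPOCH).days


def mjd_to_year_doy(mjd):
    """Convert an integer MJD back to (year, day of year)."""
    d = MJD_EPOCH + timedelta(days=mjd)
    return d.year, d.timetuple().tm_yday


def iter_year_doy(start, end):
    """Yield (year, doy) for every day from start to end, inclusive."""
    for mjd in range(date_to_mjd(start), date_to_mjd(end) + 1):
        yield mjd_to_year_doy(mjd)


if __name__ == "__main__":
    # One loop, no special handling for the year change or leap years
    for year, doy in iter_year_doy(date(2023, 12, 30), date(2024, 1, 2)):
        print(year, doy)
```

The same list of MJD values is also a convenient unit of work to hand to a process pool, since each entry identifies exactly one day.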

k.
