Open clairem789 opened 1 month ago
Possibly related to multiprocessing options once more, but I do not find those options anymore. Have there been changes on this topic from v289? thanks
Yes, a "Killed: 9" can come from anything: from a command like killall -9 python or kill -9 {pid}, from a memory issue, or from many other causes. It's something "outside" Python, hence no Python error: Python is just killed.
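To illustrate why there is no Python traceback, here is a minimal sketch (POSIX only, not APERO code) that starts a child Python process and sends it signal 9. The child script and the name kill_child are made up for this example. The child's finally block never runs, and the parent only sees a negative return code.

```python
import signal
import subprocess
import sys

# Hypothetical child script: the finally block would run on a normal exit
# or on a Python exception, but SIGKILL gives the interpreter no chance.
CHILD_CODE = """
import time
try:
    time.sleep(60)
finally:
    print("cleanup ran")
"""


def kill_child():
    """Start a child interpreter, kill it with signal 9, report what is left."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
    )
    # Same effect as running `kill -9 <pid>` from a terminal.
    child.send_signal(signal.SIGKILL)
    output, _ = child.communicate()
    return child.returncode, output


returncode, output = kill_child()
# On POSIX, a negative return code means "terminated by that signal number".
print(returncode, output)
```

The operating system's out-of-memory handling typically ends a process the same way, which is why a memory problem and a manual kill -9 look identical from the terminal.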
As for the multiprocessing, the options are still there (they were never there by default, I believe):
# Define whether to use multiprocess "pool" or "process" or use "linear" mode
# when parallelising recipes
# dtype=string default=process
# options = pool, process
REPROCESS_MP_TYPE = process
# Define whether to use multiprocess "pool" or "process" or use "linear" mode
# when validating recipes
# dtype=string default=process
# options = linear, pool, process, pathos
REPROCESS_MP_TYPE_VAL = process
You'll have to check your old setup, but I think the REPROCESS_MP_TYPE_VAL on your machine had to be set to linear?
My past notes are not precise enough, unfortunately. I thought that it worked with the default options in the last version, but I may be wrong. I restarted with the pool option (val). If it fails again, I'll try linear.
I'm also moving to 290 and was wondering if REPROCESS_MP_TYPE_VAL should be set to 'linear', as it's 'process' by default. 'process' seems to work for the mini data set, but what about the full set of data? I'll set the keyword to linear.
I had set it to linear since the beginning with the 288, as it failed with 'process'. What does 'pool' do?
Status here: I tried all three options and had a memory leak with all of them, even with REPROCESS_MP_TYPE_VAL.value = 'linear' (in apero-drs/apero/core/instruments/spirou/default_constants.py). Unfortunately, I don't know what to try next.
I launched apero_precheck after a fresh installation of the 290 and I can see the memory usage linearly increasing while APERO is updating the index db.
@Luc, this is with any of the options for REPROCESS_MP_TYPE_VAL.value?
Linear. But the memory increase above happens before any processing: it's during the index db update with apero_precheck. Is it an issue with MySQL? Claire, when did your crash occur: preprocessing?
My crashes come at the validation process like you. It was the case before (v284) with the "process" option but I thought it had been fixed at the 288 version.
I had set it to linear since the beginning with the 288, as it failed with 'process'. What does 'pool' do ?
process and pool are just two different ways of multiprocessing: https://stackoverflow.com/questions/18176178/python-multiprocessing-process-or-pool-for-what-i-am-doing
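As a rough illustration of the difference, here is a minimal sketch using only the standard library. It is not how APERO implements these modes; the function names are made up for this example, and the "fork" start method is an assumption chosen so the sketch runs as a plain script on Linux/macOS (it is not available on Windows).

```python
import math
import multiprocessing as mp

# Assumption: "fork" start method, so no __main__ guard is needed here.
ctx = mp.get_context("fork")


def worker(value, queue):
    """Compute one result and send it back to the parent process."""
    queue.put((value, math.sqrt(value)))


def with_linear(values):
    """'linear': no multiprocessing, one task after another."""
    return [math.sqrt(v) for v in values]


def with_process(values):
    """'process': one Process object per task, started and joined by hand."""
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(v, queue)) for v in values]
    for proc in procs:
        proc.start()
    # Results arrive in any order, so key them by their input value.
    results = dict(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return [results[v] for v in values]


def with_pool(values, workers=2):
    """'pool': a fixed set of worker processes shared by all tasks."""
    with ctx.Pool(processes=workers) as pool:
        return pool.map(math.sqrt, values)


values = [1.0, 4.0, 9.0, 16.0]
linear_results = with_linear(values)
process_results = with_process(values)
pool_results = with_pool(values)
```

All three give the same answers; they differ in how many processes exist at once and how long each one lives, which is where memory behaviour can differ between them.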
Possibly related to multiprocessing options once more, but I do not find those options anymore. Have there been changes on this topic from v289?
There have been no changes related to this, but it's a complex web (one which I'm definitely simplifying for v0.8).
My crashes come at the validation process like you. It was the case before (v284) with the "process" option but I thought it had been fixed at the 288 version.
This was never "fixed" as I still never got to the bottom of what was causing it - the "linear" option used to fix it for you so that seemed "enough" to have that (slower) option.
@larnoldgithub and @clairem789, can you both try with v0.7.289 and v0.7.288 and verify that the problem comes from v0.7.290? I'll have to go through all changes line by line to see what changed that could possibly affect it.
So I guess that the issue is not seen on UdM machines...? I could try to do this test with 288 by doing git checkout v0.7.288-stable-test, replacing process by linear in the default_constants for VAL, processing the complete run, and checking the memory usage during the first hours. Will keep you informed!
You should never be changing the default_constants!! Please use the user_config.ini and user_constants.ini files. You can read the default_constants and default_config to look for constants to change, but all changes should be added to the user_config.ini or user_constants.ini file (in your setup directory). The values in user_xxx.ini will always overwrite default_xxx.py, and you won't be able to change branch if you modify the Python files, so please don't do that!
i.e. add the following to user_constants.ini:
# Define whether to use multiprocess "pool" or "process" or use "linear" mode
# when parallelising recipes
# dtype=string default=process
# options = pool, process
REPROCESS_MP_TYPE = process
# Define whether to use multiprocess "pool" or "process" or use "linear" mode
# when validating recipes
# dtype=string default=process
# options = linear, pool, process, pathos
REPROCESS_MP_TYPE_VAL = linear
So I guess that the issue is not seen on UdM machines...?
I haven't seen any such issues with NIRPS or SPIRou - though both machines have 300+GB of RAM.
NIRPS is only doing daily processing and I haven't done a large run for either NIRPS or SPIRou.
OK, sorry for my mistake in changing the wrong file. I'm doing this 2-3 times a year, not enough to remember all small details. It would be nice if it could explain it all, actually! The NW machine also has >300Gb but that's not enough. So at UdeM you haven't run the 290 version on the whole Spirou data set then?
Not a full reduction - the last runs have been done with v0.7.290 (though this is not as recommended as doing a full re-run) but again processing a single run may not show this issue as badly as redoing everything.
OK, sorry for my mistake in changing the wrong file. I'm doing this 2-3 times a year, not enough to remember all small details. It would be nice if it could explain it all, actually! The NW machine also has >300Gb but that's not enough. So at UdeM you haven't run the 290 version on the whole Spirou data set then?
I made the same mistake in the past... You should have somewhere a folder like .../config/myprofile/ where myprofile is the name of your 'installation' of apero, like offline290. In this folder you have a bunch of files:
database.yaml install.sh install.yaml offline290.bash.setup offline290.sh.setup offline290.zsh.setup user_config.ini user_config.ini.org user_constants.ini user_constants.ini.org
The *.org files are copies of the originals that I made with cp, just in case.
The way apero works is that it first reads the default and then updates the values with the user values, then starts the processing.
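The idea can be sketched like this. This is only an illustration of "defaults first, then user overrides", not APERO's actual loader: parse_user_ini, load_constants, and the DEFAULTS dictionary are hypothetical names for this example, and the parser assumes simple KEY = value lines with # comments.

```python
def parse_user_ini(text):
    """Parse simple 'KEY = value' lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def load_constants(defaults, user_text):
    """Start from the defaults, then let user values overwrite them."""
    merged = dict(defaults)
    merged.update(parse_user_ini(user_text))
    return merged


# Hypothetical stand-in for the values shipped in the default Python files.
DEFAULTS = {
    "REPROCESS_MP_TYPE": "process",
    "REPROCESS_MP_TYPE_VAL": "process",
}

# Hypothetical stand-in for the contents of user_constants.ini.
USER_INI = """
# options = linear, pool, process, pathos
REPROCESS_MP_TYPE_VAL = linear
"""

constants = load_constants(DEFAULTS, USER_INI)
print(constants["REPROCESS_MP_TYPE_VAL"])
```

Only the keys present in the user file change; everything else keeps its default, so the default files never need to be edited.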
I did run the 288 with 'process' months ago and it crashed. I set it to 'linear' and it has been very stable, regarding the PROC at least. I didn't see any memory leak.
For my apero_precheck last night with the 290, the memory usage increased linearly during the db update, then came back to the 'background level' of the machine. @njcuk9999, do you think this is expected behavior? apero_precheck ended with no error.
I'll not be able to make a test with the 289 before next week.
So to close this issue, it was due to my mistake of modifying the REPROCESS_MP_TYPE_VAL option value in the wrong file (default instead of user's files). Incidentally I confirm that the linear option for this parameter is the one working for NewWorlds machine. Sorry for this!
Thanks for clearing this up, I'd rather this than trying to figure out what caused it to break in newer versions and not older ones!
After about 6-12h of processing, I found the terminal with a "Killed: 9" and everything terminated. Google says that the application received a signal... Not sure what to do or when this will show up again. Any clue? Maybe a memory leak? I'll start it up again with an eye on the activity panel to check memory. Thanks