njcuk9999 / apero-drs

A PipelinE to Reduce Observations - The DRS for SPIRou (CFHT)
MIT License
12 stars, 0 forks

Error after EXTALL: mysql.connector.errors.DatabaseError (0.7.286) #717

Open larnoldgithub opened 1 year ago

larnoldgithub commented 1 year ago

I got errors like this one:

15:43:45.513- |PROC|
15:43:45.530-@!|PROC| W[40-503-00019]: Error found for ID='56325'
15:43:45.547-@!|PROC| apero_extract_spirou.py 18BQ24-Jan17 2368975o_pp.fits --crunfile=run_EXTALL_only.ini --program=EXTALL[56325] --recipe_kind=extract-ALL --shortname=EXTALL --parallel=True
15:43:45.563- |PROC|
15:43:45.580- |PROC| 14:10:50.862-**|DatabaseError|E[00-002-00032] <class 'mysql.connector.errors.DatabaseError'>: 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Command: INSERT INTO log_offline_db VALUES("apero_extract_spirou", "EXTALL", "red", "recipe", "extract-ALL", "EXTALL[56325]", "PID-00016940090554429720-X01Q", "2023-09-06 14:04:15.443", "1694009055.443", "APEROG-PID-00016934391638252470-QNB4_apero_processing_group", 2, 0, "num=0 fiber=C ", "/data/spirou4/apero-data/offline/tmp", "/data/spirou4/apero-data/offline/reduced", "18BQ24-Jan17", "/data/spirou4/apero-data/offline/msg/processing/APEROG-PID-00016934391638252470-QNB4_apero_processing_group/18BQ24-Jan17/APEROL-PID-00016940090554429720-X01Q_apero_extract_spirou.log", "/data/spirou4/apero-data/offline/plot/18BQ24-Jan17/pid-00016940090554429720-x01q_apero_extract_spirou_1", "apero_extract_spirou.py 18BQ24-Jan17 2368975o_pp.fits --quicklook=False --badcorr=True --backsub=True --combine=False --combine_method=sum --darkcorr=True --fiber=ALL --flipimage=both --fluxunits=e- --plot=0 --resize=True --leakcorr=True --thermal=True --force_ref_wave=False --no_in_qc=False --program=EXTALL[56325] --recipe_kind=extract-ALL --parallel=True --shortname=EXTALL --crunfile=run_EXTALL_only.ini --nosave=False ", "obs_dir=18BQ24-Jan17 || files[1]=/data/spirou4/apero-data/offline/tmp/18BQ24-Jan17/2368975o_pp.fits [DRS_PP]", "--quicklook=False || --badcorr=True || --backsub=True || --combine=False || --combine_method=sum || --objname=None || --dprtype=None || --darkcorr=True || --fiber=ALL || --flipimage=both || --fluxunits=e- || --plot=0 || --resize=True || --leakcorr=True || --thermal=True || --force_ref_wave=False || --no_in_qc=False", "--xhelp=None || --debug=None || --listing=None || --listingall=None || --version=None || --info=None || --program=EXTALL[56325] || --recipe_kind=extract-ALL || --parallel=True || --shortname=EXTALL || --idebug=None || --ref=None || --crunfile=run_EXTALL_only.ini || --quiet=None || --nosave=False || --force_indir=None || --force_outdir=None", "2023-09-06 14:04:15.443", NULL, 1, 1, "image=NaN [image is all NaN] PASSED ||", "image||", "NaN||", "image is all NaN||", "1||", NULL, 0, 67, "IN_PARALLEL|RUNNING|ENDED|FORCE_REFWAVE|USER_FIBERS|QUICKLOOK|EXP_FPLINE", 1, "10.089717864990234", "-1.0", "98.2339096069336", "0.0", "-1.0", "0.0", "53.1", "-1.0", 28, "2023-09-06 14:05:36.351", NULL)
path: /h/spdrs/.apero/apero_at_localhost
Function: apero.base.drs_db.py.Database.execute()
15:43:45.889- |PROC|
15:43:45.906- |PROC|
15:43:45.922- |PROC| ***
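The error text itself ("1205 (HY000): Lock wait timeout exceeded; try restarting transaction") points to a client-side mitigation that is independent of the server settings discussed in this thread: retry the whole transaction a few times. The following is a minimal sketch that assumes the database write is wrapped in a zero-argument callable. `retry_on_lock_timeout` is a hypothetical helper and is not part of APERO. It relies only on the fact that `mysql.connector` exceptions carry the MySQL error code in their `errno` attribute.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# MySQL server error 1205: "Lock wait timeout exceeded; try restarting transaction"
ER_LOCK_WAIT_TIMEOUT = 1205


def retry_on_lock_timeout(func: Callable[[], T], retries: int = 3,
                          delay: float = 1.0) -> T:
    """Call func() and retry it if it fails with MySQL error 1205.

    func should redo the whole transaction (e.g. INSERT + COMMIT), because
    the server asks for the transaction to be restarted. Errors with any
    other errno (or no errno at all) are re-raised immediately. After
    `retries` extra attempts the last 1205 error is re-raised. The wait
    between attempts doubles after each failure.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as exc:
            is_lock_timeout = getattr(exc, "errno", None) == ER_LOCK_WAIT_TIMEOUT
            if not is_lock_timeout or attempt == retries:
                raise
            time.sleep(delay * 2 ** attempt)
    # Only reached when retries < 0 (the loop body never runs).
    raise ValueError("retries must be >= 0")
```

With `mysql.connector`, the callable would be your own function that executes the INSERT and then calls `conn.commit()`. Rolling back inside that function on failure is a sensible extra precaution.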

njcuk9999 commented 1 year ago

Okay, so this is a problem with my implementation of MySQL (I think). In v0.8 I'm going to completely overhaul the way we interface with MySQL to fix this.

In the meantime, there are a few things that might help prevent this error:

1) Use fewer cores

This is the last approach you should try, only if nothing else works.

2) Increase the maximum number of connections

In MySQL, type:

mysql> SHOW VARIABLES LIKE 'max_connections';

Mine reads:

+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 768   |
+-----------------+-------+
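You can also check this setting from Python and see how close the server has come to the limit. A DB-API cursor can run the same query plus `SHOW GLOBAL STATUS LIKE 'Max_used_connections'`, which reports the peak number of simultaneous connections since the server started. This is a minimal sketch in which `connection_usage` is a hypothetical helper. The cursor would come from something like `mysql.connector.connect(...)` with your own credentials.

```python
from typing import Any, Tuple


def _show_int(cursor: Any, statement: str) -> int:
    """Run a SHOW ... LIKE '<name>' statement and return its Value column as int.

    SHOW VARIABLES / SHOW STATUS return rows of (Variable_name, Value);
    int() accepts the Value whether the driver returns it as str or bytes.
    """
    cursor.execute(statement)
    row = cursor.fetchone()
    if row is None:
        raise LookupError(f"no row returned for: {statement}")
    return int(row[1])


def connection_usage(cursor: Any) -> Tuple[int, int]:
    """Return (max_connections, Max_used_connections) for the connected server."""
    limit = _show_int(cursor, "SHOW VARIABLES LIKE 'max_connections'")
    peak = _show_int(cursor, "SHOW GLOBAL STATUS LIKE 'Max_used_connections'")
    return limit, peak
```

If the peak sits at or near the limit during a parallel run, that is the situation where the two suggestions here (fewer cores, or a higher `max_connections`) are expected to help.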

As for changing it, I had to get our admin to change the value. The steps are here.
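The runtime change itself is a single statement on MySQL 5.7/8.0. An account with the SUPER privilege (or SYSTEM_VARIABLES_ADMIN in 8.0) can run `SET GLOBAL max_connections = ...`. The new value is lost when the server restarts unless it is also written under `[mysqld]` in the server's option file or set with `SET PERSIST` (MySQL 8.0+). The following is a minimal sketch that only builds the statement and checks the value against the documented range of 1 to 100000. `max_connections_statement` is a hypothetical helper.

```python
def max_connections_statement(value: int, persist: bool = False) -> str:
    """Build the SQL that sets max_connections on a MySQL server.

    SET GLOBAL lasts until the server restarts; SET PERSIST (MySQL 8.0+)
    also records the value in mysqld-auto.cnf so it survives restarts.
    """
    if not isinstance(value, int) or not 1 <= value <= 100_000:
        raise ValueError("max_connections must be an integer in [1, 100000]")
    scope = "PERSIST" if persist else "GLOBAL"
    return f"SET {scope} max_connections = {value}"
```

An admin would then run something like `cursor.execute(max_connections_statement(768))` from a sufficiently privileged connection.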

I think the default is 101? Also, don't set it too high; apparently that can cause problems (768 seems to work on our machines). Maybe @cusher can comment on this.
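Options 1) and 2) trade off against each other. Roughly, the number of parallel processes times the connections each one holds has to stay below `max_connections`, with some left over for other clients. The following back-of-the-envelope sketch makes that trade-off concrete. `connections_per_process` and `reserved` are hypothetical parameters, because this thread doesn't say how many connections each APERO process actually opens.

```python
def max_parallel_processes(max_connections: int, connections_per_process: int = 1,
                           reserved: int = 10) -> int:
    """Largest process count whose connections fit under max_connections.

    reserved keeps some connections free for other clients (and admins).
    Returns 0 if the reserve alone already uses up the limit.
    """
    if connections_per_process < 1:
        raise ValueError("connections_per_process must be >= 1")
    return max(0, (max_connections - reserved) // connections_per_process)
```

For example, with MySQL's default of 151 and an assumed 2 connections per process, this gives 70 processes. With 768 it gives 379.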

larnoldgithub commented 1 year ago

Thanks Neil. I don't have the credentials to change the MySQL settings; I hope @cusher can do it ASAP.

BTW, 768 looks like the result of some calculation. Where does this specific number come from?

njcuk9999 commented 1 year ago

It wasn't me, it was our admin. I guess it is just a multiple of 256, i.e. 256, 512, 768, 1024, etc.

larnoldgithub commented 1 year ago

The default is 151, according to the doc you mention.

larnoldgithub commented 10 months ago

I got the same issue again with the .288 version; I forgot to ask Chris to update that MySQL parameter. I reprocessed the 67 files (out of about 60000 o.fits files) for which extraction failed.