njcuk9999 / apero-drs

A PipelinE to Reduce Observations - The DRS for SPIRou (CFHT)
MIT License

Index for table 'findex_full_v07288_db' is corrupt #751

Open andrescarmona opened 5 months ago

andrescarmona commented 5 months ago

I still have problems with the database: the file index is reported as corrupt.

Any ideas on how to repair the database?

Cheers!

11:22:53.422-**|DatabaseError|E[00-002-00032] <class 'mysql.connector.errors.DatabaseError'>: 1034 (HY000): Index for table 'findex_full_v07288_db' is corrupt; try to repair it Command: UPDATE findex_full_v07288_db SET BLOCK_KIND = "raw", ABSPATH = "/net/GSP/nas12c/big_spirou/APERO_v07288/full_v07288/raw/2024-03-03/2993864o.fits", OBS_DIR = "2024-03-03", FILENAME = "2993864o.fits", RAWFIX = "1", KW_DATE_OBS = "2024-03-03", KW_UTC_OBS = "9:23:45.48", KW_ACQTIME = 60372.3919286, KW_TARGET_TYPE = "TARGET", KW_MID_OBS_TIME = 60372.39179961852, KW_OBJNAME = "GL410", KW_OBJECTNAME = "Gl410", KW_OBJECTNAME2 = "Gl410", KW_OBSTYPE = "OBJECT", KW_EXPTIME = 22.288, KW_INSTRUMENT = "SPIRou", KW_CCAS = "pos_pk", KW_CREF = "pos_fp", KW_CDEN = "2.27", KW_CALIBWH = "P5", KW_POLAR_KEY_1 = "P2", KW_POLAR_KEY_2 = "P4", KW_DPRTYPE = "POLAR_FP", KW_DRS_MODE = "POLAR", KW_OUTPUT = "RAW_POLAR_FP", KW_CMPLTEXP = "3", KW_NEXP = "4", KW_VERSION = NULL, KW_PPVERSION = NULL, KW_DRS_DATE_NOW = NULL, KW_PI_NAME = "Etienne Artigau", KW_RUN_ID = "24AC25", KW_PID = NULL, KW_FIBER = NULL, KW_IDENTIFIER = "2993864o" WHERE BLOCK_KIND="raw" AND OBS_DIR="2024-03-03" AND FILENAME="2993864o.fits"

    path: /home/external/acarmona/.apero/acarmona_at_clusterdb
    Function: apero.base.drs_db.py.Database.execute()
njcuk9999 commented 5 months ago

Unfortunately not really, short of using APERO to delete it and re-populate it. This will take some time, but the command is:

apero_database.py --update --dbkind=findex

If it's corrupt, this may not even be able to update; if that is the case, I would suggest trying:

apero_database.py --reset --dbkind=reset

If that fails, you could create a "mini data set", then use mysql to drop the corrupted database and duplicate the one from the "mini data set" (again via mysql). You should then be able to run:

apero_database.py --update --dbkind=findex

All of these options might take quite some time. Maybe someone with more SQL experience could find another option, such as just removing the bad rows (again via mysql?).
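Since the error message itself says "try to repair it", one lighter-weight thing to try before a full re-population might be CHECK TABLE followed, only if needed, by REPAIR TABLE. Below is a minimal Python sketch, assuming a DB-API cursor such as the one mysql.connector provides. The function name check_and_repair is made up for illustration and is not part of APERO. Note that REPAIR TABLE is only supported by some storage engines (for example MyISAM and Aria), so whether it helps depends on how the table was created.

```python
import re

# Table names cannot be passed as query parameters, so validate them first.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def check_and_repair(cursor, table):
    """Run CHECK TABLE and, only if it does not report OK, REPAIR TABLE.

    `cursor` is any DB-API cursor (assumed here: one from mysql.connector).
    Returns (was_ok, rows), where rows come from the last statement run.
    """
    if not _SAFE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table!r}")

    cursor.execute(f"CHECK TABLE `{table}`")
    rows = cursor.fetchall()
    # Result columns are: Table, Op, Msg_type, Msg_text
    was_ok = any(row[2] == "status" and row[3] == "OK" for row in rows)
    if was_ok:
        return True, rows

    # Only attempt a repair when the check did not come back clean.
    cursor.execute(f"REPAIR TABLE `{table}`")
    return False, cursor.fetchall()
```

For the table in this issue the call would be check_and_repair(cursor, "findex_full_v07288_db"), made while no APERO process is writing to it.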

larnoldgithub commented 5 months ago

I had to run that exact command yesterday (apero_database.py --update --dbkind=findex) because I got issues after pressing CTRL+C just as APERO started to update the db at the beginning of a test with apero_processing.py *.ini --test=True. It took 7 h last night, but it worked!

andrescarmona commented 5 months ago

Excellent! Mine is running, I'll keep you posted!

andrescarmona commented 5 months ago

OK, I got a database error again after

apero_database.py --update --dbkind=findex

I had the same problem with the log database. I am now doing the reset and will check!

The error:


18:08:41.712-!!|DBMGR| 18:08:41.701-|DatabaseError|E[00-002-00032] <class
18:08:41.712-!!|DBMGR| 'mysql.connector.errors.DatabaseError'>: 1034 (HY000): Index for table
18:08:41.712-!!|DBMGR| 'findex_full_v07288_db' is corrupt; try to repair it
18:08:41.712-!!|DBMGR| Command: INSERT INTO findex_full_v07288_db(ABSPATH, OBS_DIR, FILENAME,
18:08:41.712-!!|DBMGR| BLOCK_KIND, LAST_MODIFIED, RECIPE, RUNSTRING, INFILES, KW_DATE_OBS, KW_UTC_OBS,
18:08:41.712-!!|DBMGR| KW_ACQTIME, KW_TARGET_TYPE, KW_MID_OBS_TIME, KW_OBJNAME, KW_OBJECTNAME,
18:08:41.712-!!|DBMGR| KW_OBJECTNAME2, KW_OBSTYPE, KW_EXPTIME, KW_INSTRUMENT, KW_CCAS, KW_CREF,
18:08:41.712-!!|DBMGR| KW_CDEN, KW_CALIBWH, KW_POLAR_KEY_1, KW_POLAR_KEY_2, KW_DPRTYPE, KW_DRS_MODE,
18:08:41.712-!!|DBMGR| KW_OUTPUT, KW_CMPLTEXP, KW_NEXP, KW_VERSION, KW_PPVERSION, KW_DRS_DATE_NOW,
18:08:41.712-!!|DBMGR| KW_PI_NAME, KW_RUN_ID, KW_PID, KW_FIBER, KW_IDENTIFIER, USED, RAWFIX, UHASH)
18:08:41.712-!!|DBMGR| VALUES("/net/GSP/nas12c/big_spirou/APERO_v07288/full_v07288/lbl/log/LOG-2023-10-25-lbl_compile.log",
18:08:41.712-!!|DBMGR| "log", "LOG-2023-10-25-lbl_compile.log", "lbl", 1698243378.0671945, "Unknown",
18:08:41.712-!!|DBMGR| NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
18:08:41.712-!!|DBMGR| NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
18:08:41.712-!!|DBMGR| NULL, NULL, NULL, NULL, NULL, NULL, 1, 1,
18:08:41.712-!!|DBMGR| "050d43e3c5382519e84527c0cd046a2762950272bc73ee8e686fa0da5743692a")
18:08:41.713-!!|DBMGR| path: /home/external/acarmona/.apero/acarmona_at_clusterdb
18:08:41.714-!!|DBMGR| Function: apero.base.drs_db.py.Database.execute()
18:08:41.759-|DBMGR| *
18:08:41.768-@!|DBMGR| W[40-003-00005]: Recipe apero_database has NOT been successfully completed
18:08:41.776-*|DBMGR| ;****

andrescarmona commented 5 months ago

I solved the issue, for the moment, by opening the database in MySQL and executing the SQL command that gives the error in APERO. It magically worked inside SQL, and I then launched

apero_database.py --update --dbkind=findex

again; it seems to be working.

njcuk9999 commented 5 months ago

Okay. Note that next time you can do:

apero_database.py --update --dbkind=all

and just update all the databases (after a long wait)

andrescarmona commented 5 months ago

I still have the problem.

The issue is that when APERO sends the command, we get a database error. When I run the command directly in MariaDB, there is no problem (see below).

Are we sending too many commands in parallel?

Would it be possible to tell APERO to wait 1 s and send the command again if it gets a database error?

Cheers!

Andres

apero.base.drs_db.DatabaseError: Database Error: 16:21:34.429-**|DatabaseError|E[00-002-00032] <class 'mysql.connector.errors.DatabaseError'>: 1034 (HY000): Index for table 'findex_minidata2_v07289_db' is corrupt; try to repair it Command: UPDATE findex_minidata2_v07289_db SET BLOCK_KIND = "raw", ABSPATH = "/net/GSP/nas12c/big_spirou/minidata2_raw/2020-11-01/2539960o.fits", OBS_DIR = "2020-11-01", FILENAME = "2539960o.fits", RAWFIX = "1", KW_DATE_OBS = "2020-11-01", KW_UTC_OBS = "8:23:51.96", KW_ACQTIME = 59154.3509759, KW_TARGET_TYPE = "TARGET", KW_MID_OBS_TIME = 59154.3505244706, KW_OBJNAME = "GJ905", KW_OBJECTNAME = "Gl905", KW_OBJECTNAME2 = "Gl905", KW_OBSTYPE = "OBJECT", KW_EXPTIME = 78.007, KW_INSTRUMENT = "SPIRou", KW_CCAS = "pos_pk", KW_CREF = "pos_fp", KW_CDEN = "2.45", KW_CALIBWH = "P5", KW_POLAR_KEY_1 = "P2", KW_POLAR_KEY_2 = "P16", KW_DPRTYPE = "POLAR_FP", KW_DRS_MODE = "POLAR", KW_OUTPUT = "RAW_POLAR_FP", KW_CMPLTEXP = "2", KW_NEXP = "4", KW_VERSION = NULL, KW_PPVERSION = NULL, KW_DRS_DATE_NOW = NULL, KW_PI_NAME = "Jean-Francois Donati", KW_RUN_ID = "20BP40", KW_PID = NULL, KW_FIBER = NULL, KW_IDENTIFIER = "2539960o" WHERE BLOCK_KIND="raw" AND OBS_DIR="2020-11-01" AND FILENAME="2539960o.fits"

    path: /home/external/acarmona/.apero/acarmona_at_clusterdb
    Function: apero.base.drs_db.py.Database.execute()

Now if I send that command directly in MariaDB

MariaDB [aperodb]> UPDATE findex_minidata2_v07289_db SET BLOCK_KIND = "raw", ABSPATH = "/net/GSP/nas12c/big_spirou/minidata2_raw/2020-11-01/2539960o.fits", OBS_DIR = "2020-11-01", FILENAME = "2539960o.fits", RAWFIX = "1", KW_DATE_OBS = "2020-11-01", KW_UTC_OBS = "8:23:51.96", KW_ACQTIME = 59154.3509759, KW_TARGET_TYPE = "TARGET", KW_MID_OBS_TIME = 59154.3505244706, KW_OBJNAME = "GJ905", KW_OBJECTNAME = "Gl905", KW_OBJECTNAME2 = "Gl905", KW_OBSTYPE = "OBJECT", KW_EXPTIME = 78.007, KW_INSTRUMENT = "SPIRou", KW_CCAS = "pos_pk", KW_CREF = "pos_fp", KW_CDEN = "2.45", KW_CALIBWH = "P5", KW_POLAR_KEY_1 = "P2", KW_POLAR_KEY_2 = "P16", KW_DPRTYPE = "POLAR_FP", KW_DRS_MODE = "POLAR", KW_OUTPUT = "RAW_POLAR_FP", KW_CMPLTEXP = "2", KW_NEXP = "4", KW_VERSION = NULL, KW_PPVERSION = NULL, KW_DRS_DATE_NOW = NULL, KW_PI_NAME = "Jean-Francois Donati", KW_RUN_ID = "20BP40", KW_PID = NULL, KW_FIBER = NULL, KW_IDENTIFIER = "2539960o" WHERE BLOCK_KIND="raw" AND OBS_DIR="2020-11-01" AND FILENAME="2539960o.fits";
Query OK, 1 row affected (0.002 sec)
Rows matched: 1 Changed: 1 Warnings: 0

andrescarmona commented 5 months ago

Ok catching up.

I run apero_processing.py mini_run2.ini and it breaks. I run it again and it breaks.

BUT the file where the error happens changes. Example:

run1: FILENAME = "2516455c.fits", RAWFIX = "1", KW_DATE_OBS = "2020-09-23"
run2: FILENAME = "2516427o.fits", RAWFIX = "1", KW_DATE_OBS = "2020-09-23"
run3: FILENAME = "2510486a.fits", RAWFIX = "1", KW_DATE_OBS = "2020-08-31"
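To compare failures like these across runs, the table name and FILENAME could be pulled out of each error message automatically. A minimal Python sketch, assuming the error text has the same shape as the messages quoted in this thread; failing_file is a hypothetical helper, not an APERO function:

```python
import re

# Patterns based on the error messages quoted in this issue.
TABLE_RE = re.compile(r"Index for table '([^']+)' is corrupt")
FILE_RE = re.compile(r'\bFILENAME\s*=\s*"([^"]+)"')


def failing_file(message):
    """Return (table, filename) found in a DatabaseError message.

    Either element is None when its pattern is not present, e.g. for an
    INSERT command, which lists FILENAME without an "=" sign.
    """
    table = TABLE_RE.search(message)
    fname = FILE_RE.search(message)
    return (
        table.group(1) if table else None,
        fname.group(1) if fname else None,
    )
```

Collecting these pairs over several runs would show whether the failures cluster on particular files or move around, as in the three runs above.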

I checked the status of the database after each run and it is:

MariaDB [aperodb]> check table findex_minidata2_v07289_db;
+------------------------------------+-------+----------+----------+
| Table                              | Op    | Msg_type | Msg_text |
+------------------------------------+-------+----------+----------+
| aperodb.findex_minidata2_v07289_db | check | status   | OK       |
+------------------------------------+-------+----------+----------+

so no problems.

Perhaps we are sending too many connections to the database at the same time...
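One way to test that hypothesis would be to compare the server's connection counters with its configured limit while apero_processing.py is running. A minimal Python sketch, assuming a DB-API cursor (for example from mysql.connector) connected to the same server; connection_usage is a hypothetical helper name, while the SHOW statements are standard MySQL/MariaDB:

```python
def connection_usage(cursor):
    """Return (current, peak, limit) connection counts from the server."""

    def _value(sql):
        cursor.execute(sql)
        # Each SHOW ... LIKE statement returns one (Variable_name, Value) row.
        return int(cursor.fetchone()[1])

    current = _value("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
    peak = _value("SHOW GLOBAL STATUS LIKE 'Max_used_connections'")
    limit = _value("SHOW GLOBAL VARIABLES LIKE 'max_connections'")
    return current, peak, limit
```

If the peak stays well below the limit, the connection count is probably not the cause. A server that runs out of connections would also normally report error 1040 (Too many connections) rather than 1034.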

andrescarmona commented 5 months ago

BTW, I run

apero_database.py --update --dbkind=all

and there are no errors!!!

But when I launch apero_processing.py mini_run2.ini, it breaks when updating the database...

andrescarmona commented 5 months ago

Update: after launching apero_processing.py mini_run2.ini ~30 times in a row, I managed to get past the part that updates the headers in the database, and now it is processing the data!!

njcuk9999 commented 5 months ago

The "too many connections" may be an issue but it doesn't explain why everything worked for you before (and on a full run where you are doing hundreds of times more than the mini data).

Thus I think something is wrong with your database. Nothing has changed with v0.7.289; to be sure, you can test with v0.7.288 (which worked for you previously with the same amount of data and the same number of cores).

andrescarmona commented 5 months ago

I am reducing the missing nights with v07288 after having updated all the databases.

I will do a test with v07288 and minidata2 ;-)

andrescarmona commented 5 months ago

I launched the reduction of the missing nights with v07288 and got the same database error; it breaks in the "Updating database with header fixes" step.

Command: UPDATE findex_full_v07288_db SET BLOCK_KIND = "raw", ABSPATH = "/net/GSP/nas12c/big_spirou/APERO_v07288/full_v07288/raw/2024-03-03/2993849o.fits", OBS_DIR = "2024-03-03", FILENAME = "2993849o.fits", RAWFIX = "1", KW_DATE_OBS = "2024-03-03", KW_UTC_OBS = "6:30:21.91", KW_ACQTIME = 60372.2728508, KW_TARGET_TYPE = "TARGET", KW_MID_OBS_TIME = 60372.27204467732, KW_OBJNAME = "GJ3378", KW_OBJECTNAME = "GJ3378", KW_OBJECTNAME2 = "GJ3378", KW_OBSTYPE = "OBJECT", KW_EXPTIME = 139.298, KW_INSTRUMENT = "SPIRou", KW_CCAS = "pos_pk", KW_CREF = "pos_fp", KW_CDEN = "2.81", KW_CALIBWH = "P5", KW_POLAR_KEY_1 = "P14", KW_POLAR_KEY_2 = "P16", KW_DPRTYPE = "POLAR_FP", KW_DRS_MODE = "POLAR", KW_OUTPUT = "RAW_POLAR_FP", KW_CMPLTEXP = "1", KW_NEXP = "4", KW_VERSION = NULL, KW_PPVERSION = NULL, KW_DRS_DATE_NOW = NULL, KW_PI_NAME = "Etienne Artigau", KW_RUN_ID = "24AC25", KW_PID = NULL, KW_FIBER = NULL, KW_IDENTIFIER = "2993849o" WHERE BLOCK_KIND="raw" AND OBS_DIR="2024-03-03" AND FILENAME="2993849o.fits"

andrescarmona commented 5 months ago

The guys at LAM created a completely new database; let us see if this works. When installing the new profile, Python asked me to update the SQL connector too.

andrescarmona commented 5 months ago

I launched : apero_database.py --update --dbkind=all

njcuk9999 commented 5 months ago

So is this new APERO profile linked to the old data set or a new data set? If it is a new data set, you don't need to run apero_database.py --update (as there is nothing to update yet).

andrescarmona commented 5 months ago

OK cool, there is one linked to the old dataset and one linked to a new minidataset2. But when installing the minidataset2 one in the new aperodb2, I got the same error again:

11:20:57.551-!!|RESET| 11:20:57.542-|DatabaseError|E[00-002-00032] <class
11:20:57.551-!!|RESET| 'mysql.connector.errors.DatabaseError'>: 1034 (HY000): Index for table
11:20:57.551-!!|RESET| 'lang_minidata2_v07288_db2_db' is corrupt; try to repair it
11:20:57.552-!!|RESET| Command: INSERT INTO lang_minidata2_v07288_db2_db VALUES("DARKFILE_HELP",
11:20:57.552-!!|RESET| "HELP", NULL, NULL, "[STRING] The Dark file to use (CALIBDB=DARKM)", "nan")
11:20:57.553-!!|RESET| path: /home/external/acarmona/.apero/acarmona_at_clusterdb
11:20:57.553-!!|RESET| Function: apero.base.drs_db.py.Database.execute()
11:20:57.565-|RESET| *
11:20:57.574-@!|RESET| W[40-003-00005]: Recipe apero_reset has NOT been successfully completed
11:20:57.583-*|RESET|

andrescarmona commented 5 months ago

It was a fresh installation using v07288

andrescarmona commented 5 months ago

Do you think there would be a way of telling APERO to send the SQL command at least five times (at 10 s intervals) before crashing?
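For illustration, a retry of that kind can be sketched in a few lines of Python. This is not APERO's code: execute_with_retry and its parameters are hypothetical, and `execute` stands for whatever callable actually sends the command to the database.

```python
import time


def execute_with_retry(execute, command, attempts=5, wait=10.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call execute(command), retrying when it raises.

    attempts: total number of tries before giving up
    wait:     seconds to pause between tries
    retry_on: exception types that trigger a retry
    sleep:    pause function, replaceable in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return execute(command)
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: let the last error propagate
            sleep(wait)
```

With mysql.connector one would probably pass retry_on=(mysql.connector.errors.DatabaseError,), the exception class shown in the logs above, so that unrelated errors still fail immediately.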

njcuk9999 commented 5 months ago

It would be quite a big re-write of lots of stuff... for something that doesn't fix the problem (just hides it) and wasn't a problem before with the same code... I'm not sure that is the best approach