i4Ds / STIXCore

STIX Core functionalities
BSD 3-Clause "New" or "Revised" License

Processing hung while processing large TM requests #363

Closed samaloney closed 10 months ago

samaloney commented 11 months ago

A large TM request arrived and seems to have caused the pipeline to hang. I tried to restart and reprocess, but I am not sure I deleted the correct files to trigger reprocessing.

No logs appear for a number of TM files, and some of the files that do have logs have no corresponding .out file.

I downloaded the same TM and processed it locally without issue, which leads me to think it may be related to out-of-order packets, incomplete packets spanning multiple TM files, or something similar.


STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.15.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.14.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.13.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.12.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.11.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.10.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.9.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.3.xml.log
STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.307.duKw@2023.284.16.00.01.565.1.xml.log 
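
One way to find which TM files never produced a log or a .out file is a small check like the sketch below. It is not part of STIXCore. The function name `find_incomplete` and the log directory argument are hypothetical, and it assumes the log for `X.xml` is named `X.xml.log` (as in the listing above) and that the .out file is named `X.xml.out`, which is a guess.

```python
from pathlib import Path


def find_incomplete(tm_names, log_dir):
    """Return (missing_log, missing_out) for the given TM file names.

    Assumption: the log for `X.xml` is `X.xml.log` and the output marker
    is `X.xml.out`, both located in `log_dir`. The `.out` naming is a guess.
    """
    log_dir = Path(log_dir)
    missing_log, missing_out = [], []
    for name in tm_names:
        if not (log_dir / f"{name}.log").exists():
            # The pipeline never wrote a log for this TM file.
            missing_log.append(name)
        elif not (log_dir / f"{name}.out").exists():
            # A log exists, but there is no matching .out file.
            missing_out.append(name)
    return missing_log, missing_out
```

Calling `find_incomplete(names, "/path/to/logs")` with the TM file names of one batch would give the two lists described above: files with no log at all, and files with a log but no .out file.
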
drhlxiao commented 11 months ago

If you upload the FITS files that you created locally to the NAS storage, they will be available on the data platform. What do you think of this temporary solution? @samaloney

samaloney commented 11 months ago

I manually ran the pipeline with the TM files from that entire batch. It seems to have worked, but I am not 100% sure and will need to check this in more detail.

stix-pipeline-cli -i tm-files-kAZy.txt

where tm-files-kAZy.txt contains:

/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.10.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.11.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.12.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.13.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.14.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.15.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.1.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.2.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.3.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.4.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.5.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.6.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.7.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.8.xml
/data/stix/SOLSOC/from_edds/tm/incomming/STIX_TM_16_BatchRequest.PktTmRaw.SOL.0.2023.214.07.24.03.306.kAZy@2023.283.16.00.01.900.9.xml
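
Note that the list above is in lexical order, so parts 10 to 15 come before parts 1 to 9. Whether stix-pipeline-cli depends on the order of the input list is not established in this thread, and the out-of-order hypothesis is unconfirmed. If order does matter, a minimal sketch for sorting the paths by their trailing numeric index could look like this (`tm_sort_key` and `sort_tm_files` are hypothetical helper names, not STIXCore functions):

```python
import re

# Matches the trailing ".<index>.xml" part of a TM file name.
_INDEX_RE = re.compile(r"\.(\d+)\.xml$")


def tm_sort_key(path):
    """Sort key: (text before the index, numeric index)."""
    match = _INDEX_RE.search(path)
    if match is None:
        # No numeric index found: sort by the full name, before indexed parts.
        return (path, -1)
    return (path[: match.start()], int(match.group(1)))


def sort_tm_files(paths):
    """Return the paths grouped by request and ordered by part number."""
    return sorted(paths, key=tm_sort_key)
```

Applied to the lines of tm-files-kAZy.txt, this would place part 1 first and part 15 last, keeping files from the same request together.
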
samaloney commented 11 months ago

It seems most of the files are there now, but locally I have these files from one request that are not on the server.

./LB/21/6/21/solo_LB_stix-21-6-21_0000000000-9999999999_V01_2309062070-64465.fits
./L0/21/6/21/solo_L0_stix-sci-xray-cpd_0747346648-0747347258_V01_2309062070-64465.fits
./L1/2023/09/06/SCI/solo_L1_stix-sci-xray-cpd_20230906T202040-20230906T203051_V01_2309062070-64465.fits
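
A comparison like the one above can be reproduced with a short script that lists FITS files present in a local output tree but absent from a copy or mount of the server tree. This is only a sketch: `fits_only_in_local` is a hypothetical name, and it assumes both trees use the same relative layout (LB/, L0/, L1/, ...).

```python
from pathlib import Path


def fits_only_in_local(local_root, server_root):
    """Return relative paths of .fits files found locally but not on the server."""

    def relative_fits(root):
        root = Path(root)
        # Collect every .fits file below root, as a path relative to root.
        return {p.relative_to(root).as_posix() for p in root.rglob("*.fits")}

    return sorted(relative_fits(local_root) - relative_fits(server_root))
```

The result is a sorted list of relative paths, which can be compared directly with the three files listed above.
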
nicHoch commented 10 months ago

This specific case was reprocessed and is therefore "solved". The main issue of the pipeline hanging is still open.