mundialis / sadasadam

SAme DAy SAtellite DAta Mosaics
GNU General Public License v3.0
3 stars 1 forks source link

Error in post processing #5

Closed lucadelu closed 6 months ago

lucadelu commented 10 months ago
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 11 AVG: 0.00s  local:1/0/100%/0.0s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 190688 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 679s Left: 10 AVG: 78.00s  local:1/1/100%/204.0s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 191319 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 493s Left: 9 AVG: 39.00s  local:1/2/100%/102.0s Couldn't open the directory: Not a directory
ETA: 240s Left: 8 AVG: 29.67s  local:1/3/100%/71.7s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 191860 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 300s Left: 7 AVG: 43.25s  local:1/4/100%/74.8s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 192113 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 284s Left: 6 AVG: 47.60s  local:1/5/100%/72.8s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 192640 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 303s Left: 5 AVG: 60.83s  local:1/6/100%/81.8s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 193163 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 247s Left: 4 AVG: 62.00s  local:1/7/100%/80.0s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 193895 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 210s Left: 3 AVG: 70.12s  local:1/8/100%/85.9s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 194379 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 154s Left: 2 AVG: 77.33s  local:1/9/100%/91.3s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 196089 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 81s Left: 1 AVG: 81.40s  local:1/10/100%/94.0s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 196757 Aborted                 $EXE $TODO $PRM >> $LOGFILE
ETA: 0s Left: 0 AVG: 74.00s  local:0/11/100%/85.5s 
FORCE log files copied to /tmp/sadasadam/output
Creation of same day mosaics...
Creation of same day mosaics finished
Postprocessing to clear sky mosaics...
/usr/local/lib/python3.8/dist-packages/sadasadam/force.py:47: UserWarning: Directory /tmp/sadasadam/output already exists, skipping...
  warnings.warn(f"Directory {directory} already exists, skipping...")
Traceback (most recent call last):
  File "/usr/local/bin/sadasadam", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/sadasadam/cli.py", line 256, in main
    force_proc.postprocess(
  File "/usr/local/lib/python3.8/dist-packages/sadasadam/force.py", line 365, in postprocess
    raise Exception(
Exception: No files found in FORCE mosaic directory /tmp/sadasadam/temp_force/force_dir_20231114_135159/level2/mosaic. Check FORCE logs in /tmp/sadasadam/output for details.

the only error I found is in one of the log files

/tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE: Unable to dive down .SAFE file!
Parsing parameter file failed.

it seems the archive is corrupt

lucadelu@homegis:~$ file /tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE
/tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE: Zip archive data, at least v2.0 to extract, compression method=store
lucadelu@homegis:~$ unzip -l /tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE
Archive:  /tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE or
        /tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE.zip, and cannot find /tmp/sadasadam/download/S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE.ZIP, period.

Could be possible to add a check if the archive is valid otherwise download it again?

lucadelu commented 10 months ago

Failing to download it happen quite often

Found 39 matching scenes of type S2_MSI_L1C, starting download...
Downloaded products:   0%|                                                                                                                                                             | 0/20 [00:00<?, ?product/s]
S2A_MSIL1C_20221204T101411_N0400_R022_T32TPR_20221204T122044:  38%|██████████████████████████████████████████                                                                    | 250M/654M [02:58<09:19, 722kB/s]
A problem occurred during download of product: EOProduct(id=S2A_MSIL1C_20221204T101411_N0400_R022_T32TPR_20221204T122044, provider=cop_dataspace). Skipping it
S2A_MSIL1C_20221204T101411_N0400_R022_T32TPR_20221204T122044:  38%|█████████████████████████████████████████▋                                                                   | 250M/654M [03:46<06:05, 1.10MB/s]
S2B_MSIL1C_20221209T101319_N0509_R022_T32TPR_20221209T104630: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 549.86file/s]
Downloaded products:   5%|███████▎                                                                                                                                         | 1/20 [17:03<5:24:06, 1023.53s/product
A problem occurred during download of product: EOProduct(id=S2B_MSIL1C_20221219T101339_N0509_R022_T32TPR_20221219T104633, provider=cop_dataspace). Skipping it                   | 443M/650M [06:11<03:03, 1.13MB/s]
S2B_MSIL1C_20221219T101339_N0509_R022_T32TPR_20221219T104633:  68%|██████████████████████████████████████████████████████████████████████████▏                                  | 443M/650M [06:11<02:54, 1.19MB/s]
A problem occurred during download of product: EOProduct(id=S2A_MSIL1C_20221224T101431_N0509_R022_T32TPR_20221224T121037, provider=cop_dataspace). Skipping it                  | 85.7M/815M [00:27<05:49, 2.08MB/s]
S2A_MSIL1C_20221224T101431_N0509_R022_T32TPR_20221224T121037:  11%|███████████▍                                                                                                | 85.9M/815M [00:27<03:53, 3.11MB/s]
Downloaded products:   5%|███████▎      
griembauer commented 10 months ago

Hi @lucadelu, I think these are two different things:

lucadelu commented 10 months ago

@griembauer

I was think that error in downloading was one of the causes of the other problem

griembauer commented 9 months ago

I am having trouble trying to reproduce this issue, S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE seems fine when I download it with sadasadam. Could you make the corrupt file available to me (email/download link/...) so I can debug? (EDIT: I think I came across the issue you are referring to: sometimes a download is a corrupted .SAFE file and not a .zip in which case download should be retried) In any case we can implement a generic test that checks zip files via zipfile.testzip().

More generally, please make sure that

However I am not sure whether the error you posted initially is related to the one scene being corrupt. Usually FORCE simply ignores and skips scenes it cannot handle and uses all other available valid scenes. In your case however the process aborts entirely. This should only happen if the corrupt file S2A_MSIL1C_20230811T101031_N0509_R022_T32TQS_20230811T135155.SAFE really is the only scene that was downloaded but I assume that is not the case?

griembauer commented 9 months ago

The issue of corrupt files should be solved via https://github.com/mundialis/sadasadam/pull/10

lucadelu commented 9 months ago

Ok thanks a lot, I'm going to test it today

lucadelu commented 9 months ago

Hi @griembauer

I'm doing some other tests and I found other problems, following some questions:

I can attach the config file if needed, the docker file of FORCE is using GDAL 3.0.4 and last FORCE version (3.7.12)

thanks a lot

S2B_MSIL1C_20221202T102309_N0400_R065_T32TQS_20221202T105926: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1045.96file/s]
S2A_MSIL1C_20221204T101411_N0400_R022_T32TPS_20221204T122044: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 501.11file/s]
S2B_MSIL1C_20221202T102309_N0400_R065_T32TPR_20221202T105926: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 868.57file/s]
S2A_MSIL1C_20221204T101411_N0400_R022_T32TQR_20221204T122044: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 454.62file/s]
S2B_MSIL1C_20221202T102309_N0400_R065_T32TPS_20221202T105926: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1100.29file/s]
S2A_MSIL1C_20221204T101411_N0400_R022_T32TQS_20221204T122044: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 403.22file/s]
S2A_MSIL1C_20221204T101411_N0400_R022_T32TPR_20221204T122044: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 611.41file/s]
Downloaded products: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [11:00<00:00, 94.35s/product]
Product type 'LANDSAT_C2L1' is not available with provider 'cop_dataspace'. Searching it on provider 'usgs' instead.
Found 3 matching scenes of type LANDSAT_C2L1, starting download...
LC08_L1GT_192028_20221202_20221212_02_T2: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 619.54file/s]
LC09_L1TP_193028_20221201_20230319_02_T1: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 738.17file/s]
LC09_L1TP_193027_20221201_20230319_02_T1: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 589.75file/s]
Downloaded products: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [03:09<00:00, 63.11s/product]
Setting up FORCE processing...
Downloading Water Vapor Database to /scratch3/delucchil/sadasadam/wvdb/...
Extracting Water Vapor Database...
Water Vapor Database downloaded and extracted to /scratch3/delucchil/sadasadam/wvdb/
Running FORCE Level-2 Processing...
10 images enqueued. Start processing with 1 CPUs
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "C.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "C.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

Computers / CPU cores / Max jobs to run
1:local / 64 / 1

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 2524s Left: 9 AVG: 289.00s  local:1/1/100%/377.0s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 11996 Aborted                 (core dumped) $EXE $TODO $PRM >> $LOGFILE
ETA: 3349s Left: 7 AVG: 479.33s  local:1/3/100%/508.7s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 26080 Aborted                 (core dumped) $EXE $TODO $PRM >> $LOGFILE
ETA: 2809s Left: 5 AVG: 562.20s  local:1/5/100%/579.8s gsl: simplex2.c:372: ERROR: non-finite function value encountered
gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 52193 Aborted                 (core dumped) $EXE $TODO $PRM >> $LOGFILE
ETA: 2062s Left: 4 AVG: 515.83s  local:1/6/100%/530.5s gsl: simplex2.c:372: ERROR: non-finite function value encountered
Default GSL error handler invoked.
/home/docker/bin/force-l2ps_: line 161: 56819 Aborted                 (core dumped) $EXE $TODO $PRM >> $LOGFILE
ETA: 0s Left: 0 AVG: 356.20s  local:0/10/100%/365.0s   
FORCE log files copied to /scratch3/delucchil/sadasadam/output
Creation of same day mosaics...
Creation of same day mosaics finished
Postprocessing to clear sky mosaics...
/usr/local/lib/python3.8/dist-packages/sadasadam/force.py:47: UserWarning: Directory /scratch3/delucchil/sadasadam/output already exists, skipping...
  warnings.warn(f"Directory {directory} already exists, skipping...")
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.8/dist-packages/sadasadam/force.py", line 90, in run_subprocess
    raise Exception(f"Error running process {cmd_str}:\n {stderr_dec}")
Exception: Error running process gdal_calc.py -A /scratch3/delucchil/sadasadam/temp_force/force_dir_20231130_122402/level2/mosaic/20221202_LEVEL2_SEN2B_QAI_clipped.vrt --outfile=/scratch3/delucchil/sadasadam/temp_force/force_dir_20231130_122402/level2/mosaic/20221202_LEVEL2_SEN2B_clearsky.tif --calc="((A >> 1) & 3) == 0" --NoDataValue=0 --type=Byte --creation-option=COMPRESS=ZSTD --creation-option=BIGTIFF=YES:
 Traceback (most recent call last):
  File "/usr/local/bin/gdal_calc.py", line 483, in <module>
    main()
  File "/usr/local/bin/gdal_calc.py", line 476, in main
    doit(opts, args)
  File "/usr/local/bin/gdal_calc.py", line 352, in doit
    myResult = ((1 * (myNDVs == 0)) * myResult) + (myOutNDV * myNDVs)
numpy.core._exceptions.UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/sadasadam", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/sadasadam/cli.py", line 250, in main
    force_proc.postprocess(
  File "/usr/local/lib/python3.8/dist-packages/sadasadam/force.py", line 422, in postprocess
    run_subprocess_parallel(
  File "/usr/local/lib/python3.8/dist-packages/sadasadam/force.py", line 99, in run_subprocess_parallel
    pool.map(run_subprocess, cmd_list_list)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
Exception: Error running process gdal_calc.py -A /scratch3/delucchil/sadasadam/temp_force/force_dir_20231130_122402/level2/mosaic/20221202_LEVEL2_SEN2B_QAI_clipped.vrt --outfile=/scratch3/delucchil/sadasadam/temp_force/force_dir_20231130_122402/level2/mosaic/20221202_LEVEL2_SEN2B_clearsky.tif --calc="((A >> 1) & 3) == 0" --NoDataValue=0 --type=Byte --creation-option=COMPRESS=ZSTD --creation-option=BIGTIFF=YES:
 Traceback (most recent call last):
  File "/usr/local/bin/gdal_calc.py", line 483, in <module>
    main()
  File "/usr/local/bin/gdal_calc.py", line 476, in main
    doit(opts, args)
  File "/usr/local/bin/gdal_calc.py", line 352, in doit
    myResult = ((1 * (myNDVs == 0)) * myResult) + (myOutNDV * myNDVs)
numpy.core._exceptions.UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')
griembauer commented 9 months ago

Hi Luca, thanks for reporting!

  1. Unfortunately I have not come across this issue before. I will also try to find out more
  2. I am not sure if there is a way to get this information from FORCE automatically, however you can check in <temp_force_dir>/force_dir_<TIMESTAMP>/log while FORCE is running --> this is where FORCE will write the log files consecutively. This will only contain log files of scenes that are currently being processed or for which processing is finished/failed (in which case also a status is in the log file)
  3. I would assume that since the FORCE level-2 processing fails (Aborted (core dumped)), there may be corrupt results and gdal_calcfails consecutively. However it may also be that GDAL version 3.0.4 is too early for gdal_calc to support the bit operation that happens in the postprocessing (--calc="((A >> 1) & 3) == 0)
lucadelu commented 9 months ago

Hi Luca, thanks for reporting!

1. Unfortunately I have not come across this issue before. I will also try to find out more

ok, maybe we should find a platform to test in the same way, do you have any docker to share? otherwise I can create one

2. I am not sure if there is a way to get this information from FORCE automatically, however you can check in `<temp_force_dir>/force_dir_<TIMESTAMP>/log` while FORCE is running --> this is where FORCE will write the log files consecutively. This will only contain log files of scenes that are currently being processed or for which processing is finished/failed (in which case also a status is in the log file)

there is nothing useful inside

3. I would assume that since the FORCE level-2 processing fails (`Aborted                 (core dumped)`), there may be corrupt results and `gdal_calc`fails consecutively. However it may also be that GDAL version 3.0.4 is too early for `gdal_calc` to support the bit operation that happens in the postprocessing (`--calc="((A >> 1) & 3) == 0`)

I tried with newer GDAL version and it seems to work, so probably it is done by older version

griembauer commented 9 months ago

Hi Luca

ok, maybe we should find a platform to test in the same way, do you have any docker to share? otherwise I can create one

No, we did not prepare a Dockerfile/image as we understood the use of Docker is restricted on your environment. So if you can, please feel free to go ahead, I agree that this would facilitate testing and generalization a lot.

there is nothing useful inside

If the log files are not helping, I am not sure whether further information can be logged from FORCE. I tried to find something on logging verbosity in the FORCE, docs, issues, and discussions, but was not successful. Or do you mean that your <temp_force_dir>/force_dir_<TIMESTAMP>/log directory is empty?

lucadelu commented 6 months ago

Going to close since I'm not able to replicate this error