nansencenter / django-geo-spaas-sar-doppler

Django Geo-SPaaS application for SAR Doppler shift processing
GNU General Public License v3.0

File-in-the-db checking and force processing #24

Open korvinos opened 6 years ago

korvinos commented 6 years ago

Description

Ingesting, and even more so processing, ASAR files is very time-consuming. Many files go through the same procedures repeatedly (reprocessing), especially during the early stages of application development. A lot of time could be saved if we could check whether a file already exists in the database and go straight to the next file if it does.

In the current version of the application, nansat_ingester handles this check. Unfortunately, this is inefficient: many processing steps have to run before we actually find out whether the file exists in the DB.

In addition, we should be able to force processing of a file even if it has already been processed (for instance, if we changed the processing algorithm and want to update the file without dumping the DB).

Solution

A method that checks for the file in the database should be developed and called at the very beginning of any processing. If the file has already been added to the DB, the method raises a specific exception, which is handled later. Also, a --force option should be added to the BaseCommand: if the exception is raised and force is True, processing continues; otherwise the command moves on to the next file.
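The control flow proposed above could look roughly like the sketch below. It is only an illustration, not the application's code: the names AlreadyInDatabase, check_not_in_db, build_parser and run are hypothetical, and a plain set of URIs stands in for the database lookup (in Django this would be a cheap query such as a filter(...).exists() call on whichever model stores the file URI). The option is declared with argparse directly; Django's BaseCommand.add_arguments receives the same kind of argparse parser, so the add_argument call would carry over unchanged.

```python
import argparse


class AlreadyInDatabase(Exception):
    """Hypothetical exception: the file is already registered in the DB."""


def check_not_in_db(uri, known_uris):
    """Raise AlreadyInDatabase if `uri` is already known.

    `known_uris` is a stand-in for the real database lookup (assumption).
    """
    if uri in known_uris:
        raise AlreadyInDatabase(uri)


def build_parser():
    """Declare the command-line options, including the proposed --force flag."""
    parser = argparse.ArgumentParser(description="Process SAR files")
    parser.add_argument("files", nargs="*", help="files to process")
    parser.add_argument(
        "--force",
        action="store_true",
        help="process a file even if it is already in the database",
    )
    return parser


def run(argv, known_uris, process):
    """Check each file first, then process it unless it should be skipped.

    Returns the list of files that were actually processed.
    """
    options = build_parser().parse_args(argv)
    processed = []
    for uri in options.files:
        try:
            # Cheap check before any expensive step is started.
            check_not_in_db(uri, known_uris)
        except AlreadyInDatabase:
            if not options.force:
                continue  # already in the DB and not forced: next file
        process(uri)
        processed.append(uri)
    return processed
```

Called as run(["a.N1", "b.N1"], {"a.N1"}, process), the sketch skips a.N1 and processes only b.N1; with "--force" added to the arguments, both files are processed. The file names here are made up for the example.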

TODO

  • Add force flag to the BaseCommand
  • Create an exception
  • Develop a method which checks whether the file has already been added to the DB and raises the exception if so

mortenwh commented 6 years ago

A reprocess option already exists in the ingestor command. Perhaps it can be reused.

korvinos commented 6 years ago

It does not really matter, because there is nothing to reuse.

mortenwh commented 5 years ago

The options (e.g., force) are already defined in Django-Geo-SPaaS ProcessingBaseCommand. Couldn't we just inherit from that command?