Since the application runs for several hours, it could be useful to be able to restart from a checkpoint in case of an interruption.
Currently pygac-fdr-mda-collect collects all metadata in memory and dumps it into a database only at the end.
If it instead wrote to the database every N files and checked whether a file is already listed in the database before processing it, the application could be restarted after an interruption without redoing finished work.
Another benefit would be reduced memory consumption, which currently grows to several GB.
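
To make the idea concrete, here is a minimal sketch of batched writes with a skip-if-present check. It is not the current pygac-fdr code: it assumes an SQLite database, and the table layout (`metadata` with `filename` and `payload` columns), the function names, and the `read_metadata` callable are all hypothetical placeholders for whatever the collector actually uses.

```python
import sqlite3


def _flush(con, rows):
    """Write one batch of (filename, payload) rows in a single transaction."""
    with con:  # commits on success, rolls back on error
        con.executemany("INSERT OR REPLACE INTO metadata VALUES (?, ?)", rows)


def collect_with_checkpoints(filenames, read_metadata, dbfile, batch_size=100):
    """Collect metadata in batches, skipping files already in the database.

    read_metadata is a hypothetical callable returning a string per file.
    Returns the number of files processed in this run.
    """
    con = sqlite3.connect(dbfile)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS metadata "
            "(filename TEXT PRIMARY KEY, payload TEXT)"
        )
        # Files stored by a previous (possibly interrupted) run.
        done = {row[0] for row in con.execute("SELECT filename FROM metadata")}

        pending = []
        n_processed = 0
        for filename in filenames:
            if filename in done:
                continue  # already checkpointed, nothing to do
            pending.append((filename, read_metadata(filename)))
            n_processed += 1
            if len(pending) >= batch_size:
                _flush(con, pending)
                pending.clear()  # keeps memory bounded by batch_size
        if pending:
            _flush(con, pending)  # write the final partial batch
        return n_processed
    finally:
        con.close()
```

With this layout an interruption loses at most the last unflushed batch, and only `batch_size` records are held in memory at a time. Any metadata that depends on all files together would still need a separate pass once collection is complete.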