johntruckenbrodt / pyroSAR

framework for large-scale SAR satellite data processing
MIT License

proper handling of database duplicates #252

Open johntruckenbrodt opened 1 year ago

johntruckenbrodt commented 1 year ago

The SQLite database created via drivers.Archive maintains two tables, data and duplicates. The latter contains all scenes that share an outname_base attribute (ID) with a scene in data. At the moment, the first scene registered with a given ID is put into data, and later scenes that share its ID are not compared against it.
One major deficiency of outname_base (different products can have the same ID, e.g. S1 SLCs and GRDs) was recently described in https://github.com/johntruckenbrodt/pyroSAR/issues/251. Furthermore, the scene in data and the scene to be inserted need to be compared to decide which of the two goes into data. Scenes are often reprocessed or republished, and the scene with the latest processing time should be the one kept in data. This means that the scene currently in data would be moved to duplicates whenever a scene with a later processing time is inserted into the database.
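To make the proposed behavior concrete, here is a minimal sketch using plain sqlite3. It is not the current drivers.Archive code: the table layout, the processed column (an ISO-style timestamp string), and the helper names create_tables and insert_scene are all assumptions made for illustration.

```python
import sqlite3


def create_tables(conn):
    # Hypothetical, simplified schema; the real Archive tables hold many more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data ("
        "outname_base TEXT PRIMARY KEY, scene TEXT, processed TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS duplicates ("
        "outname_base TEXT, scene TEXT, processed TEXT, "
        "PRIMARY KEY (outname_base, scene))"
    )


def insert_scene(conn, outname_base, scene, processed):
    """Register a scene and return the table it ended up in.

    `processed` is assumed to be a sortable timestamp string
    (e.g. '20200301T000000'), so string comparison orders by time.
    """
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT scene, processed FROM data WHERE outname_base = ?",
            (outname_base,),
        ).fetchone()
        if row is None:
            # First scene with this ID goes straight into data.
            conn.execute(
                "INSERT INTO data VALUES (?, ?, ?)",
                (outname_base, scene, processed),
            )
            return "data"
        current_scene, current_processed = row
        if current_scene == scene:
            # The same file is already registered; nothing to do.
            return "data"
        if processed > current_processed:
            # The new scene is more recent: demote the current one to duplicates.
            conn.execute(
                "INSERT OR IGNORE INTO duplicates VALUES (?, ?, ?)",
                (outname_base, current_scene, current_processed),
            )
            conn.execute(
                "UPDATE data SET scene = ?, processed = ? WHERE outname_base = ?",
                (scene, processed, outname_base),
            )
            return "data"
        # The new scene is older (or equally old): store it as a duplicate.
        conn.execute(
            "INSERT OR IGNORE INTO duplicates VALUES (?, ?, ?)",
            (outname_base, scene, processed),
        )
        return "duplicates"
```

With this logic, inserting a reprocessed scene swaps it into data and pushes the previous entry to duplicates, while an older copy arriving later lands in duplicates directly. Keying on outname_base alone still has the SLC/GRD collision described in the linked issue, so a real implementation would probably need the product type in the key as well.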

johntruckenbrodt commented 1 year ago

https://github.com/johntruckenbrodt/pyroSAR/issues/251 has been fixed in https://github.com/johntruckenbrodt/pyroSAR/pull/256