johntruckenbrodt / pyroSAR

framework for large-scale SAR satellite data processing
MIT License

[Archive] use geometry instead of bounding box #287

Closed johntruckenbrodt closed 4 months ago

johntruckenbrodt commented 8 months ago

The Archive class stores the bounding box of each scene in a database for spatial querying. Some time ago, the SAR driver method geometry was introduced, which returns the footprint geometry instead of the bounding box. The footprint is more accurate and should replace the bounding box in the database to allow a more refined search.
Recently, two PRs were created to solve this issue:

I think a more pragmatic approach would be to completely remove the column bbox in favor of a new column geometry. The recently introduced mechanism to load an Archive in legacy mode and import its content into a new database (https://github.com/johntruckenbrodt/pyroSAR/pull/260) could be used to migrate the database to the new layout.

With the merge of https://github.com/johntruckenbrodt/pyroSAR/pull/288, all driver classes support the method geometry (by exposing an attribute self.meta['coordinates']).
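To illustrate why a footprint geometry allows a more refined search than a bounding box, here is a minimal sketch in plain Python. It does not use pyroSAR; the corner coordinates and the helper functions `bounding_box` and `contains` are made up for illustration. A site near a corner of the bounding box is reported as covered by the bounding box, although it lies outside the actual footprint.

```python
# Hypothetical footprint of a scene as (lon, lat) corner coordinates.
# The values are invented for illustration, not taken from a real product.
footprint = [(10.0, 50.0), (13.0, 50.5), (12.5, 52.5), (9.5, 52.0)]


def bounding_box(coords):
    """Return the axis-aligned bounding box as a list of four corners."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]


def contains(polygon, point):
    """Ray-casting point-in-polygon test (points on the boundary are not handled)."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


bbox = bounding_box(footprint)

# a site close to the lower right corner of the bounding box
site = (12.9, 50.1)

in_bbox = contains(bbox, site)
in_footprint = contains(footprint, site)
print(in_bbox, in_footprint)  # True False
```

A query based on the bounding box would return this scene for the site, whereas a query based on the footprint would not.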

MarkusZehner commented 7 months ago

I think there is an import for older data tables in #185. If this is still of interest, I can take a look!

johntruckenbrodt commented 7 months ago

Hi @MarkusZehner. Now that #288 is merged, we can finally implement the Archive geometry column. Sorry, it took quite a bit of time. I wanted to make sure everything gets tested and works well.
It would be great if you could have a look. I would like to enable the migration from a database with bbox column to one with a geometry column by creating a new database and importing the old one. Modifying the structure of an existing database is quite dangerous I think. The method Archive.import_outdated was made for this:

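from pyroSAR import Archive

db_new = 'scenes.db'      # new database with the geometry column
db_old = 'scenes_old.db'  # existing database with the bbox column

# open the old database in legacy mode and import its content into the new one
with Archive(db_new) as archive_new:
    with Archive(db_old, legacy=True) as archive_old:
        archive_new.import_outdated(archive_old)

MarkusZehner commented 5 months ago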

Hi @johntruckenbrodt, sorry for the delay on my side. I'm just having a look at how best to address this. If you have any hints for me, I'd be glad to talk. Otherwise, I'll look into replacing the 'bbox' column with a 'geometry' column in the Archive setup and adapting import_outdated to check for this column to differentiate between Archive versions.

johntruckenbrodt commented 5 months ago

Hi @MarkusZehner, thanks for getting back to this. No need to apologize, you're not obliged to do anything here. Yes, I would do it like you say. Now that all SAR format drivers have implemented the geometry method, the call in Archive.insert can be changed from bbox to geometry. Then, the column in the database can be renamed accordingly. When an outdated database with a bbox column is opened without legacy=True, an error is thrown. If it is opened in legacy mode, the method import_outdated can create a new database and re-insert all scenes found in the old database. So basically, select all file paths from the old database (column scene) and pass them as a list to Archive.list in the new one. If you prefer, we could also have a live chat after Easter.
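The steps described above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not the actual Archive implementation: the table name `data`, the helper functions `column_names`, `check_layout` and `scenes_to_reinsert`, and the in-memory demo database are all assumptions made for this example.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of a table via SQLite's table_info pragma."""
    # the table name is interpolated directly, so only pass trusted names
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def check_layout(conn, legacy=False, table='data'):
    """Return True for an outdated layout; raise unless legacy mode is requested."""
    outdated = 'bbox' in column_names(conn, table)
    if outdated and not legacy:
        raise RuntimeError("outdated layout with column 'bbox'; "
                           "reopen in legacy mode and import into a new database")
    return outdated


def scenes_to_reinsert(conn, table='data'):
    """Select all file paths (column 'scene') from the old database."""
    return [row[0] for row in conn.execute(f'SELECT scene FROM "{table}"')]


# in-memory stand-in for an old database (hypothetical layout and content)
old = sqlite3.connect(':memory:')
old.execute('CREATE TABLE data (scene TEXT PRIMARY KEY, bbox TEXT)')
old.executemany('INSERT INTO data VALUES (?, ?)',
                [('/path/to/scene_a.zip', 'placeholder'),
                 ('/path/to/scene_b.zip', 'placeholder')])

# without legacy mode, the outdated layout is rejected
try:
    check_layout(old)
except RuntimeError as e:
    print(e)

# in legacy mode, the scene paths can be collected for re-insertion
if check_layout(old, legacy=True):
    scenes = scenes_to_reinsert(old)
    print(scenes)
```

In the real workflow, the collected list of file paths would then be handed to the new database so that each scene is read again and stored with its footprint geometry.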

MarkusZehner commented 5 months ago

Hi @johntruckenbrodt, the current #296 should do most of the above-mentioned. I added some checks to the Archive.__init__ and tests to cover the new import_outdated function. I hope this helps, and yes, maybe also Pizza instead of a live chat? ;)

johntruckenbrodt commented 5 months ago

> maybe also Pizza instead of a live chat? ;)

Exactly what I had in mind :grin: :pizza: