satellogic / telluric

telluric is a Python library to manage vector and raster geospatial data in an interactive and easy way
MIT License
87 stars 18 forks source link

support saving with empty crs and geo #312

Closed AmitAronovitch closed 2 years ago

AmitAronovitch commented 2 years ago

As first step towards supporting rasters that have RPC (or GCP) information instead of Affine+crs, I would like to have the following problems solved:

  1. The infinite-recursion problem that happens when trying to do certain operations on a raster is missing transform or crs should be solved (it is confusing and unintuitive anyways - if some operation is not supported, it should fail with proper error, rather than "out of memory"

  2. If we load a raster that has no transform or crs, we should be able to save it back, keeping all information that we did not touch (e.g. if it had band names, RPC tags, etc. they should be preserved). Currently it crashes due to out-of-memory as in 1

AmitAronovitch commented 2 years ago

Comments:

  1. The current branch also contains some changes done in order to support pathlib objects as argument for saving/loading instead of string. These are unrelated to the main topic, so can be removed, or moved to their own issue.

  2. A main source for the infinite recursion is this:

    • When trying to access properties that are not set (such as when saving), _populate_from_rasterio_object is called.
    • The first thing that this function does is trying to access the source_file property (wrapper around _filename), in order to read the missing properties.
    • However, if the _filename is empty (which actually happens when the raster was generated using copy_with(), or created from scratch, then source_file actually tries to save the raster into a mem-mapped-file and use that memory-url as the path.
    • But, some properties that it wants to save to memory will be missing (see first bullet - this was what started the whole thing...), so it tries to call _populate_from_rasterio_object in order to fill them up, going back to the first step.

    Following this analysis, the way I chose to stop the chain is in _populate_from_rasterio_object : just check if _filename contains a path, and return immediately if not (see second bullet). There is no point trying to fill data if there is nowhere to fill from.

  3. For the crs field, this did not suffice for what I wanted to achieve (which was to be able to save a file with no crs), and that's because _populate_from_rasterio also explicitly replaced None with CRS(). I removed that, assuming that this came from previous attempt to solve the recursion problem.

  4. Following these changes, crs can now remain None (affine can also theoretically remain None, but note that when rasterio reads a file without a transform, it automatically loads it as a unit-affine - but we can override it back if we want to save another copy)

  5. I had to modify copy_with as well. Since _filename of the copied raster is empty, we should not be able to "lazy-fill" any missing fields of the new raster (before this branch, that would cause infinite recursion. With the new changes they would just remain None - but neither case is desired). The change disabled some code that was there before, which was probably trying to cause a shallow-copy when possible, to save time. But it seems to me that this would not have worked anyways (see comment in the code).

drnextgis commented 2 years ago

Thanks @AmitAronovitch !

  1. I agree with @lilsnore that we shouldn't put shape to the init_args dictionary.
  2. Tests test_empty_crs and test_copy_raster_without_crs can be freely removed.

I do confirm that with these changes all tests are succeed.

codecov-commenter commented 2 years ago

Codecov Report

Merging #312 (ef2bcd8) into master (79c24e3) will increase coverage by 0.02%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #312      +/-   ##
==========================================
+ Coverage   90.79%   90.81%   +0.02%     
==========================================
  Files          37       38       +1     
  Lines        6038     6056      +18     
==========================================
+ Hits         5482     5500      +18     
  Misses        556      556              
Impacted Files Coverage Δ
telluric/georaster.py 93.62% <100.00%> (+0.01%) :arrow_up:
tests/test_georaster.py 100.00% <100.00%> (ø)
tests/test_nongeo.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 79c24e3...ef2bcd8. Read the comment docs.