ubarsc / kealib

KEALib provides an implementation of the GDAL data model. The format supports raster attribute tables, image pyramids, meta-data and in-built statistics while also handling very large files and compression throughout.
http://kealib.org/
MIT License
12 stars 7 forks source link

simple kea file manipulation in python? #70

Open juan-guerschman opened 14 hours ago

juan-guerschman commented 14 hours ago

hey @gillins a basic question about using kea files.

My use case needs to 1- subset a very large kea file 2- read the RAT into a pandas DF (this is under control) https://github.com/ubarsc/kealib/issues/25 3- do stuff with the data 4- write new attribute(s) to the rat 5- visualise the results, ideally in QGis

I am stuck already on 1 - trying to use a simple gdal.warp and not working, not sure why. And I think I'll need a nudge with 4. Is 5 even possible? I remember you pointing me to the tui viewer tool which I never used in the end. I have been doing all this converting all to a geodatabase, but the memory overheads are huge.

gillins commented 14 hours ago
  1. Have you tried gdal_translate with the -srcwin or -projwin options? How large is the area? If it is too much for memory you can restrict the area the RIOS function is applied to with https://www.rioshome.org/en/latest/rios_applier.html#rios.applier.ApplierControls.setFootprintType - I can give some assistance there. 2-4. I'd strongly urge you to use RIOS's ratapplier (https://www.rioshome.org/en/latest/rios_ratapplier.html) so you don't need to read all the data into memory at once. You can also do manipulation there and create new columns
  2. Indeed tuiview - this was designed mainly for doing stuff with RAT's - Qgis not so good. I know you had some trouble installing that (which turned out to be a conda problem in the end) but this is all fixed now I think. Happy to give you a hand if you need.

Anyway, above are the tools I've been involved with so a little bit biased. But - I know them very well and can support them.

juan-guerschman commented 13 hours ago
  1. Have you tried gdal_translate with the -srcwin or -projwin options? How large is the area? If it is too much for memory you can restrict the area the RIOS function is applied to with https://www.rioshome.org/en/latest/rios_applier.html#rios.applier.ApplierControls.setFootprintType - I can give some assistance there.

I actually wanted to use a geojson polygon for the subset, that's why I was trying gdal.Warp with the cutlineDSName option. What pisses me off is that I don't get an error message, but there is no output produced, don't know where the problem is.

2-4. I'd strongly urge you to use RIOS's ratapplier (https://www.rioshome.org/en/latest/rios_ratapplier.html) so you don't need to read all the data into memory at once. You can also do manipulation there and create new columns

ok, will go to rios.

  1. Indeed tuiview - this was designed mainly for doing stuff with RAT's - Qgis not so good. I know you had some trouble installing that (which turned out to be a conda problem in the end) but this is all fixed now I think. Happy to give you a hand if you need.

thanks

gillins commented 13 hours ago

If you are using vectors, I should point out that you can read the vector in RIOS (https://www.rioshome.org/en/latest/applierexamples.html#vector-inputs) and mask your data with it...

If you need to do per segment stats, then pyshepseg has got a bunch of functions for doing this in a memory efficient manner. See: https://www.pyshepseg.org/en/latest/#per-segment-statistics

Also pinging @neilflood as he has a wealth of experience with doing this stuff.

juan-guerschman commented 12 hours ago

ok, you are opening even more questions.

Can I do my kea subsettimg (using a vector file) directly in rios? I have done something similar with a "normal" tif file before, but not sure if it will work with a kea. And if it does, will the RAT be updated? I mean, deleting all the elements that were in the original kea but are not there anymore in the cropped kea file.

And in terms of using a vector to mask (and subset): will it work? I have some code that I adapted from @tonykgill that first rasterizes the vector to the same grid, and then uses the raster. He says he needed that because he used the controls.setFootprintType(applier.BOUNDS_FROM_REFERENCE) option, not available for a vector. Is that right?

gillins commented 11 hours ago

So, is your vector a square or an irregular shape?

If it is a square and you just want a subset with an updated RAT for that subset see this function: https://www.pyshepseg.org/en/latest/pyshepseg_subset.html#pyshepseg.subset.subsetImage

If it is irregular it is a bit more complex. Yes I think you are right about BOUNDS_FROM_REFERENCE not working with vector. But would you actually want to subset? Are you accumulating values (where vector == 1), or do you want to write an image? You might have to tell us a bit more about what you are trying to do....

gillins commented 11 hours ago

I've just remembers that subsetImage takes a maskImage. So if you have an irregular shape we have got you covered too. Just need to ratserize that vector to the same grid as the input image...

juan-guerschman commented 11 hours ago

it is irregular (a group of farms) I have a very large kea (all of Aus, 25 meter pixel). It's the output from pyshepseg, something that Peter produced. I want to use only the segments (pixels) within this group of farms, read some of the attributes in the RAT, run a clustering algorithm and save the output(s) as a new element(s) in the RAT.

gillins commented 11 hours ago

OK I think subsetImage is what you want then. Can you rasterize your farms to be on the same grid as your large kea? Something like this should work:

gdal_rasterize -of KEA -tr 25 25 -tap -burn 1 -init -0 farms.geojson farm_mask.kea

Then get the size of this output (pass this in as newXsize and newYsize). Unfortunately, subsetImage does not (yet) use the georeferencing information in the mask file so you'll have to work out tlx and tly yourself. Something like this should work:

large_ds = gdal.Open('largenationalraster.kea')
large_transform = large_ds.GetGeoTransform()
large_inv = gdal.InvGeoTransform(large_transform)

small_ds = gdal.Open('farm_mask.kea')
small_transform = small_ds.GetGeoTransform()
tlx, tlx = gdal.ApplyGeoTransform(large_inv, small_transform[0], small_transform[3])
newXsize = small_ds.RasterXSize
newYsize = small_ds.RasterYSize

(see https://gdal.org/en/latest/user/raster_data_model.html#affine-geotransform)

neilflood commented 11 hours ago

Hi @juan-guerschman

Firstly, most things which work with other formats also work with KEA, so no problem there.

Second - are you writing a new output KEA, for just the extent of the farms vector? Or are you trying to modify certain rows of the large KEA file, just for the pixels contained in the farm. That would be quite tricky and potentially problematic, depending on exactly what you are doing.

If you are wanting a new, small KEA, just for the farms area, with new RAT data, then do as @gillins is saying - first subset the KEA (with its RAT columns) into a small one, and then use RIOS's ratapplier to do some calculations on the RAT, writing new columns back into the small KEA.

juan-guerschman commented 11 hours ago

Second - are you writing a new output KEA, for just the extent of the farms vector? Or are you trying to modify certain rows of the large KEA file, just for the pixels contained in the farm. That would be quite tricky and potentially problematic, depending on exactly what you are doing.

If you are wanting a new, small KEA, just for the farms area, with new RAT data, then do as @gillins is saying - first subset the KEA (with its RAT columns) into a small one, and then use RIOS's ratapplier to do some calculations on the RAT, writing new columns back into the small KEA.

awesome thanks. Yes, I don't want to mess with the large KEA, that's why I wanted to subset first. I'll try what @gillins suggested. Thanks both!

neilflood commented 11 hours ago

OK, great. Yes, subsetting an image with a RAT is beyond anything which GDAL provides, because one has to also subset the rows of the RAT to match the subset of pixels. Quite tricky. @gillins wrote that particular subset tool, so we should all be grateful to him. :-)

juan-guerschman commented 10 hours ago

I've just remembers that subsetImage takes a maskImage. So if you have an irregular shape we have got you covered too. Just need to ratserize that vector to the same grid as the input image...

sorry @gillins . Where is subsetImage?

juan-guerschman commented 10 hours ago

this? https://www.pyshepseg.org/en/latest/pyshepseg_subset.html#pyshepseg.subset.subsetImage

neilflood commented 10 hours ago

this? https://www.pyshepseg.org/en/latest/pyshepseg_subset.html#pyshepseg.subset.subsetImage

That's it.

gillins commented 10 hours ago

Yes, you'll have to have pyshepseg installed to access it.

neilflood commented 10 hours ago

There is also a command line wrapper, which installs with pyshepseg, called pyshepseg_subset, which may be good enough.