mapbox / nodata

Because the pixels you can't see are harder than the ones you can.
MIT License
1 stars 3 forks source link

tiled vs untiled differences in nodata blob #57

Closed perrygeo closed 8 years ago

perrygeo commented 8 years ago

Once again with the internal tile blocks...

Take two rasters that are pixel-identical but one is tiled and the other is not. Run them through nodata blob with tiled output and the results differ:

I think we can get around it in pxm by forcing tiling beforehand but we should address it here as well.

Here's the script I was using to recreate.

#!/bin/bash
# using the test data in test/fixtures/compositing/sortorder

rio merge-rgba -f --res 66791.69447596415 \
    --co compress=none \
    0-0-0-test-2005-c.tif \
    0-0-0-test-2004-b.tif \
    0-0-0-test-2003-a.tif \
    combined1.tif

rio merge-rgba -f --res 66791.69447596415 \
    --co compress=none --co tiled=yes \
    0-0-0-test-2005-c.tif \
    0-0-0-test-2004-b.tif \
    0-0-0-test-2003-a.tif \
    combined2.tif

# They are equal
raster-tester compare combined1.tif combined2.tif
# OK - combined1.tif is similar to within 0 pixels of combined2.tif

nodata blob -j 8 -m 4 -n --co compress=LZW --co BLOCKXSIZE=256 --co BLOCKYSIZE=256 \
    -d 0 --alphafy combined1.tif composite1.tif
nodata blob -j 8 -m 4 -n --co compress=LZW --co BLOCKXSIZE=256 --co BLOCKYSIZE=256 \
    -d 0 --alphafy combined2.tif composite2.tif

# They are not
raster-tester compare composite1.tif composite2.tif
# ValueError: Band 1 has 2831 pixels that vary by more than 16
dnomadb commented 8 years ago

I'm gonna jump on this @perrygeo. The strategy I am thinking is:

Why is this different? I'm not sure. Failing test case here: #59

perrygeo commented 8 years ago

IDK what is happening here

filliwack_untiled

perrygeo commented 8 years ago

So I made a little geojson util that might help visualize internal tiling issues: https://github.com/mapbox/rio-block-features.

Noticed two things:

blocks16

Still not sure what the solution is but iterating processing these 16px tall blocks appears to be related.

dnomadb commented 8 years ago

@perrygeo ya, the 16px height blocksize comes from it being jpeg compressed, I think.

I manually stopped any filling from occurring and just returned the window, and am still seeing the shift: shift

This behavior probably has something to do with either (a) boundless reading of full-width windows in rasterio, (b) or my padding + unpadding logic (but why fail only on these)?

perrygeo commented 8 years ago

Grasping at straws here, wonder if it has anything to do with this: https://github.com/mapbox/rasterio/issues/520#issue-115078838

tldr: writing to raster where blocksize is > image size causes shifting.

Maybe blockxsize == width causes the same issue?

perrygeo commented 8 years ago

@dnomadb question about the test fixtures, How did you create the untiled fixture? The tiled and untiled are different right off the bat

$ raster-tester compare --no-error seams_4band.tif seams_4band_untiled.tif
NOT OK - Band 1 has 1235 pixels that vary by more than 16
$ rio info seams_4band_untiled.tif
{"count": 4, "crs": "EPSG:3857", "interleave": "pixel", "dtype": "uint8", "driver": "GTiff", "transform": [8.0, 0.0, -13550756.3744, 0.0, -8.0, 6315533.02503], "lnglat": [-121.70652486688023, 49.22476090053553], "height": 612, "width": 612, "shape": [612, 612], "tiled": false, "res": [8.0, 8.0], "nodata": null, "bounds": [-13550756.3744, 6310637.02503, -13545860.3744, 6315533.02503], "compress": "jpeg"}

If I use gdal_translate, I can get a pixel-perfect copy

$ gdal_translate -co TILED=FALSE seams_4band.tif seams_4band_untiled2.tif
$ raster-tester compare --no-error seams_4band.tif seams_4band_untiled2.tif
OK - seams_4band.tif is similar to within 0 pixels of seams_4band_untiled2.tif
$ rio info seams_4band_untiled2.tif
{"count": 4, "crs": "EPSG:3857", "interleave": "pixel", "dtype": "uint8", "driver": "GTiff", "transform": [8.0, 0.0, -13550756.3744, 0.0, -8.0, 6315533.02503], "lnglat": [-121.70652486688023, 49.22476090053553], "height": 612, "width": 612, "shape": [612, 612], "tiled": false, "res": [8.0, 8.0], "nodata": null, "bounds": [-13550756.3744, 6310637.02503, -13545860.3744, 6315533.02503]}

Still, even this one fails with the weird shifting behavior. But we should update the test fixture to reduce the number of extraneous variables here.

perrygeo commented 8 years ago

not sure if this is relevant

The strip height of the untiled raster created by gdal_translate is 3px height. Found this in the GDAL docs

By default stripped TIFF files are created... strip height defaults to a value such that one strip is 8K or less

perrygeo commented 8 years ago

Here's a summary of my observations so far. This is weird. I'm perplexed, flummoxed and bewildered.

The untiled x-shift

Observations

Conclusions

:confused: There's a blocksize > imagesize issue in rasterio, so maybe they share an underlying bug?

perrygeo commented 8 years ago

I've isolated the problem, no fix yet but at least we have a specific failing test case... There appears to be a bug in rasterio boundless reads.

Check out the test case. In a nutshell, reading an unpadded window should be the same as reading a padded window then subtracting the padding with an index. But it's not in the case of untiled block windows.

@dnomadb Two things before I post it to the rasterio issue tracker

/cc @sgillies

perrygeo commented 8 years ago

A slightly more tidy test case.

>>> src.width
612
>>> src.read(boundless=True, window=((100, 101), (0, 612)))[0, 0, 0:9]
array([68, 45, 63, 67, 73, 78, 87, 79, 81], dtype=uint8)
>>> src.read(boundless=True, window=((100, 101), (-1, 612)))[0, 0, 0:9]
array([ 0, 68, 45, 63, 67, 73, 78, 87, 79], dtype=uint8)
>>> src.read(boundless=True, window=((100, 101), (-1, 613)))[0, 0, 0:9]
array([ 0,  0, 68, 45, 63, 67, 73, 78, 87], dtype=uint8)  # WTF

Notice the last one, we shift the row stop (window[1][1]) to > image width, and it shifts the start row.

Also important to note: we see this behavior even on tiled tiffs. So it's a math error related to windows and image size, not internal tiling!

perrygeo commented 8 years ago

Moving bug fix discussion to rasterio https://github.com/mapbox/rasterio/issues/532

perrygeo commented 8 years ago

closing this since it's not a nodata bug