ruithnadsteud / yt_georaster

A package for handling geotiff files and georeferenced datasets within yt.
1 stars 2 forks source link

Highest band resolution aliasing not working #25

Closed deastwoo closed 3 years ago

deastwoo commented 3 years ago

I’m finding that I have to play with the file order in the list to get the resolution I want for each band. When I list the ds._field_filename property I see the band alias goes to the last instance of that band in the list rather than the highest resolution.

To get the 10 m resolution I find I need to sort and reverse the list. Doing the following gives me the desired result:

path_to_sen2_jp2 = Path("path/to/sentinel_2/jp2_images")  # pathlib Path object
files = reversed(sorted(path_to_sen2_jp2.glob("*.jp2")))  # removing reversed gives 60 m priority
files = [str(f) for f in files]  # convert path objects to strings
# 10 m resolution file is the files[15] (band 8: resolution 10 m)
first_file = files.pop(15)
files.insert(0, first_file)
ds = yt.load(*files)
arevill2 commented 3 years ago

Hi @deastwoo , struggling to reproduce the bug you raised with the band aliasing. Just wondering, are you able to meet tomorrow morning to briefly demonstrate it. If not, we could leave till when we meet on Friday. Cheers.

deastwoo commented 3 years ago

@arevill2 sure thing. I am free tomorrow afternoon. I will have a go at recreating the bug with the sample data now though

deastwoo commented 3 years ago

Here is a script which recreates the issue here. Notice I am using the pathlib package (part of python 3+) to glob the files. The two functions I use to do this are get_files and get_files_reversed. When I use get_files I get a sorted list of all the available bands and their complete path. get_files_reversed just gives this list in reverse.

Note this is not related to the first band in the list being used to determine the default resolution of the dataset object. In both functions the first 10 m band listed is moved to the top of the list.

You should be able to see what I mean when you generate the plots. The two plots not only look different, you can see in the log output that when the first plot in created it resamples the 20 m red band to the 10 m, even though the 10 m red band is available in the file list.

from pathlib import Path

import yt
import yt.extensions.geotiff

def get_files(path):
    """Grab all the jp2 files and put them in a list."""
    files = sorted(path.glob("*.jp2"))
    files = [str(f) for f in files]
    # if we have a 10 m resolution jp2, use that as a primary image
    for i, f in enumerate(files):
        if f[-8:] == "_10m.jp2":
            primary_idx = i
            break
    try:
        first_file = files.pop(primary_idx)
        files.insert(0, first_file)
    except NameError:
        print("WARNING. No 10 m resolution jp2 found.")
    return files

def get_files_reversed(path):
    """Grab all the jp2 files and put them in a reverse order list."""
    files = reversed(sorted(path.glob("*.jp2")))
    files = [str(f) for f in files]
    # if we have a 10 m resolution jp2, use that as a primary image
    for i, f in enumerate(files):
        if f[-8:] == "_10m.jp2":
            primary_idx = i
            break
    try:
        first_file = files.pop(primary_idx)
        files.insert(0, first_file)
    except NameError:
        print("WARNING. No 10 m resolution jp2 found.")
    return files

# path to repo containing all of the files found at
# https://girder.hub.yt/#collection/602fc2e068085e0001d2a924/folder/60362f9468085e0001d2ae7c
resources = Path("/Users/dan.eastwood/Downloads/Resources/") 

ds = yt.load(*get_files(resources))

# which file takes priority for each band
print(ds._field_filename)

# Define NDVI function 
def _ndvi(field, data):
    return (data[('bands', 'S2_B8A')] - data[('bands', 'S2_B04')]) / \
           (data[('bands', 'S2_B8A')] + data[('bands', 'S2_B04')])

# Add NDVI to derivable fields list
ds.add_field(("band_ratios", "S2_NDVI"), function=_ndvi,
         units="", display_name='NDVI', take_log=False, sampling_type='local')

bounds = ds.arr([482650, 6148451, 484666, 6150466], 'm')
centre = ds.arr([(bounds[2] + bounds[0])/2, (bounds[1] + bounds[3])/2])
width = bounds[2] - bounds[0]
height = bounds[3] - bounds[1]

p = ds.plot(('band_ratios', 'S2_NDVI'), center=centre, width=width, height=height)
p.set_log(('band_ratios', 'S2_NDVI'), False)
p.set_cmap(('band_ratios', 'S2_NDVI'), 'RdYlGn')
p.set_zlim(('band_ratios', 'S2_NDVI'), -1, 1)
p.save("plot_1.png")

# Note above it _resamples_ the 20 m band to the 10 m, even though the 10 m band
# is available in the file list for the red band 4.

# Now lets see what happens when we reverse the list
ds2 = yt.load(*get_files_reversed(resources))

print(ds2._field_filename)

# Add NDVI to derivable fields list
ds2.add_field(("band_ratios", "S2_NDVI"), function=_ndvi,
         units="", display_name='NDVI', take_log=False, sampling_type='local')

q = ds2.plot(('band_ratios', 'S2_NDVI'), center=centre, width=width, height=height)
q.set_log(('band_ratios', 'S2_NDVI'), False)
q.set_cmap(('band_ratios', 'S2_NDVI'), 'RdYlGn')
q.set_zlim(('band_ratios', 'S2_NDVI'), -1, 1)
q.save("plot_2.png")

plot_1 plot_2

Here is the log output I get when I run this script:

yt : [INFO     ] 2021-05-20 12:14:11,471 Parameters: domain_dimensions         = [10980 10980     1]
yt : [INFO     ] 2021-05-20 12:14:11,471 Parameters: domain_left_edge          = [ 399960. 6090240.       0.] m
yt : [INFO     ] 2021-05-20 12:14:11,471 Parameters: domain_right_edge         = [5.09760e+05 6.20004e+06 1.00000e+00] m
{'S2_B02': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B02_20m.jp2', 'resolution': 20.0}, 'S2_B01': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B01_60m.jp2', 'resolution': 60.0}, 'S2_B03': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B03_20m.jp2', 'resolution': 20.0}, 'S2_B04': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B04_20m.jp2', 'resolution': 20.0}, 'S2_B05': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B05_20m.jp2', 'resolution': 20.0}, 'S2_B06': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B06_20m.jp2', 'resolution': 20.0}, 'S2_B07': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B07_20m.jp2', 'resolution': 20.0}, 'S2_B08': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B08_10m.jp2', 'resolution': 10.0}, 'S2_B11': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B11_20m.jp2', 'resolution': 20.0}, 'S2_B12': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B12_20m.jp2', 'resolution': 20.0}, 'S2_B8A': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B8A_20m.jp2', 'resolution': 20.0}}
yt : [INFO     ] 2021-05-20 12:14:11,949 Resampling S2_B04: 20.0 to 10.0 m.
yt : [INFO     ] 2021-05-20 12:14:12,004 Resampling S2_B8A: 20.0 to 10.0 m.
yt : [INFO     ] 2021-05-20 12:14:12,007 xlim = 482650.000000 484666.000000
yt : [INFO     ] 2021-05-20 12:14:12,007 ylim = 6148450.500000 6150466.500000
yt : [INFO     ] 2021-05-20 12:14:12,007 xlim = 482650.000000 484666.000000
yt : [INFO     ] 2021-05-20 12:14:12,007 ylim = 6148450.500000 6150466.500000
yt : [INFO     ] 2021-05-20 12:14:12,008 Making a fixed resolution buffer of (('band_ratios', 'S2_NDVI')) 800 by 800
yt : [INFO     ] 2021-05-20 12:14:12,502 Saving plot plot_1.png
yt : [INFO     ] 2021-05-20 12:14:12,835 Parameters: domain_dimensions         = [10980 10980     1]
yt : [INFO     ] 2021-05-20 12:14:12,835 Parameters: domain_left_edge          = [ 399960. 6090240.       0.] m
yt : [INFO     ] 2021-05-20 12:14:12,836 Parameters: domain_right_edge         = [5.09760e+05 6.20004e+06 1.00000e+00] m
{'S2_B08': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B08_10m.jp2', 'resolution': 10.0}, 'S2_B8A': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B8A_20m.jp2', 'resolution': 20.0}, 'S2_B12': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B12_20m.jp2', 'resolution': 20.0}, 'S2_B11': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B11_20m.jp2', 'resolution': 20.0}, 'S2_B07': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B07_20m.jp2', 'resolution': 20.0}, 'S2_B06': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B06_20m.jp2', 'resolution': 20.0}, 'S2_B05': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B05_20m.jp2', 'resolution': 20.0}, 'S2_B04': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B04_10m.jp2', 'resolution': 10.0}, 'S2_B03': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B03_10m.jp2', 'resolution': 10.0}, 'S2_B02': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B02_10m.jp2', 'resolution': 10.0}, 'S2_B01': {'filename': '/Users/dan.eastwood/Downloads/Resources/T30UVG_20200601T113331_B01_60m.jp2', 'resolution': 60.0}}
yt : [INFO     ] 2021-05-20 12:14:13,291 Resampling S2_B8A: 20.0 to 10.0 m.
yt : [INFO     ] 2021-05-20 12:14:13,293 xlim = 482650.000000 484666.000000
yt : [INFO     ] 2021-05-20 12:14:13,293 ylim = 6148450.500000 6150466.500000
yt : [INFO     ] 2021-05-20 12:14:13,294 xlim = 482650.000000 484666.000000
yt : [INFO     ] 2021-05-20 12:14:13,294 ylim = 6148450.500000 6150466.500000
yt : [INFO     ] 2021-05-20 12:14:13,294 Making a fixed resolution buffer of (('band_ratios', 'S2_NDVI')) 800 by 800
yt : [INFO     ] 2021-05-20 12:14:13,591 Saving plot plot_2.png
arevill2 commented 3 years ago

Hi @deastwoo. Let me know what you think to the edits I have applied for dealing with the issue of different spatial resolutions of the same band. Basically, I have appended the dataset resolution (retrieved from metadata) to the field name, so the field name is unique to the band and resolution. For example, for Sentinel-2 bands existing in 10, 20 and 60 m:

ds.field_list

[('bands', 'S2_B01_60.0m'), ('bands', 'S2_B02_10.0m'), ('bands', 'S2_B02_20.0m'), ('bands', 'S2_B02_60.0m'), ('bands', 'S2_B03_10.0m'), ('bands', 'S2_B03_20.0m'), ('bands', 'S2_B03_60.0m'), ('bands', 'S2_B04_10.0m'), ('bands', 'S2_B04_20.0m'), ('bands', 'S2_B04_60.0m'), ('bands', 'S2_B05_20.0m'), ('bands', 'S2_B05_60.0m'), ('bands', 'S2_B06_20.0m'), ('bands', 'S2_B06_60.0m'), ('bands', 'S2_B07_20.0m'), ('bands', 'S2_B07_60.0m'), ('bands', 'S2_B08_10.0m'), ('bands', 'S2_B09_60.0m'), ('bands', 'S2_B11_20.0m'), ('bands', 'S2_B11_60.0m'), ('bands', 'S2_B12_20.0m'), ('bands', 'S2_B12_60.0m'), ('bands', 'S2_B8A_20.0m'), ('bands', 'S2_B8A_60.0m')]

ds.plot(('bands', 'S2_B02_20.0m')) yt : [INFO ] 2021-05-24 10:52:24,053 Resampling S2_B02_20.0m: 20.0 to 60.0 m. yt : [INFO ] 2021-05-24 10:52:24,238 xlim = 399960.000000 509760.000000 yt : [INFO ] 2021-05-24 10:52:24,239 ylim = 6090240.000000 6200040.000000 yt : [INFO ] 2021-05-24 10:52:24,240 xlim = 399960.000000 509760.000000 yt : [INFO ] 2021-05-24 10:52:24,241 ylim = 6090240.000000 6200040.000000 yt : [INFO ] 2021-05-24 10:52:24,244 Making a fixed resolution buffer of (('bands', 'S2_B02_20.0m')) 800 by 800

ds.plot(('bands', 'S2_B02_10.0m')) yt : [INFO ] 2021-05-24 10:52:52,001 Resampling S2_B02_10.0m: 10.0 to 60.0 m. yt : [INFO ] 2021-05-24 10:52:52,311 xlim = 399960.000000 509760.000000 yt : [INFO ] 2021-05-24 10:52:52,313 ylim = 6090240.000000 6200040.000000 yt : [INFO ] 2021-05-24 10:52:52,314 xlim = 399960.000000 509760.000000 yt : [INFO ] 2021-05-24 10:52:52,315 ylim = 6090240.000000 6200040.000000 yt : [INFO ] 2021-05-24 10:52:52,318 Making a fixed resolution buffer of (('bands', 'S2_B02_10.0m')) 800 by 800

deastwoo commented 3 years ago

That sounds great. I can pull the changes and have a play with it. How about if you try to query ds.plot(('bands', 'S2_B02'))?

arevill2 commented 3 years ago

That sounds great. I can pull the changes and have a play with it. How about if you try to query ds.plot(('bands', 'S2_B02'))?

At present ds.plot(('bands', 'S2_B02')) will not work - the resolution has to be specified (e.g., ds.plot(('bands', 'S2_B02_20.0m')). Is that going to be a pain? I could not remember if we were still going to use the highest resolution band when we just specific the band without (i.e., ds.plot(('bands', 'S2_B02')). Will double check with Britton. If it makes things easier for you guys we could easily add this though. cheers

deastwoo commented 3 years ago

Sorry @arevill2 I think the original plan of having an alias to the best resolution available would make the most sense. If it is a quick fix then this would be great but if it is more complex don't worry - it's not a big thing.

arevill2 commented 3 years ago

Should be resolved with the latest push.

brittonsmith commented 3 years ago

Resolved in PR #24.