wri / forma-clj

The Forest Monitoring for Action (FORMA) project provides forest clearing alerts derived from MODIS satellite imagery every 16 days beginning in December 2005. FORMA is a project of World Resources Institute and was developed by the Center for Global Development.
Eclipse Public License 1.0

should agg-chunks check number of pixels in chunk? #139

Open robinkraft opened 12 years ago

robinkraft commented 12 years ago

In static dataset preprocessing, (= ?count chunk-size) appears in the agg-chunks query, and drops any chunks that don't have the correct number of pixels (24k). Are there chunks that could have fewer than 24k pixels? If so, where would they appear? If not, we should be able to drop that line, no?
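For readers unfamiliar with the check, here is a rough Python sketch of what an equality filter like `(= ?count chunk-size)` does to a stream of chunks. It is not the project's actual query: `split_by_count` and the `(chunk_index, values)` pair layout are hypothetical names chosen for illustration, and the only number taken from the thread is the 24k chunk size.

```python
CHUNK_SIZE = 24000  # the "24k" chunk size mentioned above


def split_by_count(chunks, chunk_size=CHUNK_SIZE):
    """Separate full chunks from odd-sized ones.

    `chunks` is an iterable of (chunk_index, values) pairs (a hypothetical
    layout). Chunks whose length equals `chunk_size` go to `kept`; all
    others go to `dropped`, which is what the equality check discards.
    """
    kept, dropped = [], []
    for chunk_index, values in chunks:
        target = kept if len(values) == chunk_size else dropped
        target.append((chunk_index, values))
    return kept, dropped


# Three toy chunks; the middle one is short.
chunks = [(0, [1] * CHUNK_SIZE), (1, [1] * 100), (2, [1] * CHUNK_SIZE)]
kept, dropped = split_by_count(chunks)
kept_ids = [i for i, _ in kept]
dropped_ids = [i for i, _ in dropped]
print(kept_ids)     # indices of full chunks
print(dropped_ids)  # indices the filter would silently discard
```

Returning the dropped chunks instead of discarding them is a deliberate choice in this sketch: it makes it easy to see whether the check ever fires on real data.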

robinkraft commented 12 years ago

ping! @sritchie

eightysteele commented 12 years ago

@robinkraft how did this question come up? Are you seeing unexpected behavior someplace or just curious?

robinkraft commented 12 years ago

Curious. It seems unnecessary, or it should be. If it is necessary then something strange is happening.

On Jul 20, 2012, at 10:47 AM, Aaron Steele reply@reply.github.com wrote:

@robinkraft how did this question come up? Are you seeing unexpected behavior someplace or just curious?


Reply to this email directly or view it on GitHub: https://github.com/reddmetrics/forma-clj/issues/139#issuecomment-7136521

sritchie commented 12 years ago

Nothing strange EVER happens, just shit we don't understand yet. I think this happened when we had a static dataset that didn't exactly line up with MODIS tile boundaries.

robinkraft commented 12 years ago

Semantics aside, is this check still necessary? Do you remember what that static dataset was? Or how we were able to get it to line up? I'm trying to think through how a < 24k element chunk could be created, and whether ignoring an odd sized chunk this way would cause issues later in processing. I suppose we would get holes in the data.

Since the chunks get blossomed out into individual pixels later on, does it matter whether there are exactly 24k pixels in a chunk? There are tiles along the edges of the MODIS globe where there are nodata values. Not -3000, but just pixels that aren't considered to exist. Like tile 33 11.

(see image in next comment)

I'm not sure how GDAL handles those locations in the array it returns. (take 24000 xs) in the lower right corner of tile 33 11 could wrap around to the next line of pixels, resulting in < 24k pixels and screwing up the indexing when that chunk is blossomed out into individual pixels.
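To make the concern concrete, here is a minimal Python sketch of fixed-size chunking with take-like semantics, where the final chunk simply comes out shorter if the pixel count is not a multiple of the chunk size. The 2400 x 2400 tile size is an assumption (it yields exactly 240 chunks of 24k), `chunk_pixels` is a hypothetical helper, and the 500-pixel shortfall is made up. This says nothing about what GDAL actually returns for tile 33 11.

```python
CHUNK_SIZE = 24000


def chunk_pixels(pixels, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, chunk) pairs; the last chunk may be short."""
    for start in range(0, len(pixels), chunk_size):
        yield start // chunk_size, pixels[start:start + chunk_size]


# Assumed tile: 2400 x 2400 pixels, an exact multiple of the chunk size.
full_tile = range(2400 * 2400)
sizes = [len(chunk) for _, chunk in chunk_pixels(full_tile)]

# Hypothetical misaligned raster that is 500 pixels short of a full tile.
short_raster = full_tile[:-500]
short_sizes = [len(chunk) for _, chunk in chunk_pixels(short_raster)]

print(len(sizes), set(sizes))             # every chunk is full
print(len(short_sizes), short_sizes[-1])  # only the last chunk is short
```

Under these assumptions a short chunk can only appear at the end of a raster, so an equality check on the count would drop exactly one chunk per misaligned input.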

robinkraft commented 12 years ago

This image wasn't showing up in my previous post:

[image: MODIS grid]

sritchie commented 12 years ago

Your call!

robinkraft commented 12 years ago

Ok! But can you tell me any more about why that line might have been there?

Do you remember what that static dataset was? Or how we were able to get it to line up?

sritchie commented 12 years ago

Nope, just that if a static dataset didn't totally line up with the edge of a MODIS pixel, you'd get chunks that don't fit in the right bounds.

eightysteele commented 12 years ago

@robinkraft you could probably whip up some super quick tests to see how GDAL handles those locations, and whether wrap-around results in < 24k pixels. Or you could test whether < 24k pixels screws up the indexing when blossoming chunks. Then we'll know whether this is a real issue.
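A quick test along those lines might look like the Python sketch below. It assumes a pixel's position is `chunk_index * chunk_size + offset`, which is an illustrative guess at the indexing and not the project's actual blossoming code; `blossom` and `missing_indices` are hypothetical helpers, and the chunk size is shrunk from 24k to 4 so the result is easy to read.

```python
def blossom(chunks, chunk_size):
    """Expand (chunk_index, values) pairs into {pixel_index: value}.

    Assumes pixel_index = chunk_index * chunk_size + offset.
    """
    pixels = {}
    for chunk_index, values in chunks:
        base = chunk_index * chunk_size
        for offset, value in enumerate(values):
            pixels[base + offset] = value
    return pixels


def missing_indices(pixels, total):
    """Return the pixel positions that have no value after blossoming."""
    return [i for i in range(total) if i not in pixels]


# Toy scale: chunk size 4 instead of 24k, 12 pixels expected in total.
CHUNK = 4
chunks = [(0, [10, 11, 12, 13]), (1, [14, 15]), (2, [18, 19, 20, 21])]

with_short = blossom(chunks, CHUNK)
without_short = blossom([c for c in chunks if len(c[1]) == CHUNK], CHUNK)

holes_kept = missing_indices(with_short, 12)
holes_dropped = missing_indices(without_short, 12)
print(holes_kept)     # holes when the short chunk is kept
print(holes_dropped)  # holes when the short chunk is filtered out
```

In this toy model, keeping the short chunk leaves a hole only at its tail, while dropping it widens the hole to the whole chunk. Pixels in other chunks keep their positions either way, but that conclusion depends entirely on the assumed indexing scheme and on the short chunk's values being contiguous from its start.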