Results from extracting 5,000 NAIP chips:

| Method | Time |
| --- | --- |
| `reduceRegion`[^1] + `getInfo` | 51.6s |
| `reduceRegion` + `toDrive` | 20s |
| `computePixels` | 331s |
`computePixels` is much slower. This doesn't count the time spent parsing the 1D arrays back into 3D arrays for the first two methods, but that should be minor. It's pretty clear that using `getInfo` for small samples or `toDrive` for large samples will be the way to go, despite the slight increase in complexity around parsing.
[^1]: The `reduceRegion` benchmarks were based on running `reduceRegion` iteratively on the client side and packing the results into a `FeatureCollection` to retrieve. I experimented with mapping `reduceRegion` over a collection of points on the server side and found it wasn't any faster.
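For reference, the client-side loop looked roughly like this. This is a sketch, not the actual benchmark code: the chip size, band names, and footprint construction are all assumptions.

```python
import ee
import numpy as np

ee.Initialize()

CHIP_SIZE = 256  # assumed chip dimension in pixels; not specified above
BANDS = ["R", "G", "B", "N"]  # NAIP/DOQQ band names

naip = ee.ImageCollection("USDA/NAIP/DOQQ").mosaic()


def extract_chip(point: ee.Geometry) -> np.ndarray:
    """Reduce a square footprint around a point to band-wise flat lists,
    then reshape them client-side into an (H, W, C) array."""
    # Approximate square footprint at NAIP's ~1 m resolution. Real code
    # would need to align the footprint to the pixel grid so the pixel
    # count comes out to exactly CHIP_SIZE ** 2.
    footprint = point.buffer(CHIP_SIZE / 2).bounds()
    flat = naip.select(BANDS).reduceRegion(
        reducer=ee.Reducer.toList(),
        geometry=footprint,
        scale=1,
    ).getInfo()
    # Each band comes back as a flat 1D list; reshape and stack to 3D.
    return np.stack(
        [np.asarray(flat[b]).reshape(CHIP_SIZE, CHIP_SIZE) for b in BANDS],
        axis=-1,
    )
```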
For each LiDAR footprint, we need to extract intersecting NAIP imagery for training. There are a couple of ways to do this:
1. `reduceRegion` with `ee.Reducer.toList` on a NAIP mosaic over projected pixel footprints to create dictionaries of band-wise flat arrays that can be packed into features. These can be exported directly with `getInfo` (limited to 5k features per request) or through a table export to Drive. In either case, they need to be parsed from flat arrays back into 3D arrays before training, whether as image chips or as TFRecords.
2. `ee.data.computePixels` to directly pull chips from Earth Engine. Parallelized, this can be very fast and has no restriction on the number of chips. Chips could be saved to PNGs or presumably packed directly from NumPy arrays into TFRecords. See the sketch below.

Currently we use the first solution. Since this is something we're likely to do a few times, potentially with different chip sizes, different areas, and different images, it will be worth some testing to see which solution is the most flexible and performant.
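A rough sketch of the `computePixels` path, assuming a known chip origin and projection (the chip size, band names, grid parameters, and CRS here are illustrative, not taken from our code):

```python
import ee
import numpy as np

ee.Initialize()

CHIP_SIZE = 256  # assumed chip dimension in pixels
BANDS = ["R", "G", "B", "N"]  # NAIP/DOQQ band names

naip = ee.ImageCollection("USDA/NAIP/DOQQ").mosaic()


def compute_chip(x_min: float, y_max: float, crs: str = "EPSG:26910") -> np.ndarray:
    """Pull one chip directly into NumPy via ee.data.computePixels."""
    structured = ee.data.computePixels({
        "expression": naip,
        "fileFormat": "NUMPY_NDARRAY",
        "bandIds": BANDS,
        "grid": {
            "dimensions": {"width": CHIP_SIZE, "height": CHIP_SIZE},
            # 1 m pixels anchored at the chip's top-left corner.
            "affineTransform": {
                "scaleX": 1, "shearX": 0, "translateX": x_min,
                "shearY": 0, "scaleY": -1, "translateY": y_max,
            },
            "crsCode": crs,
        },
    })
    # The result is a structured array keyed by band; stack to (H, W, C).
    return np.stack([structured[b] for b in BANDS], axis=-1)
```

Since each call is an independent request, running the footprints through a thread pool (e.g. `concurrent.futures.ThreadPoolExecutor`) is the obvious way to parallelize this.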
to directly pull chips from Earth Engine. Parallelized, this can be very fast and doesn't have any restriction on the number of chips. Chips could be saved to PNGs or presumably packed directly from Numpy arrays into TFRecords.Currently we use the first solution. Since this is something we're likely to do a few times, potentially with different chip sizes, different areas, and different images, it will be worth some testing to see which solution is the most flexible and performant.