Decide on appropriate tile size for training

alronlam commented 1 year ago

Meta RWI Study:

Paper: https://arxiv.org/ftp/arxiv/papers/2104/2104.07761.pdf
Output files: https://data.humdata.org/dataset/relative-wealth-index

I checked the output file: the cells are Bing tiles at zoom level 14, which corresponds to the 2.4 km tiles mentioned in the paper.
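As a sanity check on the zoom 14 = 2.4 km correspondence, here is a minimal sketch of the tile-width arithmetic. It assumes the standard Web Mercator tiling with the WGS84 equatorial radius (6378137 m); `tile_width_m` is a hypothetical helper name, not something from the paper or the repo.

```python
import math

# Assumption: WGS84 equatorial radius, as used by Web Mercator tile systems.
EARTH_RADIUS_M = 6378137.0


def tile_width_m(zoom: int, lat_deg: float = 0.0) -> float:
    """Approximate east-west ground width of one tile, in metres.

    At zoom z the world is split into 2**z tiles across; Web Mercator
    stretches the map by 1/cos(lat), so ground width shrinks by cos(lat).
    """
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return math.cos(math.radians(lat_deg)) * circumference / 2 ** zoom


if __name__ == "__main__":
    for lat in (0, 15, 30):
        print(f"zoom 14 at latitude {lat}: {tile_width_m(14, lat) / 1000:.2f} km")
```

At the equator this gives roughly 2.45 km, so "2.4 km" is a nominal figure. The ground width of a zoom 14 tile gets smaller as you move away from the equator.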

alronlam commented 1 year ago

https://www.nature.com/articles/s41467-020-16185-w#Sec10
Yeh et al. used 6.72 km tiles. The decision was largely driven by the CNN design, which needed 224x224 images: at 30 m Landsat resolution, 224 px x 30 m = 6.72 km per tile.

https://www.sciencedirect.com/science/article/pii/S2666389922002252 The studies that used image data worked at Google Maps API zoom levels ranging from 14 to 17.
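To put these numbers side by side, here is a rough sketch that converts zoom levels to ground resolution and reproduces the Landsat footprint arithmetic. It assumes 256-pixel Web Mercator tiles measured at the equator. The helper names are hypothetical, and the actual image sizes used by the reviewed studies are not stated in this thread.

```python
import math

# Assumptions: WGS84 equatorial radius and 256-pixel Web Mercator tiles.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0
TILE_PIXELS = 256


def metres_per_pixel(zoom: int, lat_deg: float = 0.0) -> float:
    """Approximate ground resolution of one pixel at the given zoom level."""
    scale = math.cos(math.radians(lat_deg))
    return scale * EARTH_CIRCUMFERENCE_M / (TILE_PIXELS * 2 ** zoom)


def footprint_km(pixels: int, metres_per_px: float) -> float:
    """Side length on the ground of a square image with `pixels` per side."""
    return pixels * metres_per_px / 1000.0


# Yeh et al.: 224 x 224 Landsat pixels at 30 m each.
landsat_km = footprint_km(224, 30.0)

if __name__ == "__main__":
    print(f"Landsat 224 px @ 30 m: {landsat_km:.2f} km")
    for zoom in range(14, 18):
        print(f"zoom {zoom}: {metres_per_pixel(zoom):.2f} m/px at the equator")
```

Each zoom step halves the metres per pixel, so the 14 to 17 range covers an 8x difference in ground resolution.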

Original Neal Jean paper: https://cdn.vanderbilt.edu/vu-my/wp-content/uploads/sites/2095/2019/04/14134552/ScienceMachineLearningArticle.pdf No mention of tile size.

alronlam commented 1 year ago

TLDR:

Meta used 2.4 km cells because that matched the resolution of many of their input features. These cells are Bing tiles at zoom level 14.
They handled the centroid jitter by taking a population-weighted average over a 2x2 grid of cells around the centroid for urban clusters and a 4x4 grid for rural clusters.

The prediction algorithms rely on data from several different sources (SI Appendix, Table S2). To facilitate downstream analysis, all data are converted into features that are aggregated at the level of a 2.4-km grid cell. We use 2.4-km cells because that is the highest resolution at which many of our input data are available, and it is best suited to the spatial merge with the survey data (see Supervised Machine Learning below). We were also concerned that providing estimates of wealth at even smaller grid cells might compromise the privacy of individual households. Thus, if the native resolution of a data source is higher than 2.4 km, we aggregate the smaller cells to the 2.4-km level by taking the average of the smaller cells.
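The "average the smaller cells" step could look like the sketch below. It assumes the higher-resolution data are already indexed on the same tile pyramid at a deeper zoom level, which the paper excerpt does not state. `parent_tile` and `aggregate_to_target` are hypothetical names.

```python
from collections import defaultdict

TARGET_ZOOM = 14  # the 2.4 km grid level


def parent_tile(x: int, y: int, zoom: int, target_zoom: int = TARGET_ZOOM):
    """Map a tile at `zoom` to the tile that contains it at `target_zoom`.

    In a quadtree tile pyramid each level doubles the tile count per axis,
    so dropping (zoom - target_zoom) low bits gives the ancestor tile.
    """
    shift = zoom - target_zoom
    if shift < 0:
        raise ValueError("source zoom must be at least the target zoom")
    return x >> shift, y >> shift


def aggregate_to_target(values: dict, zoom: int, target_zoom: int = TARGET_ZOOM) -> dict:
    """Average fine-cell values into their target-zoom parent cells.

    `values` maps (x, y) tile coordinates at `zoom` to a numeric feature.
    Parents are averaged over whichever children are present in `values`.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (x, y), value in values.items():
        parent = parent_tile(x, y, zoom, target_zoom)
        sums[parent] += value
        counts[parent] += 1
    return {parent: sums[parent] / counts[parent] for parent in sums}
```

For example, four zoom 15 cells that share a zoom 14 parent collapse to one value:

```python
fine = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 1): 4.0, (2, 0): 10.0}
coarse = aggregate_to_target(fine, zoom=15)
```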

Spatial Join. We match the ground-truth wealth data to the input data using spatial information present in both datasets. The 2.4-km grid cells are defined by absolute latitude and longitude coordinates specified by the Bing tile system. The DHS data include approximate information about the global positioning system (GPS) coordinate of the centroid of each of the 66,819 villages. However, the exact geocoordinates are masked by the DHS program with up to 2 km of jitter in urban areas and up to 5 km of jitter in rural areas. To ensure that the input data associated with each village cover the village’s true location, we include a 2 × 2 grid of 2.4-km cells around the centroid in urban areas and a 4 × 4 grid in rural areas. For each village, we then take the population-weighted average of the 112-dimensional feature vectors across a 2 × 2 or a 4 × 4 set of cells, using existing estimates of the population of 2.4-km grid cells (42). This leaves us with a training set of 66,819 villages with wealth labels (calculated from the ground-truth data) and 112-dimensional feature vectors (computed from the input data).
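A minimal sketch of this spatial join follows. The latitude/longitude to tile conversion is the standard Web Mercator formula. The rule for choosing which 2x2 or 4x4 block surrounds the centroid is my assumption, since the excerpt does not spell it out. All function names are hypothetical.

```python
import math

ZOOM = 14
MAX_LAT = 85.05112878  # Web Mercator latitude limit


def fractional_tile(lat: float, lon: float, zoom: int = ZOOM):
    """Return fractional tile coordinates (x, y) for a WGS84 point."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    sin_lat = math.sin(math.radians(lat))
    n = 2 ** zoom
    fx = (lon + 180.0) / 360.0 * n
    fy = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n
    return fx, fy


def neighborhood(lat: float, lon: float, is_urban: bool, zoom: int = ZOOM):
    """Cells of the 2x2 (urban) or 4x4 (rural) block nearest the centroid.

    Assumption: the block is the one whose centre is closest to the point.
    Wrap-around at the antimeridian and clamping at the poles are ignored.
    """
    size = 2 if is_urban else 4
    fx, fy = fractional_tile(lat, lon, zoom)
    x0 = math.floor(fx - (size - 1) / 2)
    y0 = math.floor(fy - (size - 1) / 2)
    return [(x0 + dx, y0 + dy) for dy in range(size) for dx in range(size)]


def weighted_features(cells, features: dict, population: dict):
    """Population-weighted mean feature vector over the given cells.

    Cells without features are skipped. Returns None if the total
    population of the remaining cells is zero.
    """
    total = 0.0
    acc = None
    for cell in cells:
        if cell not in features:
            continue
        weight = population.get(cell, 0.0)
        vector = features[cell]
        if acc is None:
            acc = [0.0] * len(vector)
        for i, value in enumerate(vector):
            acc[i] += weight * value
        total += weight
    if acc is None or total == 0:
        return None
    return [value / total for value in acc]
```

Usage: call `neighborhood(lat, lon, is_urban)` for each DHS cluster, then pass the resulting cells to `weighted_features` together with the per-cell feature vectors and population estimates.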

Meta RWI Methodology Details: https://www.pnas.org/doi/10.1073/pnas.2113658119#sec-3

tm-kah-alforja commented 1 year ago

Final recommendation: use 2.4 km tiles (Bing tiles at zoom level 14).

thinkingmachines / unicef-ai4d-poverty-mapping

Decide on appropriate tile size for training #86