thinkingmachines / ph-poverty-mapping

Mapping Philippine Poverty using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information
https://stories.thinkingmachin.es/philippines-most-vulnerable-communities/
MIT License

Transformation of VIIRS DNB data to .csv file ('nightlights.csv') #27

Closed johkemper closed 2 years ago

johkemper commented 4 years ago

Dear ph-poverty-mappers! How did you manage to transform the VIIRS DNB data into the .csv file 'nightlights.csv'? I would like to apply your approach to other countries, so I need to generate a 'nightlights.csv' equivalent for each of them. I am still downloading the VIIRS data, but it seems to be image data only. How did you extract the longitude, latitude and nighttime light intensity from this data? Thanks in advance for your answer!

issa-tingzon commented 4 years ago

@johkemper We did most of our GIS pre-processing on QGIS. Looping in @ardieorden to walk you through generating nightlights.csv, which contains the center lat-lon values and corresponding nighttime light intensity values for each pixel in the VIIRS dataset.

ardieorden commented 4 years ago

Hi @johkemper , thanks for your interest in our work!

You are correct in saying that the VIIRS data is in an image format. In order to get the longitude, latitude and nighttime light intensity in a CSV format, you can follow these instructions:

  1. Install QGIS 3 (check this link for instructions and download links).
  2. Install the Point sampling tool plugin (check this link for instructions).
  3. Open the VIIRS data in QGIS. You'll notice that it covers a pretty large area. Since you'll only need the data for certain countries, I recommend that you clip the image (check this link for instructions) so that it only covers the country/countries that you're interested in. You can download a GeoPackage (vector) of a country's administrative boundaries from GADM.
  4. Open the Processing Toolbox, search for the "Generate points (pixel centroids) inside polygons" command, and double-click it. This command gets the pixel centroids of an image (raster) inside a polygon (vector). Use your image as the "raster layer" and the country/provincial/municipal administrative boundary (in my experience, provincial works best) as the "vector layer". Click the radio button for "Points inside polygons" and save the output to a file, because your computer will slow down significantly if you only create a temporary layer. Once all that is done, click "Run". You can close the window once the command has finished running.
  5. The resulting output from the previous step should be a vector with a set of points. You can get the longitude and the latitude of those using the Field calculator (check this link for instructions).
  6. Go to Plugins > Analyses > Point Sampling Tool. Use the point layer from the previous step as "Layer containing sampling points", then select its latitude and longitude fields and Band 1 of the raster as "Layers with fields/bands to get values from". Go to the "Fields" tab and rename Band 1 to ntl. Go back to the "General" tab and save the "Output point vector layer" as a CSV.
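If you would rather script steps 4 to 6 than click through QGIS, the core idea can be sketched in Python: produce one row per pixel centroid together with its light value. This is a minimal sketch, not the workflow used for this repository. It assumes the clipped VIIRS band is already loaded as a 2-D NumPy array (for example with rasterio) and that you know the raster's top-left corner and pixel size in degrees. With rasterio, these values would be `transform.c`, `transform.f`, `transform.a` and `transform.e` of the dataset's affine transform. The function name `raster_to_points` and the column names `lon`, `lat` and `ntl` are hypothetical and may not match the real nightlights.csv.

```python
import numpy as np
import pandas as pd


def raster_to_points(band, x_origin, y_origin, pixel_width, pixel_height, mask=None):
    """One row per pixel: centroid longitude, centroid latitude and pixel value.

    band         -- 2-D array of nighttime light values (row 0 is the northern edge)
    x_origin     -- longitude of the raster's top-left corner
    y_origin     -- latitude of the raster's top-left corner
    pixel_width  -- pixel size along x in degrees (positive)
    pixel_height -- pixel size along y in degrees (negative for north-up rasters)
    mask         -- optional boolean array, True for pixels to keep (e.g. inside a boundary)
    """
    band = np.asarray(band)
    rows, cols = np.indices(band.shape)
    # Pixel centroid = top-left corner + (index + 0.5) * pixel size
    lon = x_origin + (cols + 0.5) * pixel_width
    lat = y_origin + (rows + 0.5) * pixel_height
    keep = np.ones(band.shape, dtype=bool) if mask is None else np.asarray(mask, dtype=bool)
    # Boolean indexing flattens the arrays in row-major order
    return pd.DataFrame({"lon": lon[keep], "lat": lat[keep], "ntl": band[keep]})


band = np.array([[0.5, 1.0], [2.0, 3.5]])
points = raster_to_points(band, x_origin=120.0, y_origin=15.0, pixel_width=0.1, pixel_height=-0.1)
print(points)
```

Calling `points.to_csv('nightlights.csv', index=False)` would then write a CSV similar in spirit to nightlights.csv. The optional `mask` plays the role of the "Points inside polygons" option. One way to build it from a GADM boundary is `rasterio.features.geometry_mask(..., invert=True)`, which marks pixels inside the shapes as True.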

That should give something similar to nightlights.csv.

BharathRajM commented 4 years ago

How do you get the respective DHSCluster for a given latitude and longitude?

ardieorden commented 4 years ago

Hi @BharathRajM , you need to perform a buffer using the DHS Cluster points. The size of the buffer will be determined by the cluster type. Based on the methodology of the Demographic and Health Survey, clusters which have a cluster type U (for urban) have a radius of 2km and those with a cluster type R (for rural) have a radius of 5km.

After buffering, you then need to perform a spatial join using (1) the layer with the longitude and latitude and (2) the buffered layer with the DHS clusters.
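The buffer-then-join idea can be illustrated without a GIS. The sketch below swaps QGIS's planar buffer for a great-circle (haversine) distance test, which amounts to asking whether a pixel centroid falls inside a circular buffer around each cluster. It keeps every (pixel, cluster) pair within range, which makes it a one-to-many join: a pixel inside two overlapping buffers appears twice. The column names (`cluster_id`, `cluster_type`, `lon`, `lat`) and function names are hypothetical. Because the pairwise distance matrix grows with pixels × clusters, this approach only suits small inputs. For a whole country, a spatially indexed join such as geopandas' `sjoin` is the practical choice.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000
# Buffer radius by DHS cluster type, as described above: urban 2 km, rural 5 km
BUFFER_M = {"U": 2_000, "R": 5_000}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees (broadcasts)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def join_pixels_to_clusters(pixels, clusters):
    """Return one row per (pixel, cluster) pair where the pixel lies inside the cluster's buffer."""
    # Pairwise distances: rows are pixels, columns are clusters
    dist = haversine_m(
        pixels["lat"].to_numpy()[:, None], pixels["lon"].to_numpy()[:, None],
        clusters["lat"].to_numpy()[None, :], clusters["lon"].to_numpy()[None, :],
    )
    radius = clusters["cluster_type"].map(BUFFER_M).to_numpy()[None, :]
    pixel_idx, cluster_idx = np.nonzero(dist <= radius)
    joined = pixels.iloc[pixel_idx].reset_index(drop=True)
    return joined.assign(cluster_id=clusters["cluster_id"].to_numpy()[cluster_idx])


pixels = pd.DataFrame({
    "lon": [121.0, 121.0, 121.0, 123.0],
    # about 0 km, 1.1 km and 3.3 km north of U1; about 3.3 km north of R1
    "lat": [14.60, 14.61, 14.63, 10.03],
    "ntl": [5.0, 4.0, 1.0, 0.2],
})
clusters = pd.DataFrame({
    "cluster_id": ["U1", "R1"],
    "cluster_type": ["U", "R"],
    "lon": [121.0, 123.0],
    "lat": [14.60, 10.00],
})
print(join_pixels_to_clusters(pixels, clusters))
```

In this toy example, the pixel 3.3 km from the urban cluster is dropped, while the pixel 3.3 km from the rural cluster is kept, because the rural buffer is larger.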

sufferingindeed commented 4 years ago

How did you extract the OSM data for buildings, POIs and roads? Could you please let me know? Thank you.

ardieorden commented 4 years ago

Hi @sufferingindeed , we extracted the OSM data by just downloading it from Geofabrik. For the Philippines, we downloaded it here: http://download.geofabrik.de/asia/philippines.html

sufferingindeed commented 4 years ago

Yes, but which file format should be downloaded, and how do I extract the data (with what procedure or software)? I am new to OSM data. Thank you again. :D

ardieorden commented 4 years ago

@sufferingindeed Sure, here are the instructions:

  1. Go to http://download.geofabrik.de/ and find the page for the country you want to download OSM data from. In this case, we want to download for the Philippines (http://download.geofabrik.de/asia/philippines.html).
  2. Download the .shp.zip file (http://download.geofabrik.de/asia/philippines-latest-free.shp.zip) and then extract all files once you're done.
  3. The files are in Shapefile format, which requires all files that share a filename (with different extensions) to be in the same directory. For example, gis_osm_pofw_free_1.shp, gis_osm_pofw_free_1.shx, gis_osm_pofw_free_1.dbf, and so on all have to be in the same directory.
  4. The buildings data is named gis_osm_buildings_a_free_1.*. The roads data is named gis_osm_roads_free_1.*. The POIs data is named gis_osm_pois_free_1.shp.
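A shapefile is really a set of files, so a quick check before loading one can save some confusion. The helper below is a sketch, and its name is hypothetical. It treats `.shp`, `.shx` and `.dbf` as required, which is what the Shapefile format mandates. It treats `.prj` as optional but worth having, because that file stores the coordinate reference system.

```python
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")   # mandatory parts of a shapefile
RECOMMENDED = (".prj",)               # optional, but stores the coordinate reference system


def missing_shapefile_parts(shp_path):
    """Return (missing_required, missing_recommended) extensions for a .shp path."""
    shp_path = Path(shp_path)

    def missing(exts):
        # Swap the extension and check whether the sibling file exists
        return [ext for ext in exts if not shp_path.with_suffix(ext).exists()]

    return missing(REQUIRED), missing(RECOMMENDED)


required, recommended = missing_shapefile_parts("gis_osm_roads_free_1.shp")
if required:
    print("Missing required parts:", required)
```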

sufferingindeed commented 4 years ago

Yes, I see, but I'm sorry, I'm still confused about how to get CSV files like yours from those specific shapefiles (buildings, roads, POIs, etc.). Could you please explain that a bit more? Thank you very much again.

ardieorden commented 4 years ago

@sufferingindeed No worries! I'm also having trouble recalling the exact details for this research so my explanation might still be confusing or vague.

In order to get some of the columns in the osm_buildings.csv, osm_pois.csv, and osm_roads.csv files, we used the "Distance to nearest hub (points)" algorithm on QGIS in order to find the distance of the clusters to the OSM buildings, POIs, and roads.

The other columns were obtained by using the "Buffer" algorithm on QGIS with the clusters as input and then using "Join attributes by location (summary)" with the buffered clusters as one of the inputs.
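Both QGIS operations reduce to simple per-cluster computations. "Distance to nearest hub (points)" is the minimum distance from each cluster to any feature. A buffer followed by "Join attributes by location (summary)" can produce a count (or sum) of the features that fall inside each cluster's buffer. The sketch below assumes the OSM features have been reduced to points (for example POIs) with `lon`/`lat` columns. The column and function names, the use of haversine distances and the single fixed radius are illustrative assumptions, not the settings actually used for osm_pois.csv and the other files. Roads are lines, so they would need a line-aware distance tool instead.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees (broadcasts)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def osm_features_per_cluster(clusters, features, radius_m):
    """Nearest-feature distance and feature count within radius_m for each cluster."""
    # Rows are clusters, columns are features (requires at least one feature)
    dist = haversine_m(
        clusters["lat"].to_numpy()[:, None], clusters["lon"].to_numpy()[:, None],
        features["lat"].to_numpy()[None, :], features["lon"].to_numpy()[None, :],
    )
    return clusters.assign(
        nearest_m=dist.min(axis=1),                      # analogous to "Distance to nearest hub"
        count_in_buffer=(dist <= radius_m).sum(axis=1),  # analogous to buffer + summary join (count)
    )


clusters = pd.DataFrame({"cluster_id": ["A", "B"], "lon": [121.0, 125.0], "lat": [14.60, 7.00]})
pois = pd.DataFrame({"lon": [121.0, 121.0, 121.0], "lat": [14.605, 14.62, 14.70]})
print(osm_features_per_cluster(clusters, pois, radius_m=2_000))
```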

GIS243 commented 4 years ago

Hi @ardieorden,

How are you getting pop_sum in the nightlights file? Also, when performing a spatial join using (1) the layer with the longitude and latitude and (2) the buffered layer with the DHS clusters, are you using a one-to-one or a one-to-many spatial join?

Thanks

ardieorden commented 4 years ago

Hi @GIS243, I'll reply in the issue that you created (#32) so that it's easier to track.

GIS243 commented 4 years ago

Thanks. I have followed the steps, but I am not able to get pop_sum in the nightlights file. Can you please let me know how to get it?

ardieorden commented 4 years ago

Hi @GIS243 , apologies for the late reply! Sure, I'll reply in #32

louisaboy commented 2 years ago

Hi @ardieorden, apologies for reviving this thread, but I would like to ask: did you use the Python script in the document you mentioned (the methodology of the Demographic and Health Survey, particularly the GPS Coordinate Displacement Process), or did you use another method to choose the coordinates for each cluster? I'm having difficulty following this step.

Thank you so much; any help would be greatly appreciated.

ardieorden commented 2 years ago

@louisaboy Unfortunately, the GIS processing steps (e.g. buffer, spatial join) were not done in Python; they were done in QGIS. Here's a general outline of what the steps would look like in Python.

  1. Download the "Geographic Datasets" from this website: https://dhsprogram.com/data/dataset/Philippines_Standard-DHS_2017.cfm?flag=1. Make sure to unzip.

  2. Load the shapefile using geopandas (https://geopandas.org/en/stable/index.html)

    import geopandas as gpd
    clusters = gpd.read_file('dummy_filename.shp')
  3. Perform the buffer GIS processing step (https://geopandas.org/en/stable/docs/user_guide/geometric_manipulations.html#GeoSeries.buffer)

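    clusters = clusters.to_crs(clusters.estimate_utm_crs())  # metric (UTM) CRS, so buffer distances are in meters
    clusters['buffer_geometry'] = clusters['geometry'].buffer(clusters['URBAN_RURA'].map({'U': 2000, 'R': 5000}))  # 2 km urban, 5 km rural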
  4. Load the nighttime lights using pandas and convert the dataframe to a geodataframe (https://geopandas.org/en/stable/gallery/create_geopandas_from_pandas.html)

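    import pandas as pd
    nightlights = pd.read_csv('nightlights.csv')
    # points_from_xy expects x (longitude) first, then y (latitude); match the clusters' CRS
    nightlights = gpd.GeoDataFrame(nightlights, geometry=gpd.points_from_xy(nightlights.lon, nightlights.lat), crs='EPSG:4326').to_crs(clusters.crs)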
  5. Perform the spatial join GIS processing step (https://geopandas.org/en/stable/gallery/spatial_joins.html)

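    buffers = clusters.set_geometry('buffer_geometry').drop(columns='geometry')  # use the buffers as the active geometry
    nightlights_in_clusters = gpd.sjoin(nightlights, buffers, how='left', predicate='within')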
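A spatial join leaves one row per matched (pixel, cluster) pair, so a natural follow-up is to aggregate the pixel values per cluster. The sketch below shows that step with plain pandas on a tiny made-up table. The `DHSCLUST` and `ntl` column names and the choice of statistics are assumptions for illustration, not necessarily the features used in this repository.

```python
import pandas as pd

# Hypothetical output of the spatial join: one row per pixel, with the cluster it fell into
nightlights_in_clusters = pd.DataFrame({
    "DHSCLUST": [1, 1, 1, 2, 2, None],   # None: pixel outside every buffer (left join)
    "ntl": [0.5, 1.5, 4.0, 0.0, 0.2, 9.9],
})

# Drop pixels that matched no cluster, then summarize the light values per cluster
cluster_features = (
    nightlights_in_clusters.dropna(subset=["DHSCLUST"])
    .groupby("DHSCLUST")["ntl"]
    .agg(["mean", "median", "max", "count"])
    .reset_index()
)
print(cluster_features)
```

Dropping the unmatched rows first matters with `how='left'`. Otherwise, pixels outside every buffer would silently form no group while still inflating the table.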

jtmiclat commented 2 years ago

Hi! Closing this issue as I'll be archiving this repository. To see our new poverty mapping project, check out https://github.com/thinkingmachines/unicef-ai4d-poverty-mapping