kartoza / GEEST

Gender Enabling Environments Spatial Tool (GEEST)

https://worldbank.github.io/GEEST/

MIT License

1 stars 2 forks source link

Factor: Safety- Street lights/Nighttime lights #53

Open amyburness opened 5 months ago

amyburness commented 5 months ago

Mapillary

endpoint: object--street-light

This indicator should be calculated by creating 20-meter buffers around streetlights. Raster cells in which 80-100% of their area intersects with the buffers should receive a score of 5. Rasters with 60-79% intersection should be scored as 4, 40-59% as 3, 20-39% as 2, and 1-19% as 1. Areas without any overlap with streetlight buffers should be scored as 0. NOTE: Use nighttime lights only if street lights are absent.

Nighttime Lights

Nighttime lights observation data from a Low light imaging sensor part of the Visible and Infrared Imaging Suite (VIIRS) Day Night Band (DNB). Unit - (average, average-masked, max, median, median-masked, min) nW/cm2/sr, CRS - EPSG:4326 (Geographic Latitude/Longitude), Resolution - 15 arc second (~500m at the Equator)

carolinamayh commented 4 months ago

If data for streetlights or nighttime lights is unavailable, or if the user has disaggregated statistics, they should have the option to provide data on women's perceived safety at the municipal, district, state, or any other required level. The tool would then standardize these scores, percentages, or statistics on a scale from 0 to 5, where 5 indicates the lowest level of violence or the highest level of perceived safety.

javaftw commented 4 months ago

SAFStreetLight Function Overview:

Processes streetlight vector data to create a raster of safety scores based on coverage. For detailed explanation, see docs/dev_notes/README.md in the issue-53-factor-safety branch.

Constants:

Light area radius: 20 meters Score bins: [0, 1, 20, 40, 60, 80, 100] Scores: [0, 0, 1, 2, 3, 4, 5]

Function:

Setup and input processing
Buffer streetlight points (20m)
Rasterize buffered vector
Calculate coverage using convolution
Assign safety scores
Output final raster

Notes:

Uses QGIS for buffering, GDAL for rasterization, and numpy for calculations
Called by SAFnightTimeLights when input is vector data

Substitute Streetlights Data Generation

Due to the unavailability of actual streetlight location data in the sample dataset and limited time for sourcing, a method was devised to generate substitute data based on available road network information. Data Source and Preprocessing The primary data source was the roads shapefile for the area of interest. A subset of roads was extracted based on the following criteria:

'Residential' roads from the 'highways' category 'Tertiary', 'Secondary', and 'Primary' roads, which were merged into a single category of major roads

This extraction process provided a foundation for estimating streetlight distributions.

Residential Streetlight Generation For residential roads, points were scattered along the geometry at 50-meter intervals. This interval was chosen to approximate the typical spacing of streetlights in residential areas.

Major Road Streetlight Generation The process for major roads involved additional steps:

The merged major roads were filtered based on segment lengths to exclude longer cross-country roads that typically have less frequent lighting. Points were then scattered along the remaining geometry at 150-meter intervals, reflecting the generally wider spacing of streetlights on major roads compared to residential areas.

Resulting Distribution This methodology produces a calculated estimation of light source distribution for development purposes. The resulting point dataset approximates streetlight locations based on road type and typical lighting patterns.

The final distribution provides a reasonable proxy for actual streetlight locations, allowing for the development and testing of the SAFStreetLight function in the absence of real-world data.

javaftw commented 4 months ago

"Perceived Safety Data" handling is still outstanding

osundwajeff commented 4 months ago

@javaftw

javaftw commented 4 months ago

@osundwajeff

javaftw commented 3 months ago

If data for streetlights or nighttime lights is unavailable, or if the user has disaggregated statistics, they should have the option to provide data on women's perceived safety at the municipal, district, state, or any other required level. The tool would then standardize these scores, percentages, or statistics on a scale from 0 to 5, where 5 indicates the lowest level of violence or the highest level of perceived safety.

Hi @carolinamayh

Could you please elaborate on and/or provide examples of the file format of the perceived safety and disaggregated statistics data? Also, in this regard, could you please verify that the adjusted description text, as per the image, is correct?

carolinamayh commented 3 months ago

@osundwajeff @javaftw As mentioned yesterday, if streetlights data or nightlights data is not available, the user will have the option to input a safety score for the country or provide disaggregated scores by administrative units. The indicator used will be the number of homicides per 100,000 people, standardized based on the following scale:

Score 5 (Safest): 0 to 1 homicides per 100,000 people Score 4: 1.1 to 3 homicides per 100,000 people Score 3: 3.1 to 6 homicides per 100,000 people Score 2: 6.1 to 10 homicides per 100,000 people Score 1: 10.1 to 15 homicides per 100,000 people Score 0 (Least Safe): More than 15 homicides per 100,000 people

javaftw commented 3 months ago

or provide disaggregated scores by administrative units

Hi @carolinamayh (Hennie here),

While the information about "the user will have the option to input a safety score for the country" is useful (there are UI elements in the plugin which performs this action), the part about "or provide disaggregated scores by administrative units" is unclear, despite the accompanying breakdown of the scoring mechanic. Is is a shapefile? A csv? Something else? How is the user to interact with it and the fields it contains? Could you please shed some more light on this, thanks!

javaftw commented 3 months ago

@osundwajeff https://github.com/kartoza/GEEST/tree/issue-53-factor-safety

dragosgontariu commented 3 months ago

there is no option for street lights as per specifications
there is no option for disaggregated data per administrative unit as per specifications and email communications
nighttime light data throws the following error:

2024-08-07T12:32:58 INFO Results: {'OUTPUT': 'temp/countryUTMLayerBuf.shp'} 2024-08-07T12:32:58 INFO gdalwarp -overwrite -t_srs EPSG:32620 -tr 100.0 100.0 -r near -te 705589.018038702 1514124.5410898936 731809.9074413507 1562729.1977703366 -te_srs EPSG:32620 -of GTiff D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif temp/tempResample.tif 2024-08-07T12:32:58 INFO GDAL command: 2024-08-07T12:32:58 INFO gdalwarp -overwrite -t_srs EPSG:32620 -tr 100.0 100.0 -r near -te 705589.018038702 1514124.5410898936 731809.9074413507 1562729.1977703366 -te_srs EPSG:32620 -of GTiff D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif temp/tempResample.tif 2024-08-07T12:32:58 INFO GDAL command output: 2024-08-07T12:32:58 INFO Process completed successfully 2024-08-07T12:32:58 INFO Results: {'OUTPUT': 'temp/tempResample.tif'} 2024-08-07T12:32:58 INFO gdal_calc.bat --overwrite --calc "A1000" --format GTiff --type Int32 -A temp/tempResample.tif --A_band 1 --outfile temp/tempCalc.tif 2024-08-07T12:32:58 INFO GDAL command: 2024-08-07T12:32:58 INFO gdal_calc.bat --overwrite --calc "A1000" --format GTiff --type Int32 -A temp/tempResample.tif --A_band 1 --outfile temp/tempCalc.tif 2024-08-07T12:32:58 INFO GDAL command output: 2024-08-07T12:32:59 CRITICAL Process returned error code 1 2024-08-07T12:32:59 INFO Results: {'OUTPUT': 'temp/tempCalc.tif'} 2024-08-07T12:32:59 INFO Creating output file that is 262P x 486L. Processing D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif [1/1] : 0Using internal nodata values (e.g. 3.4e+38) for image D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif. Copying nodata values from source D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif to destination temp/tempResample.tif. ...10...20...30...40...50...60...70...80...90...100 - done. 2024-08-07T12:32:59 CRITICAL RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem . Traceback (most recent call last): File "C:\PROGRA~1\QGIS 3.32.0\apps\Python39\Scripts\gdal_calc.py", line 8, in 2024-08-07T12:32:59 CRITICAL from osgeo_utils.gdal_calc import * # noqa File "C:\PROGRA~1\QGIS 3.32.0\apps\Python39\lib\site-packages\osgeo_utils\gdal_calc.py", line 46, in 2024-08-07T12:32:59 CRITICAL from osgeo import gdal, gdal_array File "C:\PROGRA~1\QGIS 3.32.0\apps\Python39\lib\site-packages\osgeo\gdal_array.py", line 13, in 2024-08-07T12:32:59 CRITICAL from . import _gdal_array ImportError: numpy.core.multiarray failed to import

Traceback (most recent call last): File "rasterio\_base.pyx", line 69, in rasterio._base.get_dataset_driver File "rasterio\_err.pyx", line 221, in rasterio._err.exc_wrap_pointer rasterio._err.CPLE_OpenFailedError: temp/tempCalc.tif: No such file or directory
```
     During handling of the above exception, another exception occurred:

     Traceback (most recent call last):
      File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 2952, in SAFnightTimeLights
      with rasterio.open(tempCalc, "r+") as src:
      File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\env.py", line 451, in wrapper
      return f(*args, **kwds)
      File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\__init__.py", line 334, in open
      dataset = get_writer_for_path(path, driver=driver)(
      File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\io.py", line 299, in get_writer_for_path
      driver = get_dataset_driver(path)
      File "rasterio\\_base.pyx", line 74, in rasterio._base.get_dataset_driver
     TypeError: temp/tempCalc.tif: No such file or directory
```

@osundwajeff @mvmaltitz @javaftw

javaftw commented 3 months ago

Input Type: Raster

Input Type: Shapefile (Points):

Input Type: Shapefile (Polygons)

When the input is a polygon layer, the user may choose the field of interest within that layer to process. If the field contains text (and not numeric) values, the name of the field is appended with (text)

If the user chooses a text element in the dropdown, values may be assigned to the unique text fields

The result of the above operation, using user-assigned text field values:

The results of using a numeric field:

Input Type: User Value

If no input file is specified, the user value is evaluated:

javaftw commented 3 months ago

Improvement of point rendering method

ClaraIV commented 3 months ago

@osundwajeff @mvmaltitz we will provide detailed guidance on the processing of the NTL data for this factor today;

bennyistanto commented 3 months ago

Enhanced Night Time Light (NTL) Classification for GEEST

Approach and Rationale

We propose refining the NTL classification in GEEST to better reflect local conditions, particularly for small island countries. Instead of using fixed thresholds or coverage percentages, we'll employ local statistics to create a more contextually relevant classification. This approach is supported by Elvidge et al. (2013), who emphasize the importance of considering local context in night-time light analysis.

Standard Classification Procedure:

Clip the global NTL raster to the country boundary.
Calculate statistics for the clipped raster: min, max, mean, median, and 75th percentile.
- The use of percentiles in NTL analysis is supported by Levin & Zhang (2017).
Apply the classification scheme based on local statistics.

Standard Classification Scheme:

GEEST Class	Description	NTL Value Range
0	No Access	0 - 0.05
1	Very Low	> 0.05 - 0.25 * Median
2	Low	> 0.25 Median - 0.5 Median
3	Moderate	> 0.5 * Median - Median
4	High	> Median - 75th percentile
5	Very High	> 75th percentile

This classification approach aligns with the methods used by Chen & Nordhaus (2011) in using luminosity data as a proxy for socio-economic factors.

QGIS Implementation:

For implementation in QGIS, use the Raster Calculator with the following expression:

(NTL <= 0.05) * 0 + 
(NTL > 0.05 AND NTL <= [0.25 * median]) * 1 + 
(NTL > [0.25 * median] AND NTL <= [0.5 * median]) * 2 + 
(NTL > [0.5 * median] AND NTL <= [median]) * 3 + 
(NTL > [median] AND NTL <= [75th_percentile]) * 4 + 
(NTL > [75th_percentile]) * 5

Replace [median] and [75th_percentile] with the actual calculated values.

Example Python Script for Calculating Statistics:

Here's a simple Python script to calculate the required statistics for a GeoTIFF raster:

import rasterio
import numpy as np

def calculate_raster_stats(raster_path):
    with rasterio.open(raster_path) as src:
        data = src.read(1)
        data = data[data != src.nodata]  # Remove no data values

        min_val = np.min(data)
        max_val = np.max(data)
        mean_val = np.mean(data)
        median_val = np.median(data)
        percentile_75 = np.percentile(data, 75)

        return {
            'min': min_val,
            'max': max_val,
            'mean': mean_val,
            'median': median_val,
            '75th_percentile': percentile_75
        }

# Usage
raster_path = 'path/to/your/iso3_ntl.tif'
stats = calculate_raster_stats(raster_path)
print(stats)

This script can be integrated into the GEEST QGIS plugin to automatically calculate the statistics needed for the classification.

Benefits:

Adapts to local lighting conditions
Provides meaningful classification for areas with varying NTL intensities
Maintains consistency with the original GEEST approach while improving granularity

Additional Note: Handling Areas with Low NTL Values:

If the NTL values within the Area of Interest (AOI) are predominantly or entirely below 0.05, which is our baseline for electricity access, we need to adjust our approach. Here's how to proceed:

Check the data range: After clipping the NTL raster to our AOI, examine the statistics, particularly the maximum value.
If max value < 0.05:
- This indicates that the entire area falls below our standard threshold for electricity access.
- In this case, we need to create a relative scale based on the available data. This approach is informed by Small et al. (2005), who discuss methods for analyzing areas with low light emissions.
Adjusted classification for low-value areas: Use this modified classification scheme based on the local maximum value:

Low NTL Classification Scheme:

GEEST Class	Description	NTL Value Range
0	No Light	0
1	Very Low	> 0 - 0.2 * max_value
2	Low	> 0.2 max_value - 0.4 max_value
3	Moderate	> 0.4 max_value - 0.6 max_value
4	High	> 0.6 max_value - 0.8 max_value
5	Highest	> 0.8 * max_value - max_value

This adaptive thresholding approach is conceptually similar to methods described by Otsu (1979) for image classification.

QGIS Implementation:

For implementation in QGIS, use the Raster Calculator with the following expression:

with_variable('max_val', maximum("NTL@1"),
    (NTL = 0) * 0 + 
    (NTL > 0 AND NTL <= @max_val * 0.2) * 1 + 
    (NTL > @max_val * 0.2 AND NTL <= @max_val * 0.4) * 2 + 
    (NTL > @max_val * 0.4 AND NTL <= @max_val * 0.6) * 3 + 
    (NTL > @max_val * 0.6 AND NTL <= @max_val * 0.8) * 4 + 
    (NTL > @max_val * 0.8) * 5
)

Note: Replace "NTL@1" with our actual raster layer name.

Additional Considerations:

Temporal Variability:

It's important to note that NTL data can vary seasonally and annually. As Xie et al. (2019) demonstrate, temporal variations in artificial nighttime lights can provide insights into urbanization patterns and socio-economic changes. For GEEST, consider using multi-year averaged NTL data to mitigate short-term fluctuations and capture more stable patterns of lighting infrastructure.

Limitations and Caveats:

Users of this enhanced NTL classification in GEEST should be aware of certain limitations:

Overglow Effect: NTL data can suffer from the "overglow" effect, where light appears to extend beyond its actual source (Small et al., 2005). This may lead to overestimation of lit areas in some regions.
Saturation in Urban Centers: In highly urbanized areas, NTL sensors may saturate, leading to a loss of differentiation in the brightest areas (Elvidge et al., 2013).
Rural Underrepresentation: In sparsely populated or rural areas, the resolution of NTL data may not capture small-scale lighting infrastructure adequately.

References:

Elvidge, C. D., Baugh, K. E., Zhizhin, M., & Hsu, F.-C. (2013). Why VIIRS data are superior to DMSP for mapping nighttime lights. Proceedings of the Asia-Pacific Advanced Network, 35(0), 62. https://doi.org/10.7125/APAN.35.7
Levin, N., & Zhang, Q. (2017). A global analysis of factors controlling VIIRS nighttime light levels from densely populated areas. Remote Sensing of Environment, 190, 366–382. https://doi.org/10.1016/j.rse.2017.01.006
Small, C., Pozzi, F., & Elvidge, C. D. (2005). Spatial analysis of global urban extent from DMSP-OLS night lights. Remote Sensing of Environment, 96(3-4), 277-291. https://doi.org/10.1016/j.rse.2005.02.002
Chen, X., & Nordhaus, W. D. (2011). Using luminosity data as a proxy for economic statistics. Proceedings of the National Academy of Sciences, 108(21), 8589-8594. https://doi.org/10.1073/pnas.1017031108
N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," in IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, Jan. 1979, https://doi.org/10.1109/TSMC.1979.4310076
Xie, Y., Weng, Q., & Fu, P. (2019). Temporal variations of artificial nighttime lights and their implications for urbanization in the conterminous United States, 2013–2017. Remote Sensing of Environment, 225, 160-174. https://doi.org/10.1016/j.rse.2019.03.008

ClaraIV commented 3 months ago

@osundwajeff @mvmaltitz please see above, grateful if you could implement the NTL processing accordingly. I will move this to Work in Progress, thank you!

javaftw commented 3 months ago

Good morning Benny, Thank you for this information. Just a few quick questions:

Can you confirm the source and resolution of the NTL data to be used? Is it already available, or do I need to obtain it?
- Are there any specific guidelines for the statistics (esp. handling outliers, excluding certain values)?
- What should be done if the calculated median/90th percentile is very close or equal to the max value? Kind regards, Hennie Kotze

On Wed, Aug 21, 2024 at 4:46 PM Benny Istanto @.***> wrote:

Enhanced Night Time Light (NTL) Classification for GEEST Approach and Rationale

We propose refining the NTL classification in GEEST to better reflect local conditions, particularly for small island countries. Instead of using fixed thresholds or coverage percentages, we'll employ local statistics to create a more contextually relevant classification. This approach is supported by Elvidge et al. (2013), who emphasize the importance of considering local context in night-time light analysis. Standard Classification Procedure:

Clip the global NTL raster to the country boundary.

Calculate statistics for the clipped raster: min, max, mean, median, and 90th percentile.

The use of percentiles in NTL analysis is supported by Levin & Zhang (2017).

Apply the classification scheme based on local statistics.

Standard Classification Scheme: GEEST Class Description NTL Value Range 0 No Access 0 - 0.05 1 Very Low > 0.05 - 0.25 Median 2 Low > 0.25 Median - 0.5 Median 3 Moderate > 0.5 Median - Median 4 High > Median - 90th percentile 5 Very High > 90th percentile

This classification approach aligns with the methods used by Chen & Nordhaus (2011) in using luminosity data as a proxy for socio-economic factors. QGIS Implementation:

For implementation in QGIS, use the Raster Calculator with the following expression:

(NTL <= 0.05) 0 + (NTL > 0.05 AND NTL <= [0.25 median]) 1 + (NTL > [0.25 median] AND NTL <= [0.5 median]) 2 + (NTL > [0.5 median] AND NTL <= [median]) 3 + (NTL > [median] AND NTL <= [90th_percentile]) 4 + (NTL > [90th_percentile]) 5

Replace [median] and [90th_percentile] with the actual calculated values. Example Python Script for Calculating Statistics:

Here's a simple Python script to calculate the required statistics for a GeoTIFF raster:

import rasterio import numpy as np

def calculate_raster_stats(raster_path): with rasterio.open(raster_path) as src: data = src.read(1) data = data[data != src.nodata] # Remove no data values
    min_val = np.min(data)
    max_val = np.max(data)
    mean_val = np.mean(data)
    median_val = np.median(data)
    percentile_90 = np.percentile(data, 90)

    return {
        'min': min_val,
        'max': max_val,
        'mean': mean_val,
        'median': median_val,
        '90th_percentile': percentile_90
    }
Usage

raster_path = 'path/to/your/iso3_ntl.tif' stats = calculate_raster_stats(raster_path) print(stats)

This script can be integrated into the GEEST QGIS plugin to automatically calculate the statistics needed for the classification.

Benefits:

Adapts to local lighting conditions

Provides meaningful classification for areas with varying NTL intensities

Maintains consistency with the original GEEST approach while improving granularity

Additional Note: Handling Areas with Low NTL Values:

If the NTL values within the Area of Interest (AOI) are predominantly or entirely below 0.05, which is our baseline for electricity access, we need to adjust our approach. Here's how to proceed:

1.

Check the data range: After clipping the NTL raster to our AOI, examine the statistics, particularly the maximum value. 2.

If max value < 0.05:

This indicates that the entire area falls below our standard threshold for electricity access.

In this case, we need to create a relative scale based on the available data. This approach is informed by Small et al. (2005), who discuss methods for analyzing areas with low light emissions. 3.

Adjusted classification for low-value areas: Use this modified classification scheme based on the local maximum value:

Low NTL Classification Scheme: GEEST Class Description NTL Value Range 0 No Light 0 1 Very Low > 0 - 0.2 max_value 2 Low > 0.2 max_value - 0.4 max_value 3 Moderate > 0.4 max_value - 0.6 max_value 4 High > 0.6 max_value - 0.8 max_value 5 Highest > 0.8 max_value - max_value

This adaptive thresholding approach is conceptually similar to methods described by Otsu (1979) for image classification. QGIS Implementation:

For implementation in QGIS, use the Raster Calculator with the following expression:

with_variable('max_val', @.**"), (NTL = 0) 0 + (NTL > 0 AND NTL <= @max_val 0.2) 1 + (NTL > @max_val 0.2 AND NTL <= @max_val 0.4) 2 + (NTL > @max_val 0.4 AND NTL <= @max_val 0.6) 3 + (NTL > @max_val 0.6 AND NTL <= @max_val 0.8) 4 + (NTL > @max_val 0.8) * 5 )

Note: Replace @.***" with our actual raster layer name. Additional Considerations: Temporal Variability:

It's important to note that NTL data can vary seasonally and annually. As Xie et al. (2019) demonstrate, temporal variations in artificial nighttime lights can provide insights into urbanization patterns and socio-economic changes. For GEEST, consider using multi-year averaged NTL data to mitigate short-term fluctuations and capture more stable patterns of lighting infrastructure. Limitations and Caveats:

Users of this enhanced NTL classification in GEEST should be aware of certain limitations:

Overglow Effect: NTL data can suffer from the "overglow" effect, where light appears to extend beyond its actual source (Small et al., 2005). This may lead to overestimation of lit areas in some regions.

Saturation in Urban Centers: In highly urbanized areas, NTL sensors may saturate, leading to a loss of differentiation in the brightest areas (Elvidge et al., 2013).

Rural Underrepresentation: In sparsely populated or rural areas, the resolution of NTL data may not capture small-scale lighting infrastructure adequately.

References:

Elvidge, C. D., Baugh, K. E., Zhizhin, M., & Hsu, F.-C. (2013). Why VIIRS data are superior to DMSP for mapping nighttime lights. Proceedings of the Asia-Pacific Advanced Network, 35(0), 62. https://doi.org/10.7125/APAN.35.7

Levin, N., & Zhang, Q. (2017). A global analysis of factors controlling VIIRS nighttime light levels from densely populated areas. Remote Sensing of Environment, 190, 366–382. https://doi.org/10.1016/j.rse.2017.01.006

Small, C., Pozzi, F., & Elvidge, C. D. (2005). Spatial analysis of global urban extent from DMSP-OLS night lights. Remote Sensing of Environment, 96(3-4), 277-291. https://doi.org/10.1016/j.rse.2005.02.002

Chen, X., & Nordhaus, W. D. (2011). Using luminosity data as a proxy for economic statistics. Proceedings of the National Academy of Sciences, 108(21), 8589-8594. https://doi.org/10.1073/pnas.1017031108

N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," in IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, Jan. 1979, https://doi.org/10.1109/TSMC.1979.4310076

Xie, Y., Weng, Q., & Fu, P. (2019). Temporal variations of artificial nighttime lights and their implications for urbanization in the conterminous United States, 2013–2017. Remote Sensing of Environment, 225, 160-174. https://doi.org/10.1016/j.rse.2019.03.008

— Reply to this email directly, view it on GitHub https://github.com/kartoza/GEEST/issues/53#issuecomment-2302230912, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANEBKFENYEJATB2UQINTIG3ZSSR23AVCNFSM6AAAAABJV65GZWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBSGIZTAOJRGI . You are receiving this because you were mentioned.Message ID: @.***>

carolinamayh commented 3 months ago

@osundwajeff, please note that the score 5 ( Very High) is > 75th percentile

dragosgontariu commented 3 months ago

@osundwajeff @mvmaltitz @javaftw

When raster data input (nighttime lights) it still gets:

Traceback (most recent call last): File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3266, in SAFRasterizerDelegator self.SAFnightTimeLights() File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3550, in SAFnightTimeLights with rasterio.open(tempCalc, "r+") as src: File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\env.py", line 451, in wrapper return f(*args, **kwds) File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio__init__.py", line 334, in open dataset = get_writer_for_path(path, driver=driver)( File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\io.py", line 299, in get_writer_for_path driver = get_dataset_driver(path) File "rasterio\_base.pyx", line 74, in rasterio._base.get_dataset_driver TypeError: temp/tempCalc.tif: No such file or directory

Also for the points input it gets this:

where we can see that there are street lights but no data at all

bennyistanto commented 3 months ago

Good morning Benny, Thank you for this information. Just a few quick questions: - Can you confirm the source and resolution of the NTL data to be used? Is it already available, or do I need to obtain it? - Are there any specific guidelines for the statistics (esp. handling outliers, excluding certain values)? - What should be done if the calculated median/90th percentile is very close or equal to the max value? Kind regards, Hennie Kotze

Hi Hennie

Answering your question below:

After discussed with @ClaraIV, for now let's use the NTL data from this link https://eogdata.mines.edu/nighttime_light/annual/v22/2023/VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif.gz This is the latest annual global composite for 2023, it's 9GB compressed file, and become 11GB as GeoTIFF

You can use the clipped version below for LCA testing lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.zip

As the data is categorized as ready-to-use, I don't think there is any additional process to use the data. From the website, handling outlier and other issue are more focused on daily data.
For the last question on median/75th percentile value are very close to the max. I create below flowchart to provides a clear visual representation of the decision-making process in the classification of NTL data, accounting for different scenarios based on the data characteristics.

graph TD
    A[Download NTL data] --> B[Gunzip data]
    B --> C[Clip using country boundary]
    C --> D[Check max_value]
    D -->|max_value > 0.05| E[Check median and 75th percentile]
    D -->|max_value <= 0.05| F[Use modified classification scheme]
    E -->|Difference > 5% of max_value| G[Use original classification scheme]
    E -->|Difference <= 5% of max_value| F
    G -->|Use median and percentile| H[Classify NTL data]
    F -->|Use max_value| H
    H --> I[Output classified NTL map]

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#bbf,stroke:#333,stroke-width:2px
    style D fill:#fbb,stroke:#333,stroke-width:2px
    style E fill:#fbb,stroke:#333,stroke-width:2px
    style F fill:#bfb,stroke:#333,stroke-width:2px
    style G fill:#bfb,stroke:#333,stroke-width:2px
    style H fill:#fbf,stroke:#333,stroke-width:2px
    style I fill:#ff9,stroke:#333,stroke-width:2px

This diagram illustrates the process as follows:

Download the NTL data from the provided URL.
Gunzip the downloaded data.
Clip the data using the country boundary.
Check the max_value of the clipped NTL data.
If max_value > 0.05:
- Check if the median or 75th percentile is within 5% of the max_value.
- If the difference is > 5% of max_value, use the original classification scheme (median and percentile based).
- If the difference is <= 5% of max_value, use the modified classification scheme.
If max_value <= 0.05:
- Use the modified classification scheme directly.
Classify the NTL data using the selected scheme.
Output the classified NTL map.

Original Classification Scheme:

GEEST Class	Description	NTL Value Range
0	No Access	0 - 0.05
1	Very Low	> 0.05 - 0.25 * Median
2	Low	> 0.25 Median - 0.5 Median
3	Moderate	> 0.5 * Median - Median
4	High	> Median - 75th percentile
5	Very High	> 75th percentile

Modified Classification Scheme:

GEEST Class	Description	NTL Value Range
0	No Light	0
1	Very Low	> 0 - 0.2 * max_value
2	Low	> 0.2 max_value - 0.4 max_value
3	Moderate	> 0.4 max_value - 0.6 max_value
4	High	> 0.6 max_value - 0.8 max_value
5	Highest	> 0.8 * max_value - max_value

mvmaltitz commented 3 months ago

@dragosgontariu Test Results: Perceived Safety Value = 80

Street lights (points)

Nighttime lights (raster)

bennyistanto commented 3 months ago

@mvmaltitz I've obtained different results for the NTL classification for LCA. Here are my findings:

Max Value: 28.618135
Median: 1.060158
75th Percentile: 2.263107

The resulting classification map is shown below: geest_ntl_6class

You can review the code I used to generate this map here.

Could you please compare these results with your findings? I'm particularly interested in understanding any discrepancies in the classification thresholds or the final map output.

ClaraIV commented 3 months ago

@mvmaltitz @osundwajeff please ensure that the descriptive text on the tab is updated and explains briefly the processing of each data type. Also, please rename this factor "Safety" as per the indications in the Indicators excel --> remove "safe urban design"

javaftw commented 3 months ago

@bennyistanto Hi Benny I have modified the classification algorithm as per your instructions.

Modified algo + original supplied input data data:

Input:

Output:

Modified algo + downloaded input data data:

(Data download source ) Input:

Output:

Code snippet:

                # Get valid data (non-NaN values)
                valid_data = ntl_data[~np.isnan(ntl_data)]

                max_value = np.max(valid_data)
                median = np.median(valid_data)
                percentile_75 = np.percentile(valid_data, 75)

                print(f"Max Value: {max_value:.6f}")
                print(f"Median: {median:.6f}")
                print(f"75th Percentile: {percentile_75:.6f}")

                # Determine classification scheme
                if max_value <= 0.05 or (max_value - percentile_75) <= 0.05 * max_value:
                    print("Using max_value classification scheme")
                    classification = np.full_like(ntl_data, 0, dtype=np.uint8)
                    classification[(ntl_data > 0) & (ntl_data <= 0.2 * max_value)] = 1
                    classification[(ntl_data > 0.2 * max_value) & (ntl_data <= 0.4 * max_value)] = 2
                    classification[(ntl_data > 0.4 * max_value) & (ntl_data <= 0.6 * max_value)] = 3
                    classification[(ntl_data > 0.6 * max_value) & (ntl_data <= 0.8 * max_value)] = 4
                    classification[ntl_data > 0.8 * max_value] = 5
                else:
                    print("Using original classification scheme")
                    classification = np.full_like(ntl_data, 0, dtype=np.uint8)
                    classification[(ntl_data > 0.05) & (ntl_data <= 0.25 * median)] = 1
                    classification[(ntl_data > 0.25 * median) & (ntl_data <= 0.5 * median)] = 2
                    classification[(ntl_data > 0.5 * median) & (ntl_data <= median)] = 3
                    classification[(ntl_data > median) & (ntl_data <= percentile_75)] = 4
                    classification[ntl_data > percentile_75] = 5

                # Set NaN values in the original data to 255
                classification[np.isnan(ntl_data)] = 255

                # Update metadata for output
                out_meta.update(dtype=rasterio.uint8, nodata=255)

Using the same approach in both cases using different but very similar input data, the results appear to be significantly different.

bennyistanto commented 3 months ago

Hey Hennie @javaftw,

I've figured out why our maps look different. The classification patterns we're seeing on the map are different because we're getting different statistical values, especially for the median and 75_percentile.

Here's what I'm guessing you got:

Max Value: 28.618135
Median: 0.458617
75th Percentile: 1.280255

And here's what I'm seeing:

Max Value: 28.618135
Median: 1.060158
75th Percentile: 2.263107

The reason for this difference? It's probably how we're dealing with those pesky nodata values.

When we clip the raster with the country boundary, we might end up with areas outside the boundary that aren't properly marked as nodata. If these areas are included as zeros in our calculations, it could potentially lower our median and 75_percentile values, depending on how many of these zero values there are compared to the valid data points inside the country boundary.

I took a look at the GitHub code (https://github.com/kartoza/GEEST/blob/dev/gender_indicator_tool/gender_indicator_tool.py, lines 3640-3685), and I noticed it's following the Colab script. The thing is, Colab handles NaN values correctly during clipping, but in the gender_indicator_tool.py, especially in the gdal:cliprasterbymasklayer part, nodata is set to 0.

I tried to make the Colab script work in the PyQGIS console by tweaking how we set the NODATA value. The goal was to make sure the areas outside the boundary are properly recognized as NODATA. I've got the full code below if you want to take a look.

Screenshot 2024-08-27 194434 https://gist.github.com/bennyistanto/7ec367ba090468ca0e6e888dd13f7f3e

After testing in both Colab and PyQGIS with this updated script, I'm getting the same stats for min, max, median, and 75_percentile.

QGIS test

import numpy as np
from qgis.core import QgsRasterLayer

# File path to your raster
raster_path = r'D:\temp\geest\factor_safety\ntl\lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif'

# Load the raster layer
raster_layer = QgsRasterLayer(raster_path, "NTL")

# Check if the raster layer is valid
if not raster_layer.isValid():
    print("Error: Raster failed to load.")
else:
    provider = raster_layer.dataProvider()
    extent = raster_layer.extent()
    cols = raster_layer.width()
    rows = raster_layer.height()
    block = provider.block(1, extent, cols, rows)

    # Convert raster block to numpy array
    data = np.array([[block.value(i, j) for j in range(cols)] for i in range(rows)])

    # Handle nodata values (non-NaN)
    nodata = provider.sourceNoDataValue(1)
    valid_data = data[~np.isnan(data) & (data != nodata)]

    if valid_data.size > 0:
        # Calculate statistics
        min_value = np.min(valid_data)
        max_value = np.max(valid_data)
        median = np.median(valid_data)
        percentile_75 = np.percentile(valid_data, 75)

        print(f"Min Value: {min_value:.6f}")
        print(f"Max Value: {max_value:.6f}")
        print(f"Median: {median:.6f}")
        print(f"75th Percentile: {percentile_75:.6f}")
    else:
        print("No valid data found.")

Colab test

import numpy as np
import rasterio

def analyze_raster(raster_path):
    # Open the raster file
    with rasterio.open(raster_path) as src:
        data = src.read(1)  # Read the first band

        # Retrieve the nodata value from the raster metadata
        nodata = src.nodata

        # Get valid data (non-NaN values and not nodata)
        valid_data = data[~np.isnan(data) & (data != nodata)]

        if valid_data.size > 0:
            # Calculate statistics
            min_value = np.min(valid_data)
            max_value = np.max(valid_data)
            median = np.median(valid_data)
            percentile_75 = np.percentile(valid_data, 75)

            print(f"Min Value: {min_value:.6f}")
            print(f"Max Value: {max_value:.6f}")
            print(f"Median: {median:.6f}")
            print(f"75th Percentile: {percentile_75:.6f}")
        else:
            print("No valid data found.")

# Path to your raster file
raster_path = f'{dir}/ntl/lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif'

# Run the analysis
analyze_raster(raster_path)

Both script returned the same value:

Min Value: 0.329669
Max Value: 28.618135
Median: 1.060158
75th Percentile: 2.263107

What do you think? Does this help explain the differences we're seeing?

mvmaltitz commented 2 months ago

@timlinux

Safety trigger a warning on OGR when open the NTL data, although the data is available and loaded into map, dragged from Explorer window.

Screenshot 2024-09-12 211350

2024-09-12T21:05:41     WARNING    Cannot open D:\temp\geest\geest_qgis_templates\layers\iso3\fji_input\fji_place_saf_ntl_2023_8859_clipped.tif ().()

After Execute the button, the label next to the button appear with text:

Processing failed: Unable to execute algorithm
Could not load source layer for INPUT: temp\clippedClassified.tif not found

See below the python warning

2024-09-12T21:09:16     WARNING    warning:C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py:596: FutureWarning: Neither ogr.UseExceptions() nor ogr.DontUseExceptions() has been explicitly called. In GDAL 4.0, exceptions will be enabled by default.
              warnings.warn(

             traceback: File "C:\Users/benny/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3304, in SAFRasterizerDelegator
              self.SAFnightTimeLights_v2()
              File "C:\Users/benny/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3740, in SAFnightTimeLights_v2
              processing.run(
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\tools\general.py", line 109, in run
              return Processing.runAlgorithm(algOrName, parameters, onFinish, feedback, context)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\core\Processing.py", line 176, in runAlgorithm
              ret, results = execute(alg, parameters, context, feedback, catch_exceptions=False)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\gui\AlgorithmExecutor.py", line 70, in execute
              results, ok = alg.run(parameters, context, feedback, {}, False)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalAlgorithm.py", line 132, in processAlgorithm
              commands = self.getConsoleCommands(parameters, context, feedback, executing=True)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\ClipRasterByMask.py", line 170, in getConsoleCommands
              maskLayer, maskLayerName = self.getOgrCompatibleSource(self.MASK, parameters, context, feedback, executing)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalAlgorithm.py", line 89, in getOgrCompatibleSource
              ogr_layer_name = GdalUtils.ogrLayerName(ogr_data_path)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalUtils.py", line 439, in ogrLayerName
              ds = ogr.Open(basePath)
              File "C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py", line 7481, in Open
              _WarnIfUserHasNotSpecifiedIfUsingExceptions()
              File "C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py", line 596, in _WarnIfUserHasNotSpecifiedIfUsingExceptions
              warnings.warn(

And below the Processing log

2024-09-12T21:09:16     INFO    Results: {'OUTPUT': 'output_e54fbcd2_4078_43f6_b086_52d0e9371988'}
2024-09-12T21:09:16     INFO    gdalwarp -overwrite -of GTiff -cutline C:/TEMP/processing_JvEyfI/72efe44926e247ec969f00d98a4ce638/MASK.gpkg -cl MASK -crop_to_cutline -dstnodata -3.4028234663852886e+38 temp\tempClassified.tif temp\clippedClassified.tif
2024-09-12T21:09:16     INFO    GDAL command:
2024-09-12T21:09:16     INFO    gdalwarp -overwrite -of GTiff -cutline C:/TEMP/processing_JvEyfI/72efe44926e247ec969f00d98a4ce638/MASK.gpkg -cl MASK -crop_to_cutline -dstnodata -3.4028234663852886e+38 temp\tempClassified.tif temp\clippedClassified.tif
2024-09-12T21:09:16     INFO    GDAL command output:
2024-09-12T21:09:17     CRITICAL    Process returned error code 1
2024-09-12T21:09:17     INFO    Results: {'OUTPUT': 'temp\\clippedClassified.tif'}
2024-09-12T21:09:20     INFO    Results: {'OUTPUT': 'temp/countryUTMLayerBuf.shp'}
2024-09-12T21:09:22     CRITICAL    Unable to execute algorithm
             Could not load source layer for INPUT: temp\clippedClassified.tif not found
2024-09-12T21:09:22     CRITICAL    Warning 1: Ring Self-intersection at or near point 2617160.3599800956 -2438488.8009277876
2024-09-12T21:09:22     CRITICAL    ERROR 1: Cutline polygon is invalid.

And OGR log


2024-09-12T21:05:41     WARNING    Cannot open D:\temp\geest\geest_qgis_templates\layers\iso3\fji_input\fji_place_saf_ntl_2023_8859_clipped.tif ().()
2024-09-12T21:09:16     WARNING    Cannot open EPSG:8859 ().()

See tickets #134 and #135 for full testing, results and input data