amyburness opened 5 months ago
If data for streetlights or nighttime lights is unavailable, or if the user has disaggregated statistics, they should have the option to provide data on women's perceived safety at the municipal, district, state, or any other required level. The tool would then standardize these scores, percentages, or statistics on a scale from 0 to 5, where 5 indicates the lowest level of violence or the highest level of perceived safety.
Processes streetlight vector data to create a raster of safety scores based on coverage. For a detailed explanation, see docs/dev_notes/README.md in the issue-53-factor-safety branch.
Constants:
- Light area radius: 20 meters
- Score bins: [0, 1, 20, 40, 60, 80, 100]
- Scores: [0, 0, 1, 2, 3, 4, 5]
Function:
Notes:
Due to the unavailability of actual streetlight location data in the sample dataset and limited time for sourcing, a method was devised to generate substitute data based on available road network information.

Data Source and Preprocessing

The primary data source was the roads shapefile for the area of interest. A subset of roads was extracted based on the following criteria:
- 'Residential' roads from the 'highways' category
- 'Tertiary', 'Secondary', and 'Primary' roads, which were merged into a single category of major roads
This extraction process provided a foundation for estimating streetlight distributions.
Residential Streetlight Generation

For residential roads, points were scattered along the geometry at 50-meter intervals. This interval was chosen to approximate the typical spacing of streetlights in residential areas.
Major Road Streetlight Generation

The process for major roads involved additional steps:
The merged major roads were filtered based on segment lengths to exclude longer cross-country roads that typically have less frequent lighting. Points were then scattered along the remaining geometry at 150-meter intervals, reflecting the generally wider spacing of streetlights on major roads compared to residential areas.
Resulting Distribution

This methodology produces an estimated light-source distribution for development purposes. The resulting point dataset approximates streetlight locations based on road type and typical lighting patterns.
The final distribution provides a reasonable proxy for actual streetlight locations, allowing for the development and testing of the SAFStreetLight function in the absence of real-world data.
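The point-scattering step described above can be sketched in plain Python. This is a rough stand-in for the actual implementation, and `scatter_points` is a hypothetical helper, not plugin code:

```python
import math

def scatter_points(line, interval):
    """Place points along a polyline at a fixed interval (in CRS units,
    metres here), approximating streetlight positions along a road."""
    points = []
    dist_to_next = 0.0  # place the first point at the line's start
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        d = dist_to_next
        while d <= seg_len:
            t = d / seg_len
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
            d += interval
        dist_to_next = d - seg_len  # carry the remainder into the next segment
    return points

# A 200 m residential segment at 50 m spacing -> points at 0, 50, 100, 150, 200 m
road = [(0.0, 0.0), (200.0, 0.0)]
lights = scatter_points(road, 50.0)
print(len(lights))  # 5
```

In practice the plugin would operate on the road shapefile geometries (e.g. via QGIS or shapely) rather than raw coordinate lists, but the interval-walk logic is the same.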
"Perceived Safety Data" handling is still outstanding
@javaftw
@osundwajeff
If data for streetlights or nighttime lights is unavailable, or if the user has disaggregated statistics, they should have the option to provide data on women's perceived safety at the municipal, district, state, or any other required level. The tool would then standardize these scores, percentages, or statistics on a scale from 0 to 5, where 5 indicates the lowest level of violence or the highest level of perceived safety.
Hi @carolinamayh
Could you please elaborate on and/or provide examples of the file format of the perceived safety and disaggregated statistics data? Also, in this regard, could you please verify that the adjusted description text, as per the image, is correct?
@osundwajeff @javaftw As mentioned yesterday, if streetlights data or nightlights data is not available, the user will have the option to input a safety score for the country or provide disaggregated scores by administrative units. The indicator used will be the number of homicides per 100,000 people, standardized based on the following scale:
Score 5 (Safest): 0 to 1 homicides per 100,000 people
Score 4: 1.1 to 3 homicides per 100,000 people
Score 3: 3.1 to 6 homicides per 100,000 people
Score 2: 6.1 to 10 homicides per 100,000 people
Score 1: 10.1 to 15 homicides per 100,000 people
Score 0 (Least Safe): More than 15 homicides per 100,000 people
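For reference, the standardization above can be sketched as a small Python function. This is only an illustration of the table; `homicide_rate_to_score` is a hypothetical name, not plugin code:

```python
def homicide_rate_to_score(rate_per_100k):
    """Map homicides per 100,000 people to a 0-5 safety score
    (5 = safest), following the thresholds in the table above."""
    if rate_per_100k <= 1:
        return 5
    elif rate_per_100k <= 3:
        return 4
    elif rate_per_100k <= 6:
        return 3
    elif rate_per_100k <= 10:
        return 2
    elif rate_per_100k <= 15:
        return 1
    return 0

print(homicide_rate_to_score(2.5))  # 4 (falls in the 1.1-3 band)
```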
or provide disaggregated scores by administrative units
Hi @carolinamayh (Hennie here),
While the information that "the user will have the option to input a safety score for the country" is useful (there are UI elements in the plugin which perform this action), the part about "or provide disaggregated scores by administrative units" is unclear, despite the accompanying breakdown of the scoring mechanic. Is it a shapefile? A CSV? Something else? How is the user to interact with it and the fields it contains? Could you please shed some more light on this, thanks!
nighttime light data throws the following error:
2024-08-07T12:32:58 INFO Results: {'OUTPUT': 'temp/countryUTMLayerBuf.shp'}
2024-08-07T12:32:58 INFO gdalwarp -overwrite -t_srs EPSG:32620 -tr 100.0 100.0 -r near -te 705589.018038702 1514124.5410898936 731809.9074413507 1562729.1977703366 -te_srs EPSG:32620 -of GTiff D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif temp/tempResample.tif
2024-08-07T12:32:58 INFO GDAL command:
2024-08-07T12:32:58 INFO gdalwarp -overwrite -t_srs EPSG:32620 -tr 100.0 100.0 -r near -te 705589.018038702 1514124.5410898936 731809.9074413507 1562729.1977703366 -te_srs EPSG:32620 -of GTiff D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif temp/tempResample.tif
2024-08-07T12:32:58 INFO GDAL command output:
2024-08-07T12:32:58 INFO Process completed successfully
2024-08-07T12:32:58 INFO Results: {'OUTPUT': 'temp/tempResample.tif'}
2024-08-07T12:32:58 INFO gdal_calc.bat --overwrite --calc "A*1000" --format GTiff --type Int32 -A temp/tempResample.tif --A_band 1 --outfile temp/tempCalc.tif
2024-08-07T12:32:58 INFO GDAL command:
2024-08-07T12:32:58 INFO gdal_calc.bat --overwrite --calc "A*1000" --format GTiff --type Int32 -A temp/tempResample.tif --A_band 1 --outfile temp/tempCalc.tif
2024-08-07T12:32:58 INFO GDAL command output:
2024-08-07T12:32:59 CRITICAL Process returned error code 1
2024-08-07T12:32:59 INFO Results: {'OUTPUT': 'temp/tempCalc.tif'}
2024-08-07T12:32:59 INFO Creating output file that is 262P x 486L. Processing D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif [1/1] : 0Using internal nodata values (e.g. 3.4e+38) for image D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif. Copying nodata values from source D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif to destination temp/tempResample.tif. ...10...20...30...40...50...60...70...80...90...100 - done.
2024-08-07T12:32:59 CRITICAL RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf .
Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem.
Traceback (most recent call last):
  File "C:\PROGRA~1\QGIS 3.32.0\apps\Python39\Scripts\gdal_calc.py", line 8, in <module>
2024-08-07T12:32:59 CRITICAL from osgeo_utils.gdal_calc import * # noqa
  File "C:\PROGRA~1\QGIS 3.32.0\apps\Python39\lib\site-packages\osgeo_utils\gdal_calc.py", line 46, in <module>
2024-08-07T12:32:59 CRITICAL from osgeo import gdal, gdal_array
  File "C:\PROGRA~1\QGIS 3.32.0\apps\Python39\lib\site-packages\osgeo\gdal_array.py", line 13, in <module>
2024-08-07T12:32:59 CRITICAL from . import _gdal_array
ImportError: numpy.core.multiarray failed to import
Traceback (most recent call last):
  File "rasterio\_base.pyx", line 69, in rasterio._base.get_dataset_driver
  File "rasterio\_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: temp/tempCalc.tif: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 2952, in SAFnightTimeLights
with rasterio.open(tempCalc, "r+") as src:
File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\env.py", line 451, in wrapper
return f(*args, **kwds)
File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\__init__.py", line 334, in open
dataset = get_writer_for_path(path, driver=driver)(
File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\io.py", line 299, in get_writer_for_path
driver = get_dataset_driver(path)
File "rasterio\\_base.pyx", line 74, in rasterio._base.get_dataset_driver
TypeError: temp/tempCalc.tif: No such file or directory
@osundwajeff @mvmaltitz @javaftw
Input Type: Raster
Input Type: Shapefile (Points)
Input Type: Shapefile (Polygons)
When the input is a polygon layer, the user may choose the field of interest within that layer to process. If the field contains text (rather than numeric) values, the field name is appended with (text).
If the user chooses a text field in the dropdown, values may be assigned to the unique text values
The result of the above operation, using user-assigned text field values:
The results of using a numeric field:
Input Type: User Value
If no input file is specified, the user value is evaluated:
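The text-field handling described above can be sketched as a simple lookup: the user assigns a score to each unique text value, and each feature is scored by its field value. All names and values here are hypothetical, not the plugin's actual API:

```python
# Hypothetical text values of the chosen polygon field, one per feature
field_values = ["low crime", "medium crime", "high crime", "low crime"]

# Scores the user might assign to each unique text value (0-5 scale)
user_assigned = {"low crime": 5, "medium crime": 3, "high crime": 1}

# Per-feature scores derived from the user's assignments
feature_scores = [user_assigned[v] for v in field_values]
print(feature_scores)  # [5, 3, 1, 5]
```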
Improvement of point rendering method
@osundwajeff @mvmaltitz we will provide detailed guidance on the processing of the NTL data for this factor today.
We propose refining the NTL classification in GEEST to better reflect local conditions, particularly for small island countries. Instead of using fixed thresholds or coverage percentages, we'll employ local statistics to create a more contextually relevant classification. This approach is supported by Elvidge et al. (2013), who emphasize the importance of considering local context in night-time light analysis.
GEEST Class | Description | NTL Value Range |
---|---|---|
0 | No Access | 0 - 0.05 |
1 | Very Low | > 0.05 - 0.25 * Median |
2 | Low | > 0.25 * Median - 0.5 * Median |
3 | Moderate | > 0.5 * Median - Median |
4 | High | > Median - 75th percentile |
5 | Very High | > 75th percentile |
This classification approach aligns with the methods used by Chen & Nordhaus (2011) in using luminosity data as a proxy for socio-economic factors.
For implementation in QGIS, use the Raster Calculator with the following expression:
(NTL <= 0.05) * 0 +
(NTL > 0.05 AND NTL <= [0.25 * median]) * 1 +
(NTL > [0.25 * median] AND NTL <= [0.5 * median]) * 2 +
(NTL > [0.5 * median] AND NTL <= [median]) * 3 +
(NTL > [median] AND NTL <= [75th_percentile]) * 4 +
(NTL > [75th_percentile]) * 5
Replace [median]
and [75th_percentile]
with the actual calculated values.
Here's a simple Python script to calculate the required statistics for a GeoTIFF raster:
import rasterio
import numpy as np
def calculate_raster_stats(raster_path):
    with rasterio.open(raster_path) as src:
        data = src.read(1)
        data = data[data != src.nodata]  # Remove no data values
    min_val = np.min(data)
    max_val = np.max(data)
    mean_val = np.mean(data)
    median_val = np.median(data)
    percentile_75 = np.percentile(data, 75)
    return {
        'min': min_val,
        'max': max_val,
        'mean': mean_val,
        'median': median_val,
        '75th_percentile': percentile_75
    }
# Usage
raster_path = 'path/to/your/iso3_ntl.tif'
stats = calculate_raster_stats(raster_path)
print(stats)
This script can be integrated into the GEEST QGIS plugin to automatically calculate the statistics needed for the classification.
Benefits:
If the NTL values within the Area of Interest (AOI) are predominantly or entirely below 0.05, which is our baseline for electricity access, we need to adjust our approach. Here's how to proceed:
Check the data range: After clipping the NTL raster to our AOI, examine the statistics, particularly the maximum value.
If max value < 0.05:
Adjusted classification for low-value areas: Use this modified classification scheme based on the local maximum value:
GEEST Class | Description | NTL Value Range |
---|---|---|
0 | No Light | 0 |
1 | Very Low | > 0 - 0.2 * max_value |
2 | Low | > 0.2 * max_value - 0.4 * max_value |
3 | Moderate | > 0.4 * max_value - 0.6 * max_value |
4 | High | > 0.6 * max_value - 0.8 * max_value |
5 | Highest | > 0.8 * max_value - max_value |
This adaptive thresholding approach is conceptually similar to methods described by Otsu (1979) for image classification.
For implementation in QGIS, use the Raster Calculator with the following expression:
with_variable('max_val', maximum("NTL@1"),
(NTL = 0) * 0 +
(NTL > 0 AND NTL <= @max_val * 0.2) * 1 +
(NTL > @max_val * 0.2 AND NTL <= @max_val * 0.4) * 2 +
(NTL > @max_val * 0.4 AND NTL <= @max_val * 0.6) * 3 +
(NTL > @max_val * 0.6 AND NTL <= @max_val * 0.8) * 4 +
(NTL > @max_val * 0.8) * 5
)
Note: Replace "NTL@1" with our actual raster layer name.
It's important to note that NTL data can vary seasonally and annually. As Xie et al. (2019) demonstrate, temporal variations in artificial nighttime lights can provide insights into urbanization patterns and socio-economic changes. For GEEST, consider using multi-year averaged NTL data to mitigate short-term fluctuations and capture more stable patterns of lighting infrastructure.
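A minimal sketch of the multi-year averaging idea, assuming the yearly NTL rasters have already been read into NumPy arrays with nodata as NaN (the tiny arrays here are toy data, not real NTL values):

```python
import numpy as np

# Hypothetical yearly NTL grids for the same extent (NaN = nodata);
# a multi-year mean smooths single-year fluctuations before classification
ntl_2021 = np.array([[0.1, np.nan], [0.4, 0.2]])
ntl_2022 = np.array([[0.3, 0.1], [0.2, np.nan]])
ntl_2023 = np.array([[0.2, 0.3], [0.6, 0.4]])

# Stack along a new "year" axis and take the per-pixel mean, ignoring NaN
stack = np.stack([ntl_2021, ntl_2022, ntl_2023])
multi_year_mean = np.nanmean(stack, axis=0)
print(multi_year_mean)
```

The averaged grid would then feed into the same median/percentile classification described above.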
Users of this enhanced NTL classification in GEEST should be aware of certain limitations:
@osundwajeff @mvmaltitz please see above, grateful if you could implement the NTL processing accordingly. I will move this to Work in Progress, thank you!
Good morning Benny, Thank you for this information. Just a few quick questions:
On Wed, Aug 21, 2024 at 4:46 PM Benny Istanto @.***> wrote:
Enhanced Night Time Light (NTL) Classification for GEEST

Approach and Rationale
We propose refining the NTL classification in GEEST to better reflect local conditions, particularly for small island countries. Instead of using fixed thresholds or coverage percentages, we'll employ local statistics to create a more contextually relevant classification. This approach is supported by Elvidge et al. (2013), who emphasize the importance of considering local context in night-time light analysis.

Standard Classification Procedure:
- Clip the global NTL raster to the country boundary.
- Calculate statistics for the clipped raster: min, max, mean, median, and 90th percentile.
- The use of percentiles in NTL analysis is supported by Levin & Zhang (2017).
- Apply the classification scheme based on local statistics.
Standard Classification Scheme:

GEEST Class | Description | NTL Value Range |
---|---|---|
0 | No Access | 0 - 0.05 |
1 | Very Low | > 0.05 - 0.25 * Median |
2 | Low | > 0.25 * Median - 0.5 * Median |
3 | Moderate | > 0.5 * Median - Median |
4 | High | > Median - 90th percentile |
5 | Very High | > 90th percentile |
This classification approach aligns with the methods used by Chen & Nordhaus (2011) in using luminosity data as a proxy for socio-economic factors.

QGIS Implementation:
For implementation in QGIS, use the Raster Calculator with the following expression:
(NTL <= 0.05) * 0 +
(NTL > 0.05 AND NTL <= [0.25 * median]) * 1 +
(NTL > [0.25 * median] AND NTL <= [0.5 * median]) * 2 +
(NTL > [0.5 * median] AND NTL <= [median]) * 3 +
(NTL > [median] AND NTL <= [90th_percentile]) * 4 +
(NTL > [90th_percentile]) * 5
Replace [median] and [90th_percentile] with the actual calculated values.

Example Python Script for Calculating Statistics:
Here's a simple Python script to calculate the required statistics for a GeoTIFF raster:
import rasterio
import numpy as np

def calculate_raster_stats(raster_path):
    with rasterio.open(raster_path) as src:
        data = src.read(1)
        data = data[data != src.nodata]  # Remove no data values
    min_val = np.min(data)
    max_val = np.max(data)
    mean_val = np.mean(data)
    median_val = np.median(data)
    percentile_90 = np.percentile(data, 90)
    return {
        'min': min_val,
        'max': max_val,
        'mean': mean_val,
        'median': median_val,
        '90th_percentile': percentile_90
    }

# Usage
raster_path = 'path/to/your/iso3_ntl.tif'
stats = calculate_raster_stats(raster_path)
print(stats)
This script can be integrated into the GEEST QGIS plugin to automatically calculate the statistics needed for the classification.
Benefits:
- Adapts to local lighting conditions
- Provides meaningful classification for areas with varying NTL intensities
- Maintains consistency with the original GEEST approach while improving granularity
Additional Note: Handling Areas with Low NTL Values:
If the NTL values within the Area of Interest (AOI) are predominantly or entirely below 0.05, which is our baseline for electricity access, we need to adjust our approach. Here's how to proceed:
1. Check the data range: After clipping the NTL raster to our AOI, examine the statistics, particularly the maximum value.
2. If max value < 0.05: This indicates that the entire area falls below our standard threshold for electricity access. In this case, we need to create a relative scale based on the available data. This approach is informed by Small et al. (2005), who discuss methods for analyzing areas with low light emissions.
3. Adjusted classification for low-value areas: Use this modified classification scheme based on the local maximum value:
Low NTL Classification Scheme:

GEEST Class | Description | NTL Value Range |
---|---|---|
0 | No Light | 0 |
1 | Very Low | > 0 - 0.2 * max_value |
2 | Low | > 0.2 * max_value - 0.4 * max_value |
3 | Moderate | > 0.4 * max_value - 0.6 * max_value |
4 | High | > 0.6 * max_value - 0.8 * max_value |
5 | Highest | > 0.8 * max_value - max_value |
This adaptive thresholding approach is conceptually similar to methods described by Otsu (1979) for image classification.

QGIS Implementation:
For implementation in QGIS, use the Raster Calculator with the following expression:
with_variable('max_val', maximum("NTL@1"),
(NTL = 0) * 0 +
(NTL > 0 AND NTL <= @max_val * 0.2) * 1 +
(NTL > @max_val * 0.2 AND NTL <= @max_val * 0.4) * 2 +
(NTL > @max_val * 0.4 AND NTL <= @max_val * 0.6) * 3 +
(NTL > @max_val * 0.6 AND NTL <= @max_val * 0.8) * 4 +
(NTL > @max_val * 0.8) * 5
)
Note: Replace "NTL@1" with our actual raster layer name.

Additional Considerations: Temporal Variability:
It's important to note that NTL data can vary seasonally and annually. As Xie et al. (2019) demonstrate, temporal variations in artificial nighttime lights can provide insights into urbanization patterns and socio-economic changes. For GEEST, consider using multi-year averaged NTL data to mitigate short-term fluctuations and capture more stable patterns of lighting infrastructure.

Limitations and Caveats:
Users of this enhanced NTL classification in GEEST should be aware of certain limitations:
- Overglow Effect: NTL data can suffer from the "overglow" effect, where light appears to extend beyond its actual source (Small et al., 2005). This may lead to overestimation of lit areas in some regions.
- Saturation in Urban Centers: In highly urbanized areas, NTL sensors may saturate, leading to a loss of differentiation in the brightest areas (Elvidge et al., 2013).
- Rural Underrepresentation: In sparsely populated or rural areas, the resolution of NTL data may not capture small-scale lighting infrastructure adequately.
References:
- Elvidge, C. D., Baugh, K. E., Zhizhin, M., & Hsu, F.-C. (2013). Why VIIRS data are superior to DMSP for mapping nighttime lights. Proceedings of the Asia-Pacific Advanced Network, 35(0), 62. https://doi.org/10.7125/APAN.35.7
- Levin, N., & Zhang, Q. (2017). A global analysis of factors controlling VIIRS nighttime light levels from densely populated areas. Remote Sensing of Environment, 190, 366–382. https://doi.org/10.1016/j.rse.2017.01.006
- Small, C., Pozzi, F., & Elvidge, C. D. (2005). Spatial analysis of global urban extent from DMSP-OLS night lights. Remote Sensing of Environment, 96(3-4), 277-291. https://doi.org/10.1016/j.rse.2005.02.002
- Chen, X., & Nordhaus, W. D. (2011). Using luminosity data as a proxy for economic statistics. Proceedings of the National Academy of Sciences, 108(21), 8589-8594. https://doi.org/10.1073/pnas.1017031108
- N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," in IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, Jan. 1979, https://doi.org/10.1109/TSMC.1979.4310076
- Xie, Y., Weng, Q., & Fu, P. (2019). Temporal variations of artificial nighttime lights and their implications for urbanization in the conterminous United States, 2013–2017. Remote Sensing of Environment, 225, 160-174. https://doi.org/10.1016/j.rse.2019.03.008
@osundwajeff, please note that the score 5 (Very High) is > 75th percentile
@osundwajeff @mvmaltitz @javaftw
When the input is raster data (nighttime lights), it still throws:
Traceback (most recent call last):
  File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3266, in SAFRasterizerDelegator
    self.SAFnightTimeLights()
  File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3550, in SAFnightTimeLights
    with rasterio.open(tempCalc, "r+") as src:
  File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\env.py", line 451, in wrapper
    return f(*args, **kwds)
  File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\__init__.py", line 334, in open
    dataset = get_writer_for_path(path, driver=driver)(
  File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\io.py", line 299, in get_writer_for_path
    driver = get_dataset_driver(path)
  File "rasterio\_base.pyx", line 74, in rasterio._base.get_dataset_driver
TypeError: temp/tempCalc.tif: No such file or directory
Also, for the points input it gets this:
where we can see that there are street lights but no data at all
Good morning Benny, Thank you for this information. Just a few quick questions:
- Can you confirm the source and resolution of the NTL data to be used? Is it already available, or do I need to obtain it?
- Are there any specific guidelines for the statistics (esp. handling outliers, excluding certain values)?
- What should be done if the calculated median/90th percentile is very close or equal to the max value?

Kind regards, Hennie Kotze
Hi Hennie
Answering your questions below:
You can use the clipped version below for LCA testing lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.zip
As the data is categorized as ready-to-use, I don't think any additional processing is needed before using it. From the website, outlier handling and other issues are more of a concern for the daily data.
On the last question, where the median/75th percentile values are very close to the max: I created the flowchart below to provide a clear visual representation of the decision-making process in the classification of NTL data, accounting for different scenarios based on the data characteristics.
graph TD
A[Download NTL data] --> B[Gunzip data]
B --> C[Clip using country boundary]
C --> D[Check max_value]
D -->|max_value > 0.05| E[Check median and 75th percentile]
D -->|max_value <= 0.05| F[Use modified classification scheme]
E -->|Difference > 5% of max_value| G[Use original classification scheme]
E -->|Difference <= 5% of max_value| F
G -->|Use median and percentile| H[Classify NTL data]
F -->|Use max_value| H
H --> I[Output classified NTL map]
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#bbf,stroke:#333,stroke-width:2px
style C fill:#bbf,stroke:#333,stroke-width:2px
style D fill:#fbb,stroke:#333,stroke-width:2px
style E fill:#fbb,stroke:#333,stroke-width:2px
style F fill:#bfb,stroke:#333,stroke-width:2px
style G fill:#bfb,stroke:#333,stroke-width:2px
style H fill:#fbf,stroke:#333,stroke-width:2px
style I fill:#ff9,stroke:#333,stroke-width:2px
This diagram illustrates the process as follows:
Original Classification Scheme:
GEEST Class | Description | NTL Value Range |
---|---|---|
0 | No Access | 0 - 0.05 |
1 | Very Low | > 0.05 - 0.25 * Median |
2 | Low | > 0.25 * Median - 0.5 * Median |
3 | Moderate | > 0.5 * Median - Median |
4 | High | > Median - 75th percentile |
5 | Very High | > 75th percentile |
Modified Classification Scheme:
GEEST Class | Description | NTL Value Range |
---|---|---|
0 | No Light | 0 |
1 | Very Low | > 0 - 0.2 * max_value |
2 | Low | > 0.2 * max_value - 0.4 * max_value |
3 | Moderate | > 0.4 * max_value - 0.6 * max_value |
4 | High | > 0.6 * max_value - 0.8 * max_value |
5 | Highest | > 0.8 * max_value - max_value |
@dragosgontariu Test Results: Perceived Safety Value = 80
Street lights (points)
Nighttime lights (raster)
@mvmaltitz I've obtained different results for the NTL classification for LCA. Here are my findings:
The resulting classification map is shown below:
You can review the code I used to generate this map here.
Could you please compare these results with your findings? I'm particularly interested in understanding any discrepancies in the classification thresholds or the final map output.
@mvmaltitz @osundwajeff please ensure that the descriptive text on the tab is updated and explains briefly the processing of each data type. Also, please rename this factor "Safety" as per the indications in the Indicators excel --> remove "safe urban design"
@bennyistanto Hi Benny I have modified the classification algorithm as per your instructions.
Input:
Output:
(Data download source ) Input:
Output:
Code snippet:
import numpy as np
import rasterio

# ntl_data and out_meta are assumed to come from an earlier rasterio
# read, e.g. src.read(1) and src.meta
# Get valid data (non-NaN values)
valid_data = ntl_data[~np.isnan(ntl_data)]

max_value = np.max(valid_data)
median = np.median(valid_data)
percentile_75 = np.percentile(valid_data, 75)

print(f"Max Value: {max_value:.6f}")
print(f"Median: {median:.6f}")
print(f"75th Percentile: {percentile_75:.6f}")

# Determine classification scheme
if max_value <= 0.05 or (max_value - percentile_75) <= 0.05 * max_value:
    print("Using max_value classification scheme")
    classification = np.full_like(ntl_data, 0, dtype=np.uint8)
    classification[(ntl_data > 0) & (ntl_data <= 0.2 * max_value)] = 1
    classification[(ntl_data > 0.2 * max_value) & (ntl_data <= 0.4 * max_value)] = 2
    classification[(ntl_data > 0.4 * max_value) & (ntl_data <= 0.6 * max_value)] = 3
    classification[(ntl_data > 0.6 * max_value) & (ntl_data <= 0.8 * max_value)] = 4
    classification[ntl_data > 0.8 * max_value] = 5
else:
    print("Using original classification scheme")
    classification = np.full_like(ntl_data, 0, dtype=np.uint8)
    classification[(ntl_data > 0.05) & (ntl_data <= 0.25 * median)] = 1
    classification[(ntl_data > 0.25 * median) & (ntl_data <= 0.5 * median)] = 2
    classification[(ntl_data > 0.5 * median) & (ntl_data <= median)] = 3
    classification[(ntl_data > median) & (ntl_data <= percentile_75)] = 4
    classification[ntl_data > percentile_75] = 5

# Set NaN values in the original data to 255
classification[np.isnan(ntl_data)] = 255

# Update metadata for output
out_meta.update(dtype=rasterio.uint8, nodata=255)
Using the same approach in both cases, with different but very similar input data, the results appear to be significantly different.
Hey Hennie @javaftw,
I've figured out why our maps look different. The classification patterns we're seeing on the map are different because we're getting different statistical values, especially for the median and 75_percentile.
Here's what I'm guessing you got:
And here's what I'm seeing:
The reason for this difference? It's probably how we're dealing with those pesky nodata values.
When we clip the raster with the country boundary, we might end up with areas outside the boundary that aren't properly marked as nodata. If these areas are included as zeros in our calculations, it could potentially lower our median and 75_percentile values, depending on how many of these zero values there are compared to the valid data points inside the country boundary.
I took a look at the GitHub code (https://github.com/kartoza/GEEST/blob/dev/gender_indicator_tool/gender_indicator_tool.py, lines 3640-3685), and I noticed it's following the Colab script. The thing is, Colab handles NaN values correctly during clipping, but in the gender_indicator_tool.py, especially in the gdal:cliprasterbymasklayer part, nodata is set to 0.
I tried to make the Colab script work in the PyQGIS console by tweaking how we set the NODATA value. The goal was to make sure the areas outside the boundary are properly recognized as NODATA. I've got the full code below if you want to take a look.
https://gist.github.com/bennyistanto/7ec367ba090468ca0e6e888dd13f7f3e
After testing in both Colab and PyQGIS with this updated script, I'm getting the same stats for min, max, median, and 75_percentile.
QGIS test
import numpy as np
from qgis.core import QgsRasterLayer

# File path to your raster
raster_path = r'D:\temp\geest\factor_safety\ntl\lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif'

# Load the raster layer
raster_layer = QgsRasterLayer(raster_path, "NTL")

# Check if the raster layer is valid
if not raster_layer.isValid():
    print("Error: Raster failed to load.")
else:
    provider = raster_layer.dataProvider()
    extent = raster_layer.extent()
    cols = raster_layer.width()
    rows = raster_layer.height()
    block = provider.block(1, extent, cols, rows)

    # Convert raster block to numpy array
    data = np.array([[block.value(i, j) for j in range(cols)] for i in range(rows)])

    # Handle nodata values (non-NaN)
    nodata = provider.sourceNoDataValue(1)
    valid_data = data[~np.isnan(data) & (data != nodata)]

    if valid_data.size > 0:
        # Calculate statistics
        min_value = np.min(valid_data)
        max_value = np.max(valid_data)
        median = np.median(valid_data)
        percentile_75 = np.percentile(valid_data, 75)

        print(f"Min Value: {min_value:.6f}")
        print(f"Max Value: {max_value:.6f}")
        print(f"Median: {median:.6f}")
        print(f"75th Percentile: {percentile_75:.6f}")
    else:
        print("No valid data found.")
Colab test
import numpy as np
import rasterio
def analyze_raster(raster_path):
    # Open the raster file
    with rasterio.open(raster_path) as src:
        data = src.read(1)  # Read the first band
        # Retrieve the nodata value from the raster metadata
        nodata = src.nodata

    # Get valid data (non-NaN values and not nodata)
    valid_data = data[~np.isnan(data) & (data != nodata)]

    if valid_data.size > 0:
        # Calculate statistics
        min_value = np.min(valid_data)
        max_value = np.max(valid_data)
        median = np.median(valid_data)
        percentile_75 = np.percentile(valid_data, 75)

        print(f"Min Value: {min_value:.6f}")
        print(f"Max Value: {max_value:.6f}")
        print(f"Median: {median:.6f}")
        print(f"75th Percentile: {percentile_75:.6f}")
    else:
        print("No valid data found.")
# Path to your raster file
raster_path = f'{dir}/ntl/lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif'
# Run the analysis
analyze_raster(raster_path)
Both scripts returned the same values:
What do you think? Does this help explain the differences we're seeing?
@timlinux
Safety triggers an OGR warning when opening the NTL data, although the data is available and loads into the map when dragged from the Explorer window.
2024-09-12T21:05:41 WARNING Cannot open D:\temp\geest\geest_qgis_templates\layers\iso3\fji_input\fji_place_saf_ntl_2023_8859_clipped.tif ().()
After clicking the Execute button, the label next to the button appears with the text:
Processing failed: Unable to execute algorithm
Could not load source layer for INPUT: temp\clippedClassified.tif not found
See the Python warning below:
2024-09-12T21:09:16 WARNING warning:C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py:596: FutureWarning: Neither ogr.UseExceptions() nor ogr.DontUseExceptions() has been explicitly called. In GDAL 4.0, exceptions will be enabled by default.
warnings.warn(
traceback: File "C:\Users/benny/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3304, in SAFRasterizerDelegator
self.SAFnightTimeLights_v2()
File "C:\Users/benny/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3740, in SAFnightTimeLights_v2
processing.run(
File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\tools\general.py", line 109, in run
return Processing.runAlgorithm(algOrName, parameters, onFinish, feedback, context)
File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\core\Processing.py", line 176, in runAlgorithm
ret, results = execute(alg, parameters, context, feedback, catch_exceptions=False)
File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\gui\AlgorithmExecutor.py", line 70, in execute
results, ok = alg.run(parameters, context, feedback, {}, False)
File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalAlgorithm.py", line 132, in processAlgorithm
commands = self.getConsoleCommands(parameters, context, feedback, executing=True)
File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\ClipRasterByMask.py", line 170, in getConsoleCommands
maskLayer, maskLayerName = self.getOgrCompatibleSource(self.MASK, parameters, context, feedback, executing)
File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalAlgorithm.py", line 89, in getOgrCompatibleSource
ogr_layer_name = GdalUtils.ogrLayerName(ogr_data_path)
File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalUtils.py", line 439, in ogrLayerName
ds = ogr.Open(basePath)
File "C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py", line 7481, in Open
_WarnIfUserHasNotSpecifiedIfUsingExceptions()
File "C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py", line 596, in _WarnIfUserHasNotSpecifiedIfUsingExceptions
warnings.warn(
And below, the Processing log:
2024-09-12T21:09:16 INFO Results: {'OUTPUT': 'output_e54fbcd2_4078_43f6_b086_52d0e9371988'}
2024-09-12T21:09:16 INFO gdalwarp -overwrite -of GTiff -cutline C:/TEMP/processing_JvEyfI/72efe44926e247ec969f00d98a4ce638/MASK.gpkg -cl MASK -crop_to_cutline -dstnodata -3.4028234663852886e+38 temp\tempClassified.tif temp\clippedClassified.tif
2024-09-12T21:09:16 INFO GDAL command:
2024-09-12T21:09:16 INFO gdalwarp -overwrite -of GTiff -cutline C:/TEMP/processing_JvEyfI/72efe44926e247ec969f00d98a4ce638/MASK.gpkg -cl MASK -crop_to_cutline -dstnodata -3.4028234663852886e+38 temp\tempClassified.tif temp\clippedClassified.tif
2024-09-12T21:09:16 INFO GDAL command output:
2024-09-12T21:09:17 CRITICAL Process returned error code 1
2024-09-12T21:09:17 INFO Results: {'OUTPUT': 'temp\\clippedClassified.tif'}
2024-09-12T21:09:20 INFO Results: {'OUTPUT': 'temp/countryUTMLayerBuf.shp'}
2024-09-12T21:09:22 CRITICAL Unable to execute algorithm
Could not load source layer for INPUT: temp\clippedClassified.tif not found
2024-09-12T21:09:22 CRITICAL Warning 1: Ring Self-intersection at or near point 2617160.3599800956 -2438488.8009277876
2024-09-12T21:09:22 CRITICAL ERROR 1: Cutline polygon is invalid.
And the OGR log:
2024-09-12T21:05:41 WARNING Cannot open D:\temp\geest\geest_qgis_templates\layers\iso3\fji_input\fji_place_saf_ntl_2023_8859_clipped.tif ().()
2024-09-12T21:09:16 WARNING Cannot open EPSG:8859 ().()
See tickets #134 and #135 for full testing, results and input data
Mapillary
This indicator should be calculated by creating 20-meter buffers around streetlights. Raster cells in which 80-100% of their area intersects the buffers should receive a score of 5. Cells with 60-79% intersection should be scored 4, 40-59% as 3, 20-39% as 2, and 1-19% as 1. Areas without any overlap with streetlight buffers should be scored 0. NOTE: Use nighttime lights only if streetlight data is absent.
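The percentage-to-score binning described above can be sketched with NumPy (a hedged illustration; deriving the per-cell buffer-coverage percentages is a separate geoprocessing step not shown here):

```python
import numpy as np

# Hypothetical percentages of each raster cell covered by the
# 20 m streetlight buffers
coverage = np.array([0, 5, 25, 45, 65, 85, 100])

# Bin edges per the rule above: <1% -> 0, 1-19% -> 1, 20-39% -> 2,
# 40-59% -> 3, 60-79% -> 4, 80-100% -> 5
scores = np.digitize(coverage, bins=[1, 20, 40, 60, 80])
print(scores)  # [0 1 2 3 4 5 5]
```

These edges also match the score bins listed in the dev notes ([0, 1, 20, 40, 60, 80, 100] mapped to [0, 0, 1, 2, 3, 4, 5]).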
Nighttime Lights
Nighttime lights observation data from the low-light imaging sensor of the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB). Unit: (average, average-masked, max, median, median-masked, min) nW/cm2/sr; CRS: EPSG:4326 (Geographic Latitude/Longitude); Resolution: 15 arc seconds (~500 m at the Equator).