mvl22 closed 4 months ago
Just to flag up that this issue is holding up various refactoring work, as I can't generalise the value lists yet.
@Robinlovelace assigned mvl22 2 weeks ago
I don't think this assignment to me is correct - this is an upstream data issue that someone in the data team needs to address.
Re-assigned. @mem48 do you know where in the codebase these values come from?
It's from the SIMD data.
@mvl22 how do you know that the absence of broadband data results in a data value of 0?
These are the raw values:
table(zones$broadband)
  0%   1%  10% 100%  11%  12%  13%  14%  15%  16%  17%  18%  19%   2%  20%  21%
4451  393   54    6   43   53   52   32   39   33   33   39   34  217   28   31
 22%  23%  24%  25%  26%  27%  28%  29%   3%  30%  31%  32%  33%  34%  35%  36%
  28   25   16   29   22   21   17   17  131   19   16   16   23   13   23   25
 37%  38%  39%   4%  40%  41%  42%  43%  44%  45%  46%  47%  48%  49%   5%  50%
  12   16   21  121   21   17   12   15   17   16   10   15   19   12   85    7
 51%  52%  53%  54%  55%  56%  57%  58%  59%   6%  60%  61%  62%  63%  64%  65%
  11    7    8   19   10   12    9   10    9   69    8   11    6    6    6   10
 66%  67%  68%  69%   7%  70%  71%  72%  73%  74%  75%  76%  77%  78%  79%   8%
   6   10    8   14   67   10    5    9    9    8    5    4    6    5    8   65
 80%  81%  82%  83%  84%  85%  86%  87%  88%  89%   9%  90%  91%  93%  94%  95%
   2    7   11    6    2    7    6    7    5    2   55    2    7    1    1    2
 96%  97%  98%
   3    2    1
As shown here the results do look a bit strange but according to @mem48 there's no evidence that they are wrong:
Reprex and closing:
u = "https://statistics.gov.scot/downloads/cube-table?uri=http%3A%2F%2Fstatistics.gov.scot%2Fdata%2Fscottish-index-of-multiple-deprivation---broadband-access-indicator"
b = readr::read_csv(u)
#> Rows: 6976 Columns: 7
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (5): FeatureCode, FeatureName, FeatureType, Measurement, Units
#> dbl (2): DateCode, Value
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(b)
#> # A tibble: 6 × 7
#>   FeatureCode FeatureName   FeatureType    DateCode Measurement Units      Value
#>   <chr>       <chr>         <chr>             <dbl> <chr>       <chr>      <dbl>
#> 1 S01006511   Culter - 06   2011 Data Zone     2019 Percent     Percent o…  5.3
#> 2 S01006512   Culter - 07   2011 Data Zone     2019 Percent     Percent o… 53.8
#> 3 S01006506   Culter - 01   2011 Data Zone     2019 Percent     Percent o… 10.5
#> 4 S01006507   Culter - 02   2011 Data Zone     2019 Percent     Percent o…  1.36
#> 5 S01006510   Culter - 05   2011 Data Zone     2019 Percent     Percent o…  0.31
#> 6 S01006528   Garthdee - 03 2011 Data Zone     2019 Percent     Percent o…  1.78
summary(b)
#>  FeatureCode        FeatureName        FeatureType        DateCode
#>  Length:6976        Length:6976        Length:6976        Min.   :2019
#>  Class :character   Class :character   Class :character   1st Qu.:2019
#>  Mode  :character   Mode  :character   Mode  :character   Median :2019
#>                                                           Mean   :2019
#>                                                           3rd Qu.:2019
#>                                                           Max.   :2019
#>  Measurement        Units              Value
#>  Length:6976        Length:6976        Min.   :  0.000
#>  Class :character   Class :character   1st Qu.:  0.000
#>  Mode  :character   Mode  :character   Median :  0.000
#>                                        Mean   :  7.441
#>                                        3rd Qu.:  3.803
#>                                        Max.   :100.000
sum(b$Value == 0)
#> [1] 3971
Created on 2024-02-01 with reprex v2.1.0
What is the null value now?
Does 0 now genuinely mean none?
Please let me know when this version is reflected in a build uploaded to the tile server somewhere (and the URL), and I will update the definitions in the refactor code branch to reflect this change.
There are no null values, as the output above shows. 0 means, and has always meant, 0 as far as I can tell. Why did you think 0 meant NA, @mvl22?
Why did you think 0 meant NA?
I thought this came up in the opening meeting we had. As I recall, Malcolm explained that the value 0.01 had to be used in the code here because a real zero meant void data:
https://github.com/nptscot/nptscot.github.io/blob/dev/src/datasets.js#L302
I'm not sure what the issue is on the backend. As far as I can see we're not modifying anything; this is faithful to the raw data. Happy to re-open if this is a confirmed issue, but having looked at the data I cannot see any problem with what is provided to the front-end, so I am closing for now until there's a clearly articulated ask. My ask: what change are you hoping to see, and in which file, so we can confirm when this is done?
Currently the absence of broadband data results in a data value of 0, but this is within the range 0 to 100, so it cannot be distinguished from real data, and the real data has to use 0.01: https://github.com/nptscot/nptscot.github.io/blob/7a1f11928f0b296037644f11cf0a27ecafb6ebd2/js/layer_control.js#L206-L207
This should be changed so that voids use a known out-of-range value, e.g. -9999 or whatever you wish to standardise on. Currently the website code has to have special handling to deal with this inconsistency.
(This issue became more overt as a result of the refactoring work.)
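To make the request above concrete, here is a minimal sketch of how front-end code might tell voids from real values if the data used an out-of-range sentinel. It is illustrative only and is not taken from the nptscot codebase: the constant `VOID_VALUE`, the function `classifyBroadband`, and the choice of -9999 are all assumptions, the last borrowed from the example value suggested above.

```javascript
// Hypothetical sentinel for "no data"; -9999 is only the example value
// suggested in this issue, not an agreed standard.
const VOID_VALUE = -9999;

// Hypothetical helper: classify a broadband percentage as 'void' (no data),
// 'data' (a real value between 0 and 100 inclusive), or 'invalid' (anything else).
function classifyBroadband(value) {
  if (value === VOID_VALUE) {
    return 'void';
  }
  if (typeof value !== 'number' || Number.isNaN(value) || value < 0 || value > 100) {
    return 'invalid';
  }
  return 'data';
}

// With a sentinel, a genuine zero stays in the real-data range.
console.log(classifyBroadband(0));
console.log(classifyBroadband(VOID_VALUE));
```

With a scheme like this a real 0% is treated as ordinary data, so the 0.01 workaround would no longer be needed. Whether the source data contains any voids at all is a separate question for the data team.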