Closed SanderDevisscher closed 10 months ago
Related but not explainable from my side is the difference between:
eco-new: version currently on UAT:
Datasource = https://drive.google.com/file/d/1h6uqH2RCINiuSkFLijka6Af7CxdY2AEh/view?usp=sharing
&
eco-old: version currently on PRD:
Datasource = https://drive.google.com/file/d/1ESGwf7BsiKu71iTEGwIiMcoSdHTTUCny/view?usp=sharing
I detect no major differences in the relevant columns. I ran the following checks:
eco_old <- eco_old %>% filter(afschotjaar <= 2013)
eco_new <- eco_new %>% filter(afschotjaar <= 2013)
table(eco_new$wildsoort, eco_old$wildsoort, useNA = "ifany")
table(eco_new$leeftijd_comp, eco_old$leeftijd_comp, useNA = "ifany")
table(eco_new$geslacht_comp, eco_old$geslacht_comp, useNA = "ifany")
table(eco_new$aantal_embryos, eco_old$aantal_embryos, useNA = "ifany")
table(eco_new$leeftijdscategorie_MF, eco_old$leeftijdscategorie_MF, useNA = "ifany")
table(eco_new$Leeftijdscategorie_onderkaak, eco_old$Leeftijdscategorie_onderkaak, useNA = "ifany")
table(eco_new$aantal_embryos_labo, eco_old$aantal_embryos_labo, useNA = "ifany")
table(eco_new$aantal_embryos_MF, eco_old$aantal_embryos_MF, useNA = "ifany")
table(eco_new$type_comp, eco_old$type_comp, useNA = "ifany")
table(eco_new$leeftijd_comp_bron, eco_old$leeftijd_comp_bron, useNA = "ifany")
table(eco_new$geslacht_comp_bron, eco_old$geslacht_comp_bron, useNA = "ifany")
table(eco_new$doodsoorzaak, eco_old$doodsoorzaak, useNA = "ifany")
table(eco_new$aantal_embryos_bron, eco_old$aantal_embryos_bron, useNA = "ifany")
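The cross-tabulations above only line up if both data frames have the same number of rows in the same order. As a complement, here is a minimal sketch that counts changed values per column and treats NA on both sides as "unchanged". The helper name `count_changes` and the toy data are assumptions for illustration, not part of the package.

```r
# Toy stand-ins for eco_old / eco_new; the real data come from the linked files.
eco_old <- data.frame(
  leeftijd_comp = c("Kits", NA, "Adult", NA),
  geslacht_comp = c("Vrouwelijk", "Mannelijk", NA, "Vrouwelijk"),
  stringsAsFactors = FALSE
)
eco_new <- eco_old
eco_new$leeftijd_comp[2] <- "onbekend"  # an NA that became "onbekend"

# Hypothetical helper: number of rows per column whose value differs
# between the two versions. NA vs NA counts as equal.
count_changes <- function(new, old, cols) {
  # Row-wise comparison needs equal length and assumes identical row order.
  stopifnot(nrow(new) == nrow(old))
  vapply(cols, function(col) {
    a <- as.character(new[[col]])
    b <- as.character(old[[col]])
    differs <- (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
    sum(differs)
  }, integer(1))
}

changes <- count_changes(eco_new, eco_old, c("leeftijd_comp", "geslacht_comp"))
print(changes)
```

If the two exports are not guaranteed to be in the same row order, they would first have to be matched on a shared key before any of these comparisons mean anything.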
The largest difference I detected is 16 labels in leeftijd_comp that became "onbekend" instead of NA.
There are, however, some "correct" changes for the years 2014 & 2015.
Probably related: some of my tests fail because no data is left after filtering out missing values, while previously there was some. E.g. for Ree:
> head(combinedRee[, c("leeftijdscategorie_MF", "Leeftijdscategorie_onderkaak", "afschotjaar")])
  leeftijdscategorie_MF Leeftijdscategorie_onderkaak afschotjaar
1                  <NA>              Niet ingezameld        2023
2                  <NA>                         Kits        2021
3                  <NA>              Niet ingezameld        2020
4                  <NA>                         Kits        2018
5                  <NA>              Niet ingezameld        2019
6                  <NA>              Niet ingezameld        2017
Ok, I see! I've fixed the issue causing leeftijdscategorie_MF to become empty (see the current version on the UAT bucket). However, the problem with the embryos remains, and I don't see any significant differences between the old & the new data.
When running createRawData() I get the following warning: 626 observaties met gekend aantal embryos wordt op onbekend gezet (in English: 626 observations with a known number of embryos are set to unknown).
What triggers this?
Observations where 'aantal_embryos_onbekend' equals TRUE while the number of embryos is not missing.
@SanderDevisscher Should this only be done for the relevant source? So if aantal_embryos_bron equals 'inbo', we would check whether aantal_embryos_labo is NA or not. If it is not missing while aantal_embryos_onbekend equals TRUE, we could overwrite it with NA. Or has this kind of data manipulation become redundant, or is it already done on your side?
> head(rawData[rawData$aantal_embryos_onbekend & !is.na(rawData$aantal_embryos), grepl("embryo", colnames(rawData))])
   aantal_embryos_onbekend aantal_embryos aantal_embryos_labo aantal_embryos_MF
4                     TRUE              5                   5                NA
24                    TRUE              0                   0                NA
38                    TRUE              0                   0                NA
39                    TRUE              0                   0                NA
75                    TRUE              6                  NA                 6
98                    TRUE              5                  NA                 5
   aantal_embryos_bron
4                 inbo
24                inbo
38                inbo
39                inbo
75   meldingsformulier
98   meldingsformulier
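The source-specific check proposed above could look roughly like the sketch below. It assumes that 'inbo' maps to aantal_embryos_labo and 'meldingsformulier' to aantal_embryos_MF, which is what the output suggests but is not confirmed anywhere in this thread. The toy rows are invented.

```r
# Toy rows modelled on the output above; column names are taken from the thread.
rawData <- data.frame(
  aantal_embryos_onbekend = c(TRUE, TRUE, TRUE, FALSE),
  aantal_embryos          = c(5, 6, NA, 3),
  aantal_embryos_labo     = c(5, NA, NA, 3),
  aantal_embryos_MF       = c(NA, 6, NA, NA),
  aantal_embryos_bron     = c("inbo", "meldingsformulier", "inbo", "inbo"),
  stringsAsFactors = FALSE
)

# Assumed mapping: take the count from the column that belongs to the source.
# Any other (or missing) source yields NA.
source_value <- ifelse(
  rawData$aantal_embryos_bron == "inbo",
  rawData$aantal_embryos_labo,
  ifelse(
    rawData$aantal_embryos_bron == "meldingsformulier",
    rawData$aantal_embryos_MF,
    NA_real_
  )
)

# Conflict: flagged as unknown although the relevant source holds a count.
conflict <- rawData$aantal_embryos_onbekend & !is.na(source_value)

# Overwrite the combined count with NA for the conflicting rows only.
rawData$aantal_embryos[conflict] <- NA
print(which(conflict))
```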
aantal_embryos_onbekend was not updated on our side 😅 Doing so fixed the issue 😄
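Updated warning message for future use:
Warning message: 626 observaties met 'aantal_embryos_onbekend' TRUE terwijl gekend 'aantal_embryos'. Voor deze observaties wordt 'aantal_embryos' op NA (onbekend) gezet.
(In English: 626 observations with 'aantal_embryos_onbekend' TRUE while 'aantal_embryos' is known. For these observations 'aantal_embryos' is set to NA (unknown).)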
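For reference, a minimal sketch of how a check with this warning text could be written. The helper name `reset_flagged_embryos` is hypothetical and this is not the actual code inside createRawData(). It only illustrates the condition described earlier: the flag is TRUE while a count is present.

```r
# Hypothetical helper, NOT the package's createRawData(): resets known embryo
# counts to NA where the "unknown" flag is TRUE, and warns with the row count.
reset_flagged_embryos <- function(data) {
  flagged <- !is.na(data$aantal_embryos_onbekend) &
    data$aantal_embryos_onbekend &
    !is.na(data$aantal_embryos)
  if (any(flagged)) {
    warning(sprintf(
      "%d observaties met 'aantal_embryos_onbekend' TRUE terwijl gekend 'aantal_embryos'. Voor deze observaties wordt 'aantal_embryos' op NA (onbekend) gezet.",
      sum(flagged)
    ), call. = FALSE)
    data$aantal_embryos[flagged] <- NA
  }
  data
}

# Toy data: only the first row is flagged while holding a count.
toy <- data.frame(
  aantal_embryos_onbekend = c(TRUE, FALSE, TRUE),
  aantal_embryos          = c(5, 2, NA)
)

# Capture the warning text instead of printing it, so it can be inspected.
msg <- NULL
cleaned <- withCallingHandlers(
  reset_flagged_embryos(toy),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
print(cleaned)
```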
Describe the bug Currently ecology contains only one geslacht column (geslacht_comp). This causes some strange results in the FIGUUR: Gerapporteerd aantal embryo's. Mainly, the number of female wild boars shot prior to 2014 seems too high compared to afterwards. I think we need to rework how the app handles geslacht and the related filters.
To Reproduce Steps to reproduce the behavior:
Expected behavior I think we need to work towards this logic:
The total number of individuals per year should be based on the filters.
The total number used to calculate the percentage should be based on geslacht_comp == "Vrouwelijk" & type_comp %in% filter choice.
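One possible reading of this logic, as a rough sketch only. It assumes the percentage is the share of eligible females (per year) that have a known aantal_embryos, which the issue does not state explicitly. The type_comp values and the `selected_types` variable are placeholders, not real categories or app inputs.

```r
# Toy data; column names from the thread, values invented for illustration.
eco <- data.frame(
  afschotjaar    = c(2013, 2013, 2013, 2013, 2014, 2014),
  geslacht_comp  = c("Vrouwelijk", "Vrouwelijk", "Vrouwelijk", "Mannelijk",
                     "Vrouwelijk", NA),
  type_comp      = c("type_a", "type_a", "type_b", "type_a", "type_a", "type_a"),
  aantal_embryos = c(4, NA, NA, NA, 6, NA),
  stringsAsFactors = FALSE
)

selected_types <- c("type_a")  # placeholder for the user's filter choice

# Denominator: females of the selected types (%in% treats NA as no match).
eligible <- eco$geslacht_comp %in% "Vrouwelijk" & eco$type_comp %in% selected_types
# Numerator (assumption): eligible animals with a known embryo count.
reported <- eligible & !is.na(eco$aantal_embryos)

years <- sort(unique(eco$afschotjaar))
summary_tbl <- data.frame(
  afschotjaar = years,
  eligible = as.integer(tapply(eligible, eco$afschotjaar, sum)[as.character(years)]),
  reported = as.integer(tapply(reported, eco$afschotjaar, sum)[as.character(years)])
)
summary_tbl$percent <- 100 * summary_tbl$reported / summary_tbl$eligible
print(summary_tbl)
```

Using `%in%` rather than `==` for geslacht_comp keeps rows with a missing sex out of the denominator instead of propagating NA into the counts.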
Git SHA (after 0.3.1) 9286fea381ec714508f803bfc22d0806bbe02f54