Open stschiff opened 2 years ago
I'm slowly crawling out of my hole and thought I'd quickly take a peek at this. Dana and I concluded back then for #25 that there is unfortunately a lot of d. in the mix. This might have changed by now, so let's see. c. is trivial (although I think there is no automatic way to find these samples, right?), so let's check a. and b.

a. should be an impossible state of the system, so it would surprise me if it existed. I checked anyway:
library(magrittr) # provides the %>% pipe used below

janno <- poseidonR::read_janno("~/agora/published_data/")

### If there are entries in the C14-type columns, put Date_Type to C14.
# keep only samples that carry at least one actual uncalibrated C14 date
janno_with_actual_C14_dates <- janno %>% dplyr::filter(
  # do not include rows for which one of the following applies
  !purrr::map_lgl(Date_C14_Uncal_BP, \(x) {
    is.null(x) || # date is NULL
      if (length(x) == 1) { # if there is exactly one date value
        is.na(x) # date is NA
      } else {
        FALSE
      }
  })
)

janno_with_actual_C14_dates %>% nrow() # 3606
janno %>% dplyr::filter(Date_Type == "C14") %>% nrow() # 3607
# samples with C14 dates whose Date_Type is missing or not "C14"
janno_with_actual_C14_dates %>%
  dplyr::filter(is.na(Date_Type) | Date_Type != "C14") %>% nrow() # 0
So I think such a sample does indeed not exist.

b. is a lot more likely:
### If there are entries in the calibrated columns, but not in the C14 columns, put Date_Type to contextual.
janno_with_result_dates <- janno %>% dplyr::filter(
  !is.na(Date_BC_AD_Median)
)
# samples with a calibrated date but without an actual C14 date
janno_potentially_contextual <- dplyr::anti_join(
  janno_with_result_dates,
  janno_with_actual_C14_dates,
  by = "Poseidon_ID"
)
# of those, the ones whose Date_Type is missing or not "contextual"
janno_potentially_contextual %>%
  dplyr::filter(is.na(Date_Type) | Date_Type != "contextual") %>%
  nrow() # 840
OK! So we could automatically fill these 840 (826 from 2021_PattersonNature) with `contextual`. I fear this will often be factually incorrect, but it makes our DB consistent. We should also make sure that b. is caught by the validation and cannot emerge any more in the future.
Btw. my brain is still pretty mushy so take this with a grain of salt.
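To make the validation idea concrete, here is a minimal sketch of a check for case b. in base R. It is not the existing validation code of any Poseidon tool; `has_c14_date()`, `find_case_b_violations()` and the `toy` table are hypothetical names invented for illustration, and the sketch assumes the same columns as above (`Poseidon_ID`, `Date_Type`, `Date_BC_AD_Median`, and `Date_C14_Uncal_BP` as a list column). Unlike the filter above, it also treats empty and all-`NA` entries as "no C14 date".

```r
# Hypothetical helper: does one cell of the Date_C14_Uncal_BP list column
# hold at least one actual uncalibrated date?
has_c14_date <- function(x) {
  !is.null(x) && length(x) > 0 && !all(is.na(x))
}

# Hypothetical check: return the Poseidon_IDs of samples that have a
# calibrated median, no C14 date, and a Date_Type other than "contextual".
find_case_b_violations <- function(janno) {
  c14_present <- vapply(janno$Date_C14_Uncal_BP, has_c14_date, logical(1))
  calibrated_present <- !is.na(janno$Date_BC_AD_Median)
  wrong_type <- is.na(janno$Date_Type) | janno$Date_Type != "contextual"
  janno$Poseidon_ID[calibrated_present & !c14_present & wrong_type]
}

# Toy data, invented for this example only
toy <- data.frame(
  Poseidon_ID = c("s1", "s2", "s3", "s4"),
  Date_Type = c("C14", NA, "contextual", NA),
  Date_BC_AD_Median = c(-2000, -1500, -1000, NA),
  stringsAsFactors = FALSE
)
toy$Date_C14_Uncal_BP <- list(c(3600L, 3650L), NA, NULL, NA)

violations <- find_case_b_violations(toy)
print(violations)
```

In the toy table only `s2` has a calibrated date without a C14 date and without `contextual` as its `Date_Type`, so it is the only ID reported. A real validation step could fail whenever this vector is non-empty.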
OK, good catch that 826 of the samples with calibrated dates but a missing Date_Type are from Patterson. I think we should then open a separate issue to fill in the uncalibrated dates for these, as I think they must have C14-dated most if not all of their samples.
Lots of packages contain missing `Date_Type`s in the Janno file. In my view, a lot of those we should be able to fill easily:

a. If there are entries in the C14-type columns, put `Date_Type` to `C14`.
b. If there are entries in the calibrated columns, but not in the C14 columns, put `Date_Type` to `contextual`.
c. If it's modern samples, put to `modern`.
d. If the sample is ancient, but there is no date at all, keep at `n/a` for now, but of course we should fill those soon as well, at least as a contextual range, which should always be possible from a look into the paper.
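The four rules above can be sketched as a small decision function in base R. This is only an illustration under stated assumptions: `infer_date_type()` and its three logical arguments are hypothetical, and because there seems to be no automatic way to detect modern samples, `is_modern` has to be supplied by hand. Case d. is returned as `NA`, standing in for `n/a`.

```r
# Hypothetical helper: propose a Date_Type from three logical vectors.
#   has_c14        - sample has entries in the C14-type columns (rule a.)
#   has_calibrated - sample has entries in the calibrated columns (rule b.)
#   is_modern      - sample is known to be modern (rule c., set manually)
infer_date_type <- function(has_c14, has_calibrated, is_modern) {
  ifelse(
    is_modern, "modern",
    ifelse(
      has_c14, "C14",
      # calibrated but no C14 -> contextual; nothing at all -> NA (rule d.)
      ifelse(has_calibrated, "contextual", NA_character_)
    )
  )
}

# One invented sample per rule, in the order a., b., c., d.
proposed <- infer_date_type(
  has_c14        = c(TRUE,  FALSE, FALSE, FALSE),
  has_calibrated = c(TRUE,  TRUE,  FALSE, FALSE),
  is_modern      = c(FALSE, FALSE, TRUE,  FALSE)
)
print(proposed)
```

The proposed values would only be used to fill entries where `Date_Type` is currently missing, so existing annotations stay untouched.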