Closed joeroe closed 3 years ago
I think so, yes. Good spotting! @apalmisano82 is one of our most reliable data contributors, actually. He recently made us aware of yet another valuable paper with data here: https://zenodo.org/record/4322979
We already have one of his datasets in c14bazAAR, and I wonder what the best way to structure these datasets will be in the future. Would we have to add a parser function for each paper? What should we call them? We'll probably accumulate even more duplication like this.
Or have you considered collecting all your data across different papers in one open repository, @apalmisano82? Maybe something like what @dirkseidensticker maintains for his Africa projects. That could simplify your data management and would be perfect for us :+1:
Two, I think: there's also emedyd, which come to think of it is almost fully superseded by the QSR paper:
# https://zenodo.org/record/4322979
qsr <- readr::read_csv("Palmisano_etal_data_and_code/csv/dates.csv")
emedyd <- c14bazAAR::get_c14data("emedyd")
c14baz <- dplyr::filter(c14bazAAR::get_c14data("all"), sourcedb != "emedyd")
sum(!emedyd$LabID %in% qsr$LabID)
# [1] 62
not_in_qsr <- emedyd$LabID[!emedyd$LabID %in% qsr$LabID]
sum(!not_in_qsr %in% c14baz)
# [1] 62
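A note on the last check, as a minimal sketch with made-up lab IDs: `c14baz` is a whole data frame there, and in base R `%in%` against a data frame does not look inside any single column, so with more than one row it reports no matches. That may be why the count stays at 62. Comparing against the `labnr` column (the column name used further down in this thread) would look roughly like this:

```r
# Toy stand-ins; the lab IDs and sourcedb values are made up for illustration.
not_in_qsr <- c("AA-1", "BB-2", "CC-3")
c14baz <- data.frame(
  labnr = c("AA-1", "BB-2", "ZZ-9"),
  sourcedb = c("x", "x", "y"),
  stringsAsFactors = FALSE
)

# Matching against the data frame itself: no lab ID is found
n_missing_df <- sum(!not_in_qsr %in% c14baz)

# Matching against the lab number column: only the unshared ID is counted
n_missing_col <- sum(!not_in_qsr %in% c14baz$labnr)

print(c(n_missing_df, n_missing_col))
```

With the toy data, the first count equals the length of `not_in_qsr`, while the second counts only the ID that is absent from `labnr`.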
# QSR: https://zenodo.org/record/4322979
qsr <- readr::read_csv("~/downloads/Palmisano_etal_data_and_code/csv/dates.csv")
data(emedyd, package = "rcarbon")
nerd <- readr::read_csv("https://raw.githubusercontent.com/apalmisano82/NERD/main/nerd.csv")
all(qsr$LabID == nerd$LabID, na.rm = TRUE)
#> [1] TRUE
There is still the small number of dates (62) in emedyd that aren't in NERD, but I suspect most of these will be covered by other databases (unfortunately I can't verify this right now because of #134), so maybe it's time to deprecate get_emedyd()?
emedyd[!emedyd$LabID %in% nerd$LabID,]
#> LabID CRA Error Material Species
#> 87953 SMU-2373 14500 190 charcoal <NA>
#> 74426 OxA-2142 15160 190 charcoal <NA>
#> 80387 OxA-869 13260 200 charcoal <NA>
#> 68840 ODTU-2 9510 100 charcoal <NA>
#> 18547 DRI-3255 8755 111 <NA> <NA>
#> 49015 KIA-38007 9065 35 bone <NA>
#> 74429 OxA-2143 16230 200 charcoal <NA>
#> 60338 Ly-2809 9835 55 grain Cerealia
#> 80158 OxA-8407 15860 100 charcoal <NA>
#> 80772 OxA-9264 15920 100 charcoal <NA>
#> 80773 OxA-9265 16740 100 charcoal <NA>
#> 80774 OxA-9266 16750 90 charcoal <NA>
#> 74687 OxA-22273 15890 90 charcoal Chenopodiaceae
#> 74688 OxA-22274 15770 80 charcoal dicot
#> 74689 OxA-22275 16145 75 charcoal dicot
#> 74693 OxA-22287 15980 60 charcoal Chenopodiaceae
#> 74694 OxA-22288 16275 60 charcoal Chenopodiaceae
#> 74695 OxA-22289 16300 65 charcoal dicot
#> 74696 OxA-22290 16200 65 charcoal Chenopodiaceae
#> 85140 Q-3072 9840 120 bone <NA>
#> 85141 Q-3073 10620 125 bone <NA>
#> 85142 Q-3074 12200 140 bone <NA>
#> 74061 OxA-20552 15750 75 charcoal <NA>
#> 57072 Ly-11622 16560 70 charcoal <NA>
#> 86741 RT-15076 8080 90 <NA> <NA>
#> 86688 RT-1246 15550 130 charcoal <NA>
#> 78009 OxA-5177 15460 160 charcoal <NA>
#> 78010 OxA-5178 16420 180 charcoal <NA>
#> 78011 OxA-5179 16440 160 charcoal <NA>
#> 84441 Pta-2158 14130 160 charcoal <NA>
#> 84442 Pta-2159 13390 120 charcoal <NA>
#> 43517 I-7031 15460 200 <NA> <NA>
#> 84489 Pta-3403 16100 150 eggshell Struthio camelus
#> 84507 Pta-3702 15800 160 eggshell Struthio camelus
#> 86673 RT-1072N 16200 170 <NA> <NA>
#> 97082 TO-987 11170 100 bone Gazella
#> 97083 TO-989 13110 130 bone <NA>
#> 97084 TO-991 14850 160 bone <NA>
#> 60333 Ly-2805 9705 60 seeds <NA>
#> 60334 Ly-2806 9690 60 seeds <NA>
#> 60335 Ly-2807 9705 55 seeds <NA>
#> 60337 Ly-2808 9685 55 seeds <NA>
#> 60411 Ly-2860 9185 55 organic matter <NA>
#> 67764 NUT-22023 7670 45 charcoal <NA>
#> 67765 NUT-22024 7730 80 charcoal <NA>
#> 67766 NUT-22106 8660 100 charcoal <NA>
#> 67767 NUT-22109 8390 50 charcoal <NA>
#> 60273 Ly-2756 9235 45 charcoal <NA>
#> 61182 Ly-3465 9220 45 seeds <NA>
#> 61183 Ly-3466 9020 45 charcoal <NA>
#> 61184 Ly-3467 9170 40 charcoal <NA>
#> 61181 Ly-3464 9445 45 seed <NA>
#> 10190 Beta-57898 9010 100 sediment <NA>
#> 76323 OxA-2835 15190 130 charcoal <NA>
#> 76326 OxA-2838 15050 160 charcoal <NA>
#> 76329 OxA-2841 15730 130 charcoal <NA>
#> 76353 OxA-2870 15450 130 charcoal <NA>
#> 108821 Wk-7005 14052 94 charcoal <NA>
#> 78063 OxA-524 15520 200 charcoal <NA>
#> 78073 OxA-525 16010 200 charcoal <NA>
#> 61644 Ly-3911 11970 60 charcoal <NA>
#> 61645 Ly-3912 11860 60 charcoal <NA>
#> SiteName Country Longitude Latitude Region
#> 87953 Arabi I, Wadi Feiran EG 33.4990 28.7800 1
#> 74426 Azariq 13, W Negev IL 34.4167 30.9500 1
#> 80387 Azraq 17 JO 35.0105 29.5269 1
#> 68840 Cayonu TR 39.7264 38.2164 2
#> 18547 Ghuwayr 1 JO 35.5061 30.6231 1
#> 49015 Gobekli Tepe TR 38.9225 37.2231 2
#> 74429 Hamifgash IV IL 34.5833 31.1833 1
#> 60338 Jerf el Ahmar SY 38.2083 36.3917 2
#> 80158 Karain Magarasi TR 30.5708 37.0776 3
#> 80772 Karain Magarasi TR 30.5708 37.0778 3
#> 80773 Karain Magarasi TR 30.5708 37.0778 3
#> 80774 Karain Magarasi TR 30.5708 37.0778 3
#> 74687 Kharaneh IV JO 36.4542 31.7237 1
#> 74688 Kharaneh IV JO 36.4542 31.7237 1
#> 74689 Kharaneh IV JO 36.4542 31.7237 1
#> 74693 Kharaneh IV JO 36.4542 31.7237 1
#> 74694 Kharaneh IV JO 36.4542 31.7237 1
#> 74695 Kharaneh IV JO 36.4542 31.7237 1
#> 74696 Kharaneh IV JO 36.4542 31.7237 1
#> 85140 Kharaneh IV JO 36.4500 31.7300 1
#> 85141 Kharaneh IV JO 36.4500 31.7300 1
#> 85142 Kharaneh IV JO 36.4500 31.7300 1
#> 74061 Moghr El Ahwal Cave 3 LB 35.8824 34.2846 1
#> 57072 Mureybet SY 38.0906 36.0683 2
#> 86741 Nahal Issaron IL 35.0300 29.9000 1
#> 86688 Ohalo II IL 35.5700 32.7138 1
#> 78009 Okuzini Magarasi TR 30.5760 37.0890 3
#> 78010 Okuzini Magarasi TR 30.5760 37.0890 3
#> 78011 Okuzini Magarasi TR 30.5760 37.0890 3
#> 84441 Qadesh Barnea 8 EG 34.4220 30.6480 1
#> 84442 Qadesh Barnea 8 EG 34.4220 30.6480 1
#> 43517 Rakefet Cave IL 35.0725 32.6547 1
#> 84489 Shunera 16 IL 34.6000 30.9500 1
#> 84507 Shunera 16 IL 34.6000 30.9500 1
#> 86673 Shunera 16 IL 34.6000 30.9500 1
#> 97082 Tabaqat al-Buma JO 35.7100 32.5300 1
#> 97083 Tabaqat al-Buma JO 35.7100 32.5300 1
#> 97084 Tabaqat al-Buma JO 35.7100 32.5300 1
#> 60333 Tell 'Abr 3 SY 38.0864 36.6819 2
#> 60334 Tell 'Abr 3 SY 38.0864 36.6819 2
#> 60335 Tell 'Abr 3 SY 38.0864 36.6819 2
#> 60337 Tell 'Abr 3 SY 38.0864 36.6819 2
#> 60411 Tell Ain el-Kerkh SY 36.4657 35.8196 2
#> 67764 Tell Ain el-Kerkh SY 36.4657 35.8196 2
#> 67765 Tell Ain el-Kerkh SY 36.4657 35.8196 2
#> 67766 Tell Ain el-Kerkh SY 36.4657 35.8196 2
#> 67767 Tell Ain el-Kerkh SY 36.4657 35.8196 2
#> 60273 Tell Aswad SY 36.5500 33.4042 1
#> 61182 Tell Aswad SY 36.5500 33.4042 1
#> 61183 Tell Aswad SY 36.5500 33.4042 1
#> 61184 Tell Aswad SY 36.5500 33.4042 1
#> 61181 Tell Dja'de el-Mughara SY 38.1833 36.3833 2
#> 10190 Tor al-Tareeq (WHS 1065) JO 35.9200 30.8700 1
#> 76323 Urkanar-Rub IIa PS 35.4300 32.0600 1
#> 76326 Urkanar-Rub IIa PS 35.4300 32.0600 1
#> 76329 Urkanar-Rub IIa PS 35.4300 32.0600 1
#> 76353 Wadi Fazael 10/11 PS 35.4330 32.0330 1
#> 108821 Wadi Hisban 2 JO 35.7000 31.8200 1
#> 78063 Wadi Jilat 6 JO 36.4640 31.5220 1
#> 78073 Wadi Jilat 6 JO 36.4640 31.5220 1
#> 61644 Zaquma JO 35.6816 32.1867 1
#> 61645 Zaquma JO 35.6816 32.1867 1
Created on 2021-04-05 by the reprex package (v1.0.0)
Following on from the above, all but 13 of the 62 lab IDs from emedyd that are not in NERD are already covered by other databases:
diff <- emedyd[!emedyd$LabID %in% nerd$LabID,]
everything <- c14bazAAR::get_all_dates()
everything <- everything[everything$sourcedb != "emedyd",]
# Lab IDs from emedyd that aren't in NERD or any other database
diff[!diff$LabID %in% everything$labnr,]
#> LabID CRA Error Material Species SiteName
#> 68840 ODTU-2 9510 100 charcoal <NA> Cayonu
#> 74687 OxA-22273 15890 90 charcoal Chenopodiaceae Kharaneh IV
#> 74688 OxA-22274 15770 80 charcoal dicot Kharaneh IV
#> 74689 OxA-22275 16145 75 charcoal dicot Kharaneh IV
#> 74693 OxA-22287 15980 60 charcoal Chenopodiaceae Kharaneh IV
#> 74694 OxA-22288 16275 60 charcoal Chenopodiaceae Kharaneh IV
#> 74695 OxA-22289 16300 65 charcoal dicot Kharaneh IV
#> 74696 OxA-22290 16200 65 charcoal Chenopodiaceae Kharaneh IV
#> 74061 OxA-20552 15750 75 charcoal <NA> Moghr El Ahwal Cave 3
#> 57072 Ly-11622 16560 70 charcoal <NA> Mureybet
#> 61181 Ly-3464 9445 45 seed <NA> Tell Dja'de el-Mughara
#> 61644 Ly-3911 11970 60 charcoal <NA> Zaquma
#> 61645 Ly-3912 11860 60 charcoal <NA> Zaquma
And of these:
The OxA-* dates from KHIV are in IntChron, so should be retrieved when I finally get around to doing a PR for #115.
The Ly-3* dates are actually in NERD, just recoded as Lyon-3*.
ODTU-2 is in NERD as ODTÜ-2.
That just leaves Ly-11622 as the only truly missing one. Presumably it was omitted from NERD because it is outside their date range, and indeed it is an obvious outlier for PPNA Mureybet.
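The recodings above suggest normalising lab IDs before comparing databases. A minimal sketch, assuming only the two differences mentioned here (the Lyon- prefix and the umlaut in ODTÜ); `normalise_labid` is a hypothetical helper, not part of c14bazAAR, and the vectors are toy examples built from lab IDs quoted in this thread:

```r
# Hypothetical helper: undo the two recodings noted above.
# Nothing else (spacing, case, other lab codes) is handled.
normalise_labid <- function(x) {
  x <- sub("^Lyon-", "Ly-", x)               # Lyon-3464 -> Ly-3464
  x <- gsub("\u00dc", "U", x, fixed = TRUE)  # U+00DC is the U with umlaut
  x
}

# Toy vectors, not the real databases
emedyd_ids <- c("Ly-3464", "Ly-3911", "ODTU-2", "Ly-11622")
nerd_ids <- c("Lyon-3464", "Lyon-3911", "ODT\u00dc-2")

# Lab IDs still unmatched after normalising both sides
still_missing <- setdiff(normalise_labid(emedyd_ids), normalise_labid(nerd_ids))
print(still_missing)
```

Applying the same function to both sides keeps the comparison symmetric, so it does not matter which database uses which spelling.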
Thank you very much for doing the research, @joeroe!!
I generally think removing get_emedyd is a good idea - less to maintain. What do you think, @dirkseidensticker? We may have to consider that, with our decentralized approach, we have so far focused on the unit dataset and not so much on individual dates. But I think in this case NERD is designed specifically as a superset of previous datasets. So it might be fair to deprecate the old parser.
I would replace get_emedyd with a message to switch to get_nerd.
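As a rough sketch of what such a replacement could look like (the actual change in the package may differ; using `stop()` rather than a warning or plain message is an assumption here, and the wording of the message is made up):

```r
# Hypothetical stub that replaces the old parser with a pointer to the new one.
get_emedyd <- function(...) {
  stop(
    "get_emedyd() has been removed because emedyd is superseded by NERD. ",
    "Please use get_nerd() instead.",
    call. = FALSE
  )
}

# Usage: calling the old parser fails with a hint about the replacement.
msg <- tryCatch(get_emedyd(), error = function(e) conditionMessage(e))
print(msg)
```

Keeping the function name around as a stub means old scripts fail with a clear hint instead of a "could not find function" error.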
Hi all,
I think that it makes sense to remove emedyd. NERD is much cleaner than emedyd, which has some dates that are not georeferenced properly and lacks standardized information.
best,
Alessio
Alright! #136 implements the change.
Ok - I consider this sufficiently solved now. Thanks to all of you!
Ok good!
Thanks
Alessio
This paper includes 920 dates from Northern Mesopotamia and the Levant, 6,000–3,000 BP. At a rough estimate just over half would be new additions to c14bazAAR:
The data is available as a supplementary .xlsx file or a CSV in the Zenodo archive. Worth including? #2