ropensci / c14bazAAR

R Package - Download and Prepare C14 Dates from Different Source Databases
https://docs.ropensci.org/c14bazAAR
GNU General Public License v2.0

Parsers for Palmisano's datasets #120

Closed joeroe closed 3 years ago

joeroe commented 3 years ago

This paper includes 920 dates from Northern Mesopotamia and the Levant, 6,000–3,000 BP. At a rough estimate just over half would be new additions to c14bazAAR:

# https://doi.org/10.1371/journal.pone.0244871.s001
palmisano <- readxl::read_xlsx("journal.pone.0244871.s001.xlsx", sheet = "C14 Dataset")
c14baz <- c14bazAAR::get_c14data("all")

sum(!palmisano$LabID %in% c14baz$labnr)
#> [1] 556

The data is available as a supplementary .xlsx file or a CSV in the Zenodo archive. Worth including? #2
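
A minimal sketch of the same "how many are new?" check on toy data, for anyone who wants to try the comparison without downloading the files. The lab numbers below are made up, and `normalise_labnr()` is a hypothetical helper, not part of c14bazAAR; it only illustrates that formatting differences between sources can inflate the count of apparently new dates.

```r
# Toy stand-ins for the paper's table and the existing database
# (all values invented for illustration)
new_paper <- data.frame(LabID = c("OxA-1234", "Beta 5678", "ly-42"),
                        stringsAsFactors = FALSE)
existing  <- data.frame(labnr = c("OxA-1234", "Beta-5678"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: trim, upper-case, and collapse runs of spaces or
# hyphens into a single hyphen
normalise_labnr <- function(x) {
  x <- toupper(trimws(x))
  gsub("[ -]+", "-", x)
}

# Exact string matching, as in the snippet above
exact_new <- sum(!new_paper$LabID %in% existing$labnr)

# Matching after normalisation
new_ids <- setdiff(normalise_labnr(new_paper$LabID),
                   normalise_labnr(existing$labnr))
```

On this toy input the exact comparison reports more new lab numbers than the normalised one, so the 556 figure is best read as an upper bound under that assumption.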

nevrome commented 3 years ago

I think so, yes. Good spotting! @apalmisano82 is one of our most reliable data contributors, actually. He recently made us aware of yet another valuable paper with data here: https://zenodo.org/record/4322979

We already have one of his datasets in c14bazAAR, and I wonder what the best way to structure these datasets in the future would be. We would probably have to add a parser function for each paper? What should we call them? We'll probably accumulate even more duplication like this.
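
To make the "one parser per paper" idea concrete, here is a rough base-R sketch. It is not how c14bazAAR's real parsers are written: `parse_palmisano_like()` is a hypothetical name, the input columns are assumed to match the emedyd-style layout printed later in this thread (LabID, CRA, Error, ...), and the output is a plain data frame using c14bazAAR-style variable names rather than a proper c14_date_list.

```r
# Hypothetical per-paper parser: rename and type-cast the source columns
parse_palmisano_like <- function(raw, sourcedb) {
  required <- c("LabID", "CRA", "Error", "Material", "Species",
                "SiteName", "Country", "Longitude", "Latitude")
  missing_cols <- setdiff(required, names(raw))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  data.frame(
    sourcedb = rep(sourcedb, nrow(raw)),
    labnr    = as.character(raw$LabID),
    c14age   = as.integer(raw$CRA),
    c14std   = as.integer(raw$Error),
    material = as.character(raw$Material),
    species  = as.character(raw$Species),
    site     = as.character(raw$SiteName),
    country  = as.character(raw$Country),
    lat      = as.numeric(raw$Latitude),
    lon      = as.numeric(raw$Longitude),
    stringsAsFactors = FALSE
  )
}

# Two rows copied from the emedyd output shown further down the thread
raw <- data.frame(
  LabID     = c("ODTU-2", "Ly-11622"),
  CRA       = c(9510, 16560),
  Error     = c(100, 70),
  Material  = c("charcoal", "charcoal"),
  Species   = c(NA_character_, NA_character_),
  SiteName  = c("Cayonu", "Mureybet"),
  Country   = c("TR", "SY"),
  Longitude = c(39.7264, 38.0906),
  Latitude  = c(38.2164, 36.0683),
  stringsAsFactors = FALSE
)
parsed <- parse_palmisano_like(raw, sourcedb = "example")
```

If several papers share this layout, one function with a `sourcedb` argument might cover them all, which would limit the duplication mentioned above.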

Or have you maybe considered collecting all your data across different papers in an open repository, @apalmisano82? Maybe something like what @dirkseidensticker maintains for his Africa projects. That could simplify your data management and would be perfect for us :+1:

joeroe commented 3 years ago

Two, I think: there's also emedyd, which come to think of it is almost fully superseded by the QSR paper:

# https://zenodo.org/record/4322979
qsr <- readr::read_csv("Palmisano_etal_data_and_code/csv/dates.csv")
emedyd <- c14bazAAR::get_c14data("emedyd")
c14baz <- dplyr::filter(c14bazAAR::get_c14data("all"), sourcedb != "emedyd")

sum(!emedyd$LabID %in% qsr$LabID)
# [1] 62

not_in_qsr <- emedyd$LabID[!emedyd$LabID %in% qsr$LabID]
sum(!not_in_qsr %in% c14baz)
# [1] 62
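
One thing to watch in checks like the last one: assuming `c14baz` is a data-frame-like table with a `labnr` column (as the first snippet in this thread suggests), `%in%` against the whole table does not look inside that column, so every lab number is likely to be reported as missing. A small sketch on invented data:

```r
# Toy database and lab numbers (values invented for illustration)
db <- data.frame(labnr  = c("A-1", "B-2", "C-3"),
                 c14age = c(100, 200, 300),
                 stringsAsFactors = FALSE)
ids <- c("A-1", "B-2", "Z-9")

# Matching against the whole data frame compares each ID with the
# columns as list elements, so nothing matches
n_missing_vs_table <- sum(!ids %in% db)

# Matching against the lab number column gives the intended count
n_missing_vs_column <- sum(!ids %in% db$labnr)
```

Here the first count flags all three IDs, while the second flags only the one that is genuinely absent.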

joeroe commented 3 years ago

#131 adds @apalmisano82's NERD (https://github.com/apalmisano82/NERD), which in the current version is identical to the dataset from this QSR paper.

# QSR: https://zenodo.org/record/4322979
qsr <- readr::read_csv("~/downloads/Palmisano_etal_data_and_code/csv/dates.csv")
data(emedyd, package = "rcarbon")
nerd <- readr::read_csv("https://raw.githubusercontent.com/apalmisano82/NERD/main/nerd.csv")

all(qsr$LabID == nerd$LabID, na.rm = TRUE)
#> [1] TRUE

There is still the small number of dates (62) in emedyd that aren't in NERD, but I suspect most of these will be covered by other databases (unfortunately I can't verify this right now because of #134), so maybe it's time to deprecate get_emedyd()?

emedyd[!emedyd$LabID %in% nerd$LabID,]
#>             LabID   CRA Error       Material          Species
#> 87953    SMU-2373 14500   190       charcoal             <NA>
#> 74426    OxA-2142 15160   190       charcoal             <NA>
#> 80387     OxA-869 13260   200       charcoal             <NA>
#> 68840      ODTU-2  9510   100       charcoal             <NA>
#> 18547    DRI-3255  8755   111           <NA>             <NA>
#> 49015   KIA-38007  9065    35           bone             <NA>
#> 74429    OxA-2143 16230   200       charcoal             <NA>
#> 60338     Ly-2809  9835    55          grain         Cerealia
#> 80158    OxA-8407 15860   100       charcoal             <NA>
#> 80772    OxA-9264 15920   100       charcoal             <NA>
#> 80773    OxA-9265 16740   100       charcoal             <NA>
#> 80774    OxA-9266 16750    90       charcoal             <NA>
#> 74687   OxA-22273 15890    90       charcoal   Chenopodiaceae
#> 74688   OxA-22274 15770    80       charcoal            dicot
#> 74689   OxA-22275 16145    75       charcoal            dicot
#> 74693   OxA-22287 15980    60       charcoal   Chenopodiaceae
#> 74694   OxA-22288 16275    60       charcoal   Chenopodiaceae
#> 74695   OxA-22289 16300    65       charcoal            dicot
#> 74696   OxA-22290 16200    65       charcoal   Chenopodiaceae
#> 85140      Q-3072  9840   120           bone             <NA>
#> 85141      Q-3073 10620   125           bone             <NA>
#> 85142      Q-3074 12200   140           bone             <NA>
#> 74061   OxA-20552 15750    75       charcoal             <NA>
#> 57072    Ly-11622 16560    70       charcoal             <NA>
#> 86741    RT-15076  8080    90           <NA>             <NA>
#> 86688     RT-1246 15550   130       charcoal             <NA>
#> 78009    OxA-5177 15460   160       charcoal             <NA>
#> 78010    OxA-5178 16420   180       charcoal             <NA>
#> 78011    OxA-5179 16440   160       charcoal             <NA>
#> 84441    Pta-2158 14130   160       charcoal             <NA>
#> 84442    Pta-2159 13390   120       charcoal             <NA>
#> 43517      I-7031 15460   200           <NA>             <NA>
#> 84489    Pta-3403 16100   150       eggshell Struthio camelus
#> 84507    Pta-3702 15800   160       eggshell Struthio camelus
#> 86673    RT-1072N 16200   170           <NA>             <NA>
#> 97082      TO-987 11170   100           bone          Gazella
#> 97083      TO-989 13110   130           bone             <NA>
#> 97084      TO-991 14850   160           bone             <NA>
#> 60333     Ly-2805  9705    60          seeds             <NA>
#> 60334     Ly-2806  9690    60          seeds             <NA>
#> 60335     Ly-2807  9705    55          seeds             <NA>
#> 60337     Ly-2808  9685    55          seeds             <NA>
#> 60411     Ly-2860  9185    55 organic matter             <NA>
#> 67764   NUT-22023  7670    45       charcoal             <NA>
#> 67765   NUT-22024  7730    80       charcoal             <NA>
#> 67766   NUT-22106  8660   100       charcoal             <NA>
#> 67767   NUT-22109  8390    50       charcoal             <NA>
#> 60273     Ly-2756  9235    45       charcoal             <NA>
#> 61182     Ly-3465  9220    45          seeds             <NA>
#> 61183     Ly-3466  9020    45       charcoal             <NA>
#> 61184     Ly-3467  9170    40       charcoal             <NA>
#> 61181     Ly-3464  9445    45           seed             <NA>
#> 10190  Beta-57898  9010   100       sediment             <NA>
#> 76323    OxA-2835 15190   130       charcoal             <NA>
#> 76326    OxA-2838 15050   160       charcoal             <NA>
#> 76329    OxA-2841 15730   130       charcoal             <NA>
#> 76353    OxA-2870 15450   130       charcoal             <NA>
#> 108821    Wk-7005 14052    94       charcoal             <NA>
#> 78063     OxA-524 15520   200       charcoal             <NA>
#> 78073     OxA-525 16010   200       charcoal             <NA>
#> 61644     Ly-3911 11970    60       charcoal             <NA>
#> 61645     Ly-3912 11860    60       charcoal             <NA>
#>                        SiteName Country Longitude Latitude Region
#> 87953      Arabi I, Wadi Feiran      EG   33.4990  28.7800      1
#> 74426        Azariq 13, W Negev      IL   34.4167  30.9500      1
#> 80387                  Azraq 17      JO   35.0105  29.5269      1
#> 68840                    Cayonu      TR   39.7264  38.2164      2
#> 18547                 Ghuwayr 1      JO   35.5061  30.6231      1
#> 49015              Gobekli Tepe      TR   38.9225  37.2231      2
#> 74429              Hamifgash IV      IL   34.5833  31.1833      1
#> 60338             Jerf el Ahmar      SY   38.2083  36.3917      2
#> 80158           Karain Magarasi      TR   30.5708  37.0776      3
#> 80772           Karain Magarasi      TR   30.5708  37.0778      3
#> 80773           Karain Magarasi      TR   30.5708  37.0778      3
#> 80774           Karain Magarasi      TR   30.5708  37.0778      3
#> 74687               Kharaneh IV      JO   36.4542  31.7237      1
#> 74688               Kharaneh IV      JO   36.4542  31.7237      1
#> 74689               Kharaneh IV      JO   36.4542  31.7237      1
#> 74693               Kharaneh IV      JO   36.4542  31.7237      1
#> 74694               Kharaneh IV      JO   36.4542  31.7237      1
#> 74695               Kharaneh IV      JO   36.4542  31.7237      1
#> 74696               Kharaneh IV      JO   36.4542  31.7237      1
#> 85140               Kharaneh IV      JO   36.4500  31.7300      1
#> 85141               Kharaneh IV      JO   36.4500  31.7300      1
#> 85142               Kharaneh IV      JO   36.4500  31.7300      1
#> 74061     Moghr El Ahwal Cave 3      LB   35.8824  34.2846      1
#> 57072                  Mureybet      SY   38.0906  36.0683      2
#> 86741             Nahal Issaron      IL   35.0300  29.9000      1
#> 86688                  Ohalo II      IL   35.5700  32.7138      1
#> 78009          Okuzini Magarasi      TR   30.5760  37.0890      3
#> 78010          Okuzini Magarasi      TR   30.5760  37.0890      3
#> 78011          Okuzini Magarasi      TR   30.5760  37.0890      3
#> 84441           Qadesh Barnea 8      EG   34.4220  30.6480      1
#> 84442           Qadesh Barnea 8      EG   34.4220  30.6480      1
#> 43517              Rakefet Cave      IL   35.0725  32.6547      1
#> 84489                Shunera 16      IL   34.6000  30.9500      1
#> 84507                Shunera 16      IL   34.6000  30.9500      1
#> 86673                Shunera 16      IL   34.6000  30.9500      1
#> 97082           Tabaqat al-Buma      JO   35.7100  32.5300      1
#> 97083           Tabaqat al-Buma      JO   35.7100  32.5300      1
#> 97084           Tabaqat al-Buma      JO   35.7100  32.5300      1
#> 60333               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60334               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60335               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60337               Tell 'Abr 3      SY   38.0864  36.6819      2
#> 60411         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67764         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67765         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67766         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 67767         Tell Ain el-Kerkh      SY   36.4657  35.8196      2
#> 60273                Tell Aswad      SY   36.5500  33.4042      1
#> 61182                Tell Aswad      SY   36.5500  33.4042      1
#> 61183                Tell Aswad      SY   36.5500  33.4042      1
#> 61184                Tell Aswad      SY   36.5500  33.4042      1
#> 61181    Tell Dja'de el-Mughara      SY   38.1833  36.3833      2
#> 10190  Tor al-Tareeq (WHS 1065)      JO   35.9200  30.8700      1
#> 76323           Urkanar-Rub IIa      PS   35.4300  32.0600      1
#> 76326           Urkanar-Rub IIa      PS   35.4300  32.0600      1
#> 76329           Urkanar-Rub IIa      PS   35.4300  32.0600      1
#> 76353         Wadi Fazael 10/11      PS   35.4330  32.0330      1
#> 108821            Wadi Hisban 2      JO   35.7000  31.8200      1
#> 78063              Wadi Jilat 6      JO   36.4640  31.5220      1
#> 78073              Wadi Jilat 6      JO   36.4640  31.5220      1
#> 61644                    Zaquma      JO   35.6816  32.1867      1
#> 61645                    Zaquma      JO   35.6816  32.1867      1

Created on 2021-04-05 by the reprex package (v1.0.0)

joeroe commented 3 years ago

Following on from the above, all but 13 of the 62 lab IDs from emedyd that are not in NERD are already covered by other databases:

diff <- emedyd[!emedyd$LabID %in% nerd$LabID,]
everything <- c14bazAAR::get_all_dates()
everything <- everything[everything$sourcedb != "emedyd",]

# Lab IDs from emedyd that aren't in NERD or any other database
diff[!diff$LabID %in% everything$labnr,]
#>           LabID   CRA Error Material        Species               SiteName
#> 68840    ODTU-2  9510   100 charcoal           <NA>                 Cayonu
#> 74687 OxA-22273 15890    90 charcoal Chenopodiaceae            Kharaneh IV
#> 74688 OxA-22274 15770    80 charcoal          dicot            Kharaneh IV
#> 74689 OxA-22275 16145    75 charcoal          dicot            Kharaneh IV
#> 74693 OxA-22287 15980    60 charcoal Chenopodiaceae            Kharaneh IV
#> 74694 OxA-22288 16275    60 charcoal Chenopodiaceae            Kharaneh IV
#> 74695 OxA-22289 16300    65 charcoal          dicot            Kharaneh IV
#> 74696 OxA-22290 16200    65 charcoal Chenopodiaceae            Kharaneh IV
#> 74061 OxA-20552 15750    75 charcoal           <NA>  Moghr El Ahwal Cave 3
#> 57072  Ly-11622 16560    70 charcoal           <NA>               Mureybet
#> 61181   Ly-3464  9445    45     seed           <NA> Tell Dja'de el-Mughara
#> 61644   Ly-3911 11970    60 charcoal           <NA>                 Zaquma
#> 61645   Ly-3912 11860    60 charcoal           <NA>                 Zaquma

And of these:

That just leaves Ly-11622 as the only truly missing one. Presumably it was omitted from NERD because it is outside their date range, and indeed it is an obvious outlier for PPNA Mureybet.

nevrome commented 3 years ago

Thank you very much for doing the research, @joeroe!!

I generally think removing get_emedyd is a good idea - less to maintain. What do you think, @dirkseidensticker? We may have to consider that, with our decentralized approach, we have so far focused on the dataset as the unit and not so much on individual dates. But I think in this case NERD is designed specifically as a superset of previous datasets. So it might be fair to deprecate the old parser.

I would replace get_emedyd with a message to switch to get_nerd.
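
A minimal sketch of what such a replacement stub could look like, assuming the old function simply stops with a pointer to the new one. This is an illustration only and not necessarily how the package implements the change; the wording of the message is invented.

```r
# Hypothetical stub: the old parser no longer downloads anything and
# instead tells the user which function to call
get_emedyd <- function(...) {
  stop(
    "get_emedyd() is no longer available because emedyd is superseded ",
    "by NERD. Please use get_nerd() instead.",
    call. = FALSE
  )
}

# Capture the message to show what a user would see
msg <- tryCatch(get_emedyd(), error = function(e) conditionMessage(e))
```

Base R's `.Defunct()` is another option for the same purpose; a plain `stop()` is used here to keep the message fully explicit.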

apalmisano82 commented 3 years ago

Hi all,

I think that it makes sense to remove emedyd. NERD is much cleaner than emedyd, which has some dates that are not georeferenced properly and lack standardized information.

best,

Alessio


nevrome commented 3 years ago

Alright! #136 implements the change.

nevrome commented 3 years ago

Ok - I consider this sufficiently solved now. Thanks to all of you!

apalmisano82 commented 3 years ago

Ok good!

Thanks

Alessio

