poseidon-framework / community-archive

The Poseidon Community Archive (PCA)
https://www.poseidon-adna.org/#/archive_overview

Fill uncalibrated Dates for 2021_PattersonNature #101

Open stschiff opened 2 years ago

stschiff commented 2 years ago

As Clemens found in #97, package 2021_PattersonNature has 826 entries with calibrated dates, but no uncalibrated dates. We should get this filled in. @93Boy could you check whether there is a supplementary table in the paper that lists the raw uncalibrated dates?

93Boy commented 2 years ago

I will look into this

93Boy commented 2 years ago

I have checked the supplementary documents of this publication. They only contain calibrated radiocarbon dates. @nevrome Is it possible to use your age_string_parser script to reverse these calibrated dates?

nevrome commented 2 years ago

Very interesting question. I think radiocarbon calibration is not a bijective operation, so you cannot go back from a calibrated date to exactly one uncalibrated date. It should theoretically be possible, though, to determine a possible date. I'm curious whether this functionality is already implemented somewhere. It's further complicated by the fact that we don't have the full post-calibration probability distribution, but only a result range. Maybe I can summon @MartinHinz to get some insights into this interesting question?

This is purely academic, though. As much as I like the idea, it defeats the purpose of the respective janno columns. Maybe this is a case where we could contact the authors. They must have the uncalibrated dates, after all, even if they decided (for whatever reason?!) to exclude them from the paper.

MartinHinz commented 2 years ago

That is right, because we have the uncertainties in the calibration curve only on the calendar axis, not on the uncalibrated axis. Check the following out:

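# IntCal13 curve; the first three columns are used: cal BP, C14 age, C14 error
cal_curve <- read.csv(
  url("https://raw.githubusercontent.com/andrewcparnell/tsunamis/master/intcal13.14c"),
  skip = 11,
  header = FALSE)
# Rows: calendar ages; columns: the same age grid, reused for C14 ages
cal_matrix <- sapply(cal_curve[, 1], function(x) {
  dnorm(x, mean = cal_curve[, 2], sd = cal_curve[, 3])
})
# Uncalibrated date 4000 +/- 25 BP, normalised to sum to 1
my_prob_date <- dnorm(cal_curve[, 1], mean = 4000, sd = 25)
my_prob_date <- my_prob_date / sum(my_prob_date)
# Calibrate
my_cal_date <- as.vector(cal_matrix %*% my_prob_date)

# Back-project with the transposed matrix and normalise again
my_backcal_date <- as.vector(t(cal_matrix) %*% my_cal_date)
my_backcal_date <- my_backcal_date / sum(my_backcal_date)

# The x axis is the row index into cal_curve, not an age
plot(my_prob_date, type = "l", xlim = c(4300, 4400))
lines(my_backcal_date, col = "red")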

Resulting in this:

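[image: plot of the original uncalibrated distribution (black) and the back-calibrated distribution (red)]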

The differences between the original (black) uncalibrated distribution and the back-calibrated (red) distribution result from this, I guess.
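One possible additional reading of the deviation, offered as a sketch rather than as what the snippet above was meant to show: multiplying by the transposed matrix is not the same as applying an inverse, so the back-projection smooths the distribution a second time instead of undoing the first step. The 3 x 3 matrix below is made up for illustration and has nothing to do with IntCal; `m`, `p`, `back` and `exact` are hypothetical names.

```r
# Made-up 3 x 3 matrix; matrix() fills column by column, so each
# group of three numbers below is one column and sums to 1.
m <- matrix(c(0.8, 0.2, 0.0,
              0.1, 0.8, 0.1,
              0.0, 0.2, 0.8), nrow = 3)

# A sharp starting distribution: all mass in the middle bin.
p <- c(0, 1, 0)

# Forward step, analogous to cal_matrix %*% my_prob_date.
cal <- as.vector(m %*% p)

# Back-projection with the transpose, then normalisation,
# analogous to t(cal_matrix) %*% my_cal_date.
back <- as.vector(t(m) %*% cal)
back <- back / sum(back)

# A true inverse recovers p, but only because this toy matrix is
# small, square and well conditioned.
exact <- solve(m, cal)

print(round(back, 3))
print(round(exact, 3))
```

In this toy case `back` is wider than `p`, while `exact` matches it. For a real calibration matrix an exact inverse may not exist or may be numerically unstable, so this only illustrates the transpose-versus-inverse point.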

nevrome commented 2 years ago

Thank you so much, Martin. Very helpful!

93Boy commented 2 years ago

Thank you so much for the explanation. Since I don't have an academic background in this, I am curious about what caused this deviation. Is it because post-calibration data is given with a tolerance range?

nevrome commented 2 years ago

My understanding, summarized: For each calendar age the calibration curve has exactly one value (a normal distribution) on the C14 age axis (just the very definition of a mathematical function). A value on the C14 age axis, on the other hand, potentially fits multiple values on the calendar age axis. We cannot know exactly when the sampled organism stopped ingesting C14 from the atmosphere; we only know (approximately) how much C14 the sample still has and (approximately) what the C14 content in the atmosphere was X years ago. If we have a plateau in the calibration curve, then the organism could have died at any point during the plateau, given the measured amount of C14.
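The asymmetry described here can be sketched with a toy curve. The numbers are made up (not IntCal), measurement errors are ignored, and `cal_age`, `c14_age`, `c14_for` and `cal_for` are hypothetical names used only for this illustration.

```r
# Toy calibration curve: one C14 age per calendar age, with a plateau at 3900.
cal_age <- seq(4000, 4600, by = 100)
c14_age <- c(3700, 3800, 3900, 3900, 3900, 4000, 4100)

# Calendar age -> C14 age: a function, so exactly one value comes back.
c14_for <- function(cal) c14_age[match(cal, cal_age)]

# C14 age -> calendar age: may return several calendar ages.
cal_for <- function(c14) cal_age[c14_age == c14]

forward <- c14_for(4300)      # a single C14 age
on_plateau <- cal_for(3900)   # every calendar age on the plateau
off_plateau <- cal_for(3700)  # a single calendar age

print(forward)
print(on_plateau)
print(off_plateau)
```

With these toy numbers, a C14 age on the plateau fits three calendar ages, while a C14 age off the plateau fits only one.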

There's a nice wiki article to get you started. Helped me, when I was figuring this out for currycarbon. But this discussion doesn't have anything to do with this issue, so I guess we should stop going deeper into it (here).

stschiff commented 2 years ago

Interesting discussion!

Back to the problem: @93Boy could you take a look at the latest AADR release to see whether they have included the uncalibrated dates there? If not, I will write to Nick and David directly.

93Boy commented 2 years ago

I could not find 2021_PattersonNature in AADR v50. I checked within both published and unpublished data.

stschiff commented 2 years ago

OK, I think we should postpone this. I can write to David at some point, but I think there are more important things now, and we might also simply wait for the next AADR release. We should just keep n/a in the respective columns.