humlab-sead / sead_browser_client

Online browser client for the SEAD database

Dendro dating update will affect browser visualisation #320

Open MattiasSealander opened 5 months ago

MattiasSealander commented 5 months ago

Import of the new dendro dataset has resulted in updates to the tbl_dendro_dates structures. This will affect the visualisation, due to new logic. Below follows a description of how the table has been overhauled:

New structure:
dendro_date_id - PK
season_id - FK (tbl_seasons)
dating_uncertainty_id - FK (tbl_dating_uncertainties)
dendro_lookup_id - FK (tbl_dendro_lookups)
age_type_id - FK (tbl_age_types)
analysis_entity_id - FK (tbl_analysis_entities)
age_older ("from date")
age_younger ("to date")
date_updated

Old structure:
dendro_date_id - PK
season_or_qualifier_id - Removed
dating_uncertainty_id - FK (tbl_dating_uncertainties)
dendro_lookup_id - FK (tbl_dendro_lookups)
age_type_id - FK (tbl_age_types)
analysis_entity_id - FK (tbl_analysis_entities)
age_older ("from date")
age_younger ("to date")
error_plus - Removed
error_minus - Removed
error_uncertainty_id - Removed
date_updated
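As a rough illustration of what the new row shape means for client code, here is a minimal Python sketch. The column names come from the lists above; the types, the nullability, and the helper names `DendroDate`, `from_row` and `stale_columns` are assumptions made for the example, not part of SEAD or the browser client.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional

# Columns present in the old structure but removed in the new one.
REMOVED_COLUMNS = {
    "season_or_qualifier_id",
    "error_plus",
    "error_minus",
    "error_uncertainty_id",
}


@dataclass
class DendroDate:
    """One row of the new tbl_dendro_dates (types are assumed, not confirmed)."""
    dendro_date_id: int
    season_id: Optional[int]
    dating_uncertainty_id: Optional[int]
    dendro_lookup_id: int
    age_type_id: int
    analysis_entity_id: int
    age_older: Optional[int]    # "from date"
    age_younger: Optional[int]  # "to date"
    date_updated: Optional[str]


def from_row(row: Mapping[str, Any]) -> DendroDate:
    """Build a DendroDate from a dict-like row, ignoring any extra columns."""
    names = [f.name for f in fields(DendroDate)]
    return DendroDate(**{name: row.get(name) for name in names})


def stale_columns(row: Mapping[str, Any]) -> set:
    """Return removed columns that a row (or cached payload) still carries."""
    return REMOVED_COLUMNS & set(row)
```

A check like `stale_columns` could help spot places where the visualisation still expects the removed error or qualifier fields.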

Seasons: The old structure registered seasons via tbl_seasons_or_qualifier. That table is gone; instead, dendro_dates is related to tbl_seasons in the taxa tree of tables. This raised the question of how to handle a date that has two different seasons. The "internal logic" used in the new structure is that combinations covering multiple seasons have been added as lookups (e.g. Summer-Winter). When an ID for multiple seasons is noted, the first season relates to the "From" date (age_older) and the second season to the "To" date (age_younger). For example, a date registered in the dendro Excel file as "S 1570 - W 1579" will be registered in the new SEAD structure as "SW 1570 - 1579". If possible, the SEAD browser should display this similarly to the original Excel form.

A second piece of logic is that the Winter season always covers two years. Winter runs from ca. September to April, meaning "W 1570" is actually "Winter 1570/1571". It is not possible to determine whether a tree was felled in one year or the other, hence the two possible years. Dendrochronologists know this, but other users don't. It would therefore also be good if any felling date with a winter season is "translated" to show the year in the database as well as the following year. "SW 1570 - 1579" would thus be "S 1570 - W 1579/1580".
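The two display rules above could be sketched roughly as follows. This is only an illustration in Python, not the browser client's code: the function names are hypothetical, and it assumes season codes are single letters ("S", "W") that concatenate for combined lookups ("SW"), and that a single season code applies to both ends of a range.

```python
from typing import Optional

WINTER = "W"  # assumed abbreviation for the winter season


def with_season(season: str, year: int) -> str:
    """Prefix a year with its season; winter also shows the following year."""
    if season == WINTER:
        return f"{season} {year}/{year + 1}"
    return f"{season} {year}" if season else str(year)


def format_dendro_date(season_code: str,
                       age_older: Optional[int],
                       age_younger: int) -> str:
    """Render a date in the style of the original Excel notation."""
    # Assumption: a combined code such as "SW" holds one letter per season,
    # the first for age_older ("from") and the second for age_younger ("to").
    # A single code such as "W" is applied to both ends.
    older_season, younger_season = season_code[:1], season_code[-1:]
    if age_older is None or age_older == age_younger:
        return with_season(younger_season, age_younger)
    return (f"{with_season(older_season, age_older)} - "
            f"{with_season(younger_season, age_younger)}")


print(format_dendro_date("SW", 1570, 1579))  # S 1570 - W 1579/1580
print(format_dendro_date("W", None, 1570))   # W 1570/1571
```

In practice the season label would come from the tbl_seasons lookup via season_id, so the splitting step would depend on how the combined seasons are actually named there.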

Error uncertainty: In the old structure, based on the Pilot project dataset, an "error" was sometimes noted for dates, e.g. "1570 +/- 5". There were therefore error fields and a foreign key stating what the error type was. The new data has already added this error to the dates, so these fields are no longer needed.

johanvonboer commented 5 months ago

Thank you very much for this, great write-up!

I have gotten to the point where the site works (seemingly) with the new data. When I was about to start figuring out what these changes mean for the dendro graph, I figured a good way to start would be to compare two sites, so I chose Viggesbo säteri. In the old database this is site 2000: https://browser.sead.se/site/2000

And in the new database it is this site: https://supersead.humlab.umu.se/site/3960

What puzzles me is that they look very different in multiple ways. They have different sample groups, different samples, (slightly) different geographical locations, and the new version no longer has a site description. Has something gone wrong with the new import here, or was it actually wrong in the old import?

MattiasSealander commented 5 months ago

I will be able to take a closer look at this (and other issues) on Wednesday. What I can say is that the pilot project data is not included in this new import. Viggesbo säteri has been sampled multiple times and includes a number of different buildings. Anton at the Dendro lab has been standardizing the sample group names to account for the fact that a building could be sampled multiple times over many years, making sure that we don't get duplicates of a building under different names. That can explain the different sample groups.

The site description was previously made up of information that I was uncertain where to store. As it became apparent that the large body of new data would make this awkward, I have stored that information in other fields at sample group or sample level.

johanvonboer commented 5 months ago

I think that explains it completely and I don't think you need to investigate this further then, thanks!

I think it raises the question of how we name sites, and perhaps what constitutes a site, but that's a separate issue.