humlab-sead / sead_change_control

Sane SEAD change control using Sqitch.
1 stars 0 forks source link

Duplicate abbreviation values in tbl_relative_ages #65

Open roger-mahler opened 4 years ago

roger-mahler commented 4 years ago

There is logic in the sead_bugs_import that seems to assume that the abbreviation field in tbl_relative_ages being unique. For instance, the function in RelativeAgeRepository fetches a single item based on a sole abbreviation parameter. There is no unique constraint on the field.

select abbreviation, count(*)
from tbl_relative_ages
group by abbreviation
having count(*) > 1
order by abbreviation
roger-mahler commented 1 year ago

This might be a Bugs import error similar to duplicates in taxa synonyms:

select *
from tbl_relative_ages
where abbreviation = 'Anglian'
roger-mahler commented 3 months ago

Some of the duplicates, but not all, orginates from Bugs. Several existing, older rows lack C14 age range, which might be one reason for the duplicates. Another problem is that the age range in Bugs is not added to the new records in the relative ages table.

select ages.relative_age_id, bugs.sead_reference_id, ages.abbreviation,
       ages.relative_age_name, c14_age_older, c14_age_younger,
       bugs.bugs_identifier, bugs.translated_compressed_data
from tbl_relative_ages as ages
left join bugs_import.bugs_trace as bugs
  on bugs.sead_table = 'tbl_relative_ages'
 and bugs.sead_reference_id = ages.relative_age_id
where ages.abbreviation in (
    select abbreviation
    from tbl_relative_ages
    group by abbreviation
    having count(*) > 1 
)
order by abbreviation;