humlab-sead / sead_bugs_import

SEAD bugs import

Analysis entities confusion #10

Closed: visead closed this issue 5 years ago

visead commented 5 years ago

The import code does not correctly match dataset-physical_sample combinations with countsheet-sample combinations when creating analysis entities. The cause seems to be related to a self-creating function. The import creates too many analysis entities, and it crashes if the database already contains data. Analysis entities may also be duplicated (e.g. analysis_entity_id 285 and 286 in Roger's test import of 20190517).
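A minimal sketch of the intended matching, assuming the rule is one analysis entity per (dataset_id, physical_sample_id) pair and that analysis entities already in the database are loaded first so they are reused rather than recreated. AnalysisEntityMatcher, DatasetSampleKey, AnalysisEntity and preload are hypothetical names for illustration, not part of the sead_bugs_import code. The cache key is an immutable record (Java 16+), so its hashCode cannot change while it is stored in the map.

```java
import java.util.HashMap;
import java.util.Map;

public class AnalysisEntityMatcher {

    // Hypothetical immutable key: one analysis entity per
    // dataset / physical sample combination.
    record DatasetSampleKey(long datasetId, long physicalSampleId) {}

    // Hypothetical stand-in for a row in tbl_analysis_entities.
    record AnalysisEntity(long analysisEntityId, long datasetId, long physicalSampleId) {}

    private final Map<DatasetSampleKey, AnalysisEntity> byKey = new HashMap<>();
    private long nextId;

    AnalysisEntityMatcher(long firstFreeId) {
        this.nextId = firstFreeId;
    }

    // Register an entity that already exists in the database so that it is
    // reused instead of being created a second time.
    void preload(AnalysisEntity existing) {
        byKey.put(new DatasetSampleKey(existing.datasetId(), existing.physicalSampleId()), existing);
        nextId = Math.max(nextId, existing.analysisEntityId() + 1);
    }

    // Return the entity for this combination, creating it only if it is missing.
    AnalysisEntity getOrCreate(long datasetId, long physicalSampleId) {
        return byKey.computeIfAbsent(
                new DatasetSampleKey(datasetId, physicalSampleId),
                k -> new AnalysisEntity(nextId++, k.datasetId(), k.physicalSampleId()));
    }

    int size() {
        return byKey.size();
    }

    public static void main(String[] args) {
        AnalysisEntityMatcher matcher = new AnalysisEntityMatcher(1);
        matcher.preload(new AnalysisEntity(100, 10, 20));
        System.out.println(matcher.getOrCreate(10, 20)); // reuses id 100
        System.out.println(matcher.getOrCreate(10, 21)); // creates id 101
        System.out.println(matcher.getOrCreate(10, 21)); // reuses id 101
        System.out.println("entities: " + matcher.size());
    }
}
```

Because `computeIfAbsent` only calls the factory on a cache miss, a second request for the same combination returns the cached entity instead of allocating a new id.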

roger-mahler commented 5 years ago

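A query that shows the problem (it counts distinct datasets, sample groups, physical samples, analysis entities and abundances for the Bugs database master set):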

select  count(distinct dataset_id) as datasets,
        count(distinct sample_group_id) as sample_groups,
        count(distinct physical_sample_id) as samples,
        count(distinct analysis_entity_id) as analysis,
        count(distinct abundance_id) as abundances
from tbl_dataset_masters m
join tbl_datasets d using (master_set_id)
join tbl_analysis_entities ae using (dataset_id)
join tbl_physical_samples s using (physical_sample_id)
join tbl_sample_groups sg using (sample_group_id)
join tbl_abundances a using (analysis_entity_id)
where m.master_name = 'Bugs database'


roger-mahler commented 5 years ago

The problem is caused by a defect in an internal cache that uses a Java HashMap. Only objects whose hashCode (and equals) cannot change, in practice immutable keys, should be used as keys in a HashMap. If an attribute included in an entity's hashCode changes after the entity has been inserted into the map, subsequent lookups for that entity fail.
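The failure mode can be reproduced in isolation. This is a hypothetical sketch, not the sead_bugs_import cache: the Sample class, the SampleKey record and the scenario of an id being assigned after the entity is cached are assumptions chosen to illustrate the mechanism. HashMap stores each entry under the hash computed at insertion time, so once a hashed field changes, a lookup with the very same object misses, and a get-or-create cache then adds a second entry. The immutable record key (Java 16+) avoids this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {

    // Hypothetical cached entity: equals/hashCode include a database id
    // that is null until the entity is saved (assumption for illustration).
    static final class Sample {
        Integer id;
        final String name;

        Sample(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Sample)) return false;
            Sample other = (Sample) o;
            return Objects.equals(id, other.id) && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    // Safer key: a record's components are final, so its hashCode is stable.
    record SampleKey(String name) {}

    // Caches a Sample, then changes a field used by hashCode.
    // Returns the map size after the same object is looked up and re-cached.
    static int sizeAfterMutatingKey() {
        Map<Sample, String> cache = new HashMap<>();
        Sample s = new Sample("S1");
        cache.put(s, "analysis entity");      // hashed while id == null
        s.id = 42;                            // hash changes while s is a key
        if (!cache.containsKey(s)) {          // the lookup now misses...
            cache.put(s, "analysis entity");  // ...so a duplicate entry is added
        }
        return cache.size();
    }

    // Same flow with an immutable key: the lookup hits, nothing is duplicated.
    static int sizeWithImmutableKey() {
        Map<SampleKey, String> cache = new HashMap<>();
        Sample s = new Sample("S1");
        cache.put(new SampleKey(s.name), "analysis entity");
        s.id = 42;                            // does not affect the key
        cache.putIfAbsent(new SampleKey(s.name), "analysis entity");
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("mutable key, entries:   " + sizeAfterMutatingKey());   // 2
        System.out.println("immutable key, entries: " + sizeWithImmutableKey());   // 1
    }
}
```

The mutable-key map ends up holding the same object twice, which mirrors the symptom of too many analysis entities being created.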