Open Maanin00 opened 1 year ago
Problem is caused by authors
and/or title
being null. Most of the references are MAL reports.
There are in total 389 references in SEAD associated to MAL data. Only sample group or site references are considered, as given by this query:
with sample_groups as (
select distinct sg.sample_group_id, sg.site_id
from tbl_datasets d
join tbl_dataset_masters dm using (master_set_id)
join tbl_analysis_entities ae using (dataset_id)
join tbl_physical_samples ps using (physical_sample_id)
join tbl_sample_groups sg using (sample_group_id)
where master_set_id = 2
),
mal_references as (
select distinct biblio_id
from sample_groups
join tbl_sites s using (site_id)
join tbl_site_references sr using (site_id)
union
select distinct biblio_id
from sample_groups
join tbl_sample_group_references sgr using (sample_group_id)
)
select *
from tbl_biblio
join mal_references using (biblio_id)
References in the following tables are ignored:
Ignored:
The MAL report register (dated 20240116) contains 1096 entries. Only 28 of these reports exist in SEAD when matching match on full title:
select *
from tbl_biblio
join temp_mal_report_register using (title)
The MAL report register has been imported to sead_staging@staging_cluster.
With some cleaning we get 126 matches:
with sample_groups as (
select distinct sg.sample_group_id, sg.site_id
from tbl_datasets d
join tbl_dataset_masters dm using (master_set_id)
join tbl_analysis_entities ae using (dataset_id)
join tbl_physical_samples ps using (physical_sample_id)
join tbl_sample_groups sg using (sample_group_id)
where master_set_id = 2
),
sead_mal_references as (
select distinct biblio_id
from sample_groups
join tbl_sites s using (site_id)
join tbl_site_references sr using (site_id)
union
select distinct biblio_id
from sample_groups
join tbl_sample_group_references sgr using (sample_group_id)
)
select b.biblio_id, b.year, b.title, r.title, regexp_replace(lower(b.title), '\W+', '', 'g') as compressed
from tbl_biblio b
join sead_mal_references smr using (biblio_id)
left join temp_mal_report_register r
on regexp_replace(lower(r.title), '\W', '', 'g') = regexp_replace(lower(b.title), '\W', '', 'g')
where r.title is not null
order by b.year
Bibligraphy modern, Bibligraphy sites and Bibligraphy sites/samplegroups all contains rows with no text, nothing in the text column just values in the number column. If selected, the filter output shows a error message or displays the spinning search indicator indefinately. Should the rows contain text or are there faulty rows in db?