select count(pv.*)
from public.het_variants hv
join public.variants pv
on (hv."CHROM", hv."POS", hv."REF", hv."ALT") = (pv."CHROM", pv."POS", pv."REF", pv."ALT")
join public.individuals ind on ind.internal_id = hv.individual
where hv.individual = 'PH00000166'
--and (pv.hgvsc, pv.hgvsp) != ('', '') -- filter
;
would give the number above and, if applying the commented filter above, as seen in scripts/migrate_variants.sh, this would give 1,252 rows.
Now using the new schema, the equivalent query (Q2):
select iv.zygosity,count(*) from phenopolis.variant v
join phenopolis.individual_variant iv on iv.variant_id = v.id
where iv.individual_id = 166
group by iv.zygosity
would give precisely the same 1,252 for HET.
However, when trying with PH00000168 something is very off. Using Q1 we got 6,432 (7,026 without filter) but when using Q2, it gives 68,770 !!!
Analysing script scripts/migrate_individual_variants.sh, the one that populates phenopolis.individual_variant:
select zig, type, count(*) from (
select ind.internal_id, ind.external_id, pv.variant_id, pv."CHROM", pv."POS", pv."REF", pv."ALT", 'HOM' as zig, 'internal_id' as "type"
from public.hom_variants hv
join public.variants pv
on (hv."CHROM", hv."POS", hv."REF", hv."ALT") = (pv."CHROM", pv."POS", pv."REF", pv."ALT")
join public.individuals ind on ind.internal_id = hv.individual
where (pv.hgvsc, pv.hgvsp) != ('', '')
union
select ind.internal_id, ind.external_id, pv.variant_id, pv."CHROM", pv."POS", pv."REF", pv."ALT", 'HOM' as zig, 'external_id' as "type"
from public.hom_variants hv
join public.variants pv
on (hv."CHROM", hv."POS", hv."REF", hv."ALT") = (pv."CHROM", pv."POS", pv."REF", pv."ALT")
join public.individuals ind on ind.external_id = hv.individual
where (pv.hgvsc, pv.hgvsp) != ('', '')
union
select ind.internal_id, ind.external_id, pv.variant_id, pv."CHROM", pv."POS", pv."REF", pv."ALT", 'HET' as zig, 'internal_id' as "type"
from public.het_variants hv
join public.variants pv
on (hv."CHROM", hv."POS", hv."REF", hv."ALT") = (pv."CHROM", pv."POS", pv."REF", pv."ALT")
join public.individuals ind on ind.internal_id = hv.individual
where (pv.hgvsc, pv.hgvsp) != ('', '')
union
select ind.internal_id, ind.external_id, pv.variant_id, pv."CHROM", pv."POS", pv."REF", pv."ALT", 'HET' as zig, 'external_id' as "type"
from public.het_variants hv
join public.variants pv
on (hv."CHROM", hv."POS", hv."REF", hv."ALT") = (pv."CHROM", pv."POS", pv."REF", pv."ALT")
join public.individuals ind on ind.external_id = hv.individual
where (pv.hgvsc, pv.hgvsp) != ('', '')
) as foo
where internal_id = 'PH00000168'
group by zig, type
;
zig
type
count
HET
internal_id
6,432
HOM
internal_id
330
I see it should have generated the correct rows for PH00000168 in phenopolis.individual_variant but it clearly not the case.
Now I'm trying to find out what went wrong. Audit tables (how?), logs (where?).
My understanding is that scripts/migrate_individual_variants.sh should have created:
zigosity
count
HET
5,151,780
HOM
302,287
and not
select iv.zygosity, count(*) from phenopolis.individual_variant iv
group by iv.zygosity;
zygosity
count
HET
7,372,330
HOM
1,559,814
BTW, contrary to the note seen in scripts/migrate_individual_variants.sh:
# Note: Some dbs have external_id in the variant's individual colummn, some
# have the internal db, so import both the "styles".
I haven't seen any case for public.individuals ind on ind.external_id = hv.individual.
I'm closing it. It seems to be fine, we just need to make sure the frontend start to make use of attributes limit and offset and when sending a api request to limit to, say, 1000 rows.
Take https://dev-live.phenopolis.org/individual/PH00000166, which has 1,565 rare variants.
The query in the old schema (Q1):
would give the number above and, if applying the commented filter above, as seen in
scripts/migrate_variants.sh
, this would give 1,252 rows.Now using the new schema, the equivalent query (Q2):
would give precisely the same 1,252 for HET.
However, when trying with
PH00000168
something is very off. Using Q1 we got 6,432 (7,026 without filter) but when using Q2, it gives 68,770 !!!Analysing script
scripts/migrate_individual_variants.sh
, the one that populatesphenopolis.individual_variant
:I see it should have generated the correct rows for
PH00000168
inphenopolis.individual_variant
but it clearly not the case.Now I'm trying to find out what went wrong. Audit tables (how?), logs (where?).
scripts/migrate_individual_variants.sh
should have created:and not
BTW, contrary to the note seen in
scripts/migrate_individual_variants.sh
:I haven't seen any case for
public.individuals ind on ind.external_id = hv.individual
.