plantinformatics / pretzel

Javascript full-stack framework for Big Data visualisation and analysis
GNU General Public License v3.0

Display and selection of haplotypes #359

Closed : Don-Isdale closed this issue 5 months ago

Don-Isdale commented 1 year ago

Following on from #343


Summary / Overview




dataset intersection :

12f369de dataset intersection : omit -r when requestOptions.isecDatasetIds is given.
030dc6de default enableFeatureFilters to true; update when it changes. access isec status.
0cd9e673 dataset intersection : order multiple datasets by selected SNP
b94d986d [/6H] sort sample columns of all datasets together
78e7b93b [/6H] dataset intersection : selected dataset and one other
479852b4 re-indent
1e510218 [/3H] gui input of number of intersecting datasets
ecc9e95d [6H/6H] dataset intersection : update displayed data; gui input of dataset positionFilter : show input value and update
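The intersection commits above build on `bcftools isec`. Below is a minimal sketch of how its arguments could be assembled, assuming a hypothetical helper `build_isec_args()`. In it, `-n +N` keeps sites present in at least N of the input files (the GUI's number of intersecting datasets), `-p` writes the results into a sub-directory, and the region option `-r` is dropped when dataset ids are passed explicitly. Reading 12f369de as referring to the `-r` of `bcftools isec` is an assumption, and this is not Pretzel's implementation.

```python
from typing import List, Optional, Sequence


def build_isec_args(
    vcf_files: Sequence[str],
    min_datasets: int,
    out_dir: str,
    region: Optional[str] = None,
    isec_dataset_ids: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble a hypothetical `bcftools isec` command line.

    min_datasets : number of intersecting datasets chosen in the GUI,
      passed as `-n +N` (keep sites present in at least N files).
    region : e.g. "1A:1-5000"; omitted when isec_dataset_ids is given
      (assumption, mirroring commit 12f369de).
    """
    if not 1 <= min_datasets <= len(vcf_files):
        raise ValueError("min_datasets must be between 1 and the number of files")
    args = ["bcftools", "isec", "-n", f"+{min_datasets}", "-p", out_dir]
    if region and not isec_dataset_ids:
        args += ["-r", region]
    args += list(vcf_files)
    return args


print(" ".join(build_isec_args(["a.vcf.gz", "b.vcf.gz"], 2, "isec/a_b", region="1A:1-5000")))
# bcftools isec -n +2 -p isec/a_b -r 1A:1-5000 a.vcf.gz b.vcf.gz
```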


2023Nov23 :

81d772c3 [/3H] enable SNP / Feature filters without selecting Features tab
c6116158 [/2H] Use the type of currently selected sampleFilters and update.
09efe007 [/2H] add call rate filter on SNPs

Don-Isdale commented 1 year ago

2023Sep07 - 2023Nov


Context and Discussion

Goals

Most work on haplotypes will consider a fixed set of positions in a single dataset; we need some higher-level concepts because we want to talk about haplotypes independently of any dataset (or at least have a concept that is dataset-independent).

That is, we want to compare the "realised sets" of several datasets to see linkage; this suggests the broader, theoretical concept of a haplotype, which is not directly visible in the given datasets but can be used to infer connections between them.


This section defines these terms :

A suggested model

parent genome variant set (chr, pos, ref, alt, ) -> variant subset

For a given variant set and genotype dataset, the genotype values at the SNPs in the set are the realised haplotypes.
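The definition above can be illustrated with a minimal sketch that groups samples by their genotype values at the SNPs of a variant set. The data shapes (a variant as `(chr, pos)`, genotype values as strings such as `"0"`, `"1"`, `"2"`, missing calls shown as `"."`) and the function name `realised_haplotypes()` are assumptions for illustration, not Pretzel's schema.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed: a variant is identified by (chr, pos); ref/alt omitted for brevity.
Variant = Tuple[str, int]


def realised_haplotypes(
    variant_set: Sequence[Variant],
    genotypes: Dict[str, Dict[Variant, str]],
) -> Dict[Tuple[str, ...], List[str]]:
    """Group samples by their realised haplotype for the given variant set.

    genotypes maps sample name -> {variant: genotype value}.
    A sample's realised haplotype is the tuple of its genotype values at the
    variants of the set, in set order; missing calls are shown as ".".
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for sample, calls in genotypes.items():
        haplotype = tuple(calls.get(v, ".") for v in variant_set)
        groups[haplotype].append(sample)
    return dict(groups)


variant_set = [("1A", 100), ("1A", 250), ("1A", 400)]
genotypes = {
    "S1": {("1A", 100): "0", ("1A", 250): "2", ("1A", 400): "0"},
    "S2": {("1A", 100): "0", ("1A", 250): "2", ("1A", 400): "0"},
    "S3": {("1A", 100): "2", ("1A", 250): "0"},
}
print(realised_haplotypes(variant_set, genotypes))
# {('0', '2', '0'): ['S1', 'S2'], ('2', '0', '.'): ['S3']}
```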


Representation in Database

The variant intervals can be represented as features in a dataset with the tag 'variantInterval'. Each feature has [start, end] which defines the variant interval.
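A sketch of this representation, assuming hypothetical `Dataset` and `Feature` classes in which a feature's `value` holds `[start, end]` (taken as inclusive here) and the `'variantInterval'` tag is held in the dataset's `tags` list. Pretzel's actual database schema may store these differently.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Feature:
    name: str
    value: Tuple[int, int]  # [start, end] of the variant interval (inclusive, assumed)


@dataclass
class Dataset:
    name: str
    tags: List[str] = field(default_factory=list)
    features: List[Feature] = field(default_factory=list)


def variant_intervals(dataset: Dataset) -> List[Feature]:
    """Return the dataset's features if it is tagged 'variantInterval', else []."""
    return dataset.features if "variantInterval" in dataset.tags else []


def interval_contains(feature: Feature, position: int) -> bool:
    """True if position lies within the feature's [start, end]."""
    start, end = feature.value
    return start <= position <= end


blocks = Dataset("haplotype-blocks", ["variantInterval"], [Feature("hb1", (1000, 5000))])
print([f.name for f in variant_intervals(blocks)])  # ['hb1']
```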

variant set :


Operations

Define a variant set by either :

For a selected variant set :

For a given variant set and genotype dataset, to calculate the realised haplotypes : the search may use the variant interval from which the variant set was defined as an initial filter, which leverages the feature position index.
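Using the variant interval as an initial filter can be sketched with a sorted list of SNP positions standing in for the feature position index: `bisect` finds the rows inside the interval in O(log n), and only those rows are checked against the variant set. The row layout (one `sample -> genotype` dict per SNP, aligned with `positions`) and the function names are hypothetical.

```python
import bisect
from typing import Dict, Sequence, Tuple


def snps_in_interval(positions: Sequence[int], start: int, end: int) -> Tuple[int, int]:
    """Return the [lo, hi) index range of sorted positions within [start, end]."""
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, end)
    return lo, hi


def realised_haplotypes_in_interval(
    positions: Sequence[int],
    genotype_rows: Sequence[Dict[str, str]],
    variant_set: Sequence[int],
    interval: Tuple[int, int],
) -> Dict[str, Tuple[str, ...]]:
    """positions[i] is the SNP position of genotype_rows[i] (sample -> value)."""
    lo, hi = snps_in_interval(positions, *interval)
    wanted = set(variant_set)
    # Initial filter: only SNP rows inside the variant interval are examined.
    rows = [(p, r) for p, r in zip(positions[lo:hi], genotype_rows[lo:hi]) if p in wanted]
    samples = sorted({s for _, r in rows for s in r})
    # Realised haplotype per sample, in position order; "." marks a missing call.
    return {s: tuple(r.get(s, ".") for _, r in rows) for s in samples}


positions = [100, 250, 400, 900]
rows = [{"S1": "0", "S2": "2"}, {"S1": "2", "S2": "2"}, {"S1": "0", "S2": "1"}, {"S1": "1", "S2": "1"}]
print(realised_haplotypes_in_interval(positions, rows, [100, 400], (50, 500)))
# {'S1': ('0', '0'), 'S2': ('2', '1')}
```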

For a selected realised haplotype :

Comparison Operations

Those comparison operations represent the medium-term goal of this branch; a possible intermediate step towards that goal :


GUI

Operations :

To align with the sample column headers in the Pretzel Genotype Table, this diagram would be rotated 90° clockwise.

This enables the user to select a realised haplotype or a group of realised haplotypes; the table can then be narrowed to the samples with the selected realised haplotypes. Sample columns which don't have that realised haplotype can be filtered out, or sorted to the right in order of distance from the selected realised haplotype. Colours can be assigned to realised haplotypes and shown in the table as a Haplotype Heatmap.
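Narrowing and sorting the sample columns could look like the following sketch. It assumes Hamming distance (the number of SNPs at which two realised haplotypes differ) as the distance measure, which the text above leaves open; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of SNPs at which two realised haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same variant set")
    return sum(x != y for x, y in zip(a, b))


def order_sample_columns(
    haplotype_of: Dict[str, Tuple[str, ...]],
    selected: Tuple[str, ...],
    filter_out: bool = False,
) -> List[str]:
    """Samples with the selected haplotype first; the rest dropped, or sorted by distance."""
    matching = [s for s, h in haplotype_of.items() if h == selected]
    if filter_out:
        return matching
    others = [s for s, h in haplotype_of.items() if h != selected]
    # Nearest haplotypes first; ties broken by sample name for a stable display.
    others.sort(key=lambda s: (hamming(haplotype_of[s], selected), s))
    return matching + others


haps = {"S1": ("0", "2", "0"), "S2": ("2", "2", "0"), "S3": ("0", "2", "0"), "S4": ("2", "0", "2")}
print(order_sample_columns(haps, ("0", "2", "0")))  # ['S1', 'S3', 'S2', 'S4']
```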

[Figure : 41598_2020_69381_Fig5_HTML]

Variations on the clade analysis graph :

Similar :

Evolutionary dynamics of genome size and content during the adaptive radiation of Heliconiini butterflies : Figure 2, Figure 6

Optional : show a heat-map of the proportions of realised haplotypes in the samples, as in : [Figure : Heatmap-of-haplotypes]. This could go at the top of the table, between the sample names (column headings) and the SNP rows.
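One reading of that heat-map row, sketched under assumptions: compute the proportion of samples carrying each realised haplotype, then give each sample column the proportion of its own haplotype, which could be mapped to a colour intensity. The function names and the per-column layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[str, ...]


def haplotype_proportions(haplotype_of: Dict[str, Haplotype]) -> Dict[Haplotype, float]:
    """Fraction of samples carrying each realised haplotype, most common first."""
    counts = Counter(haplotype_of.values())
    total = sum(counts.values())
    return {h: n / total for h, n in counts.most_common()}


def heatmap_row(haplotype_of: Dict[str, Haplotype], sample_order: Sequence[str]) -> List[float]:
    """One value per sample column: the proportion of that sample's haplotype."""
    props = haplotype_proportions(haplotype_of)
    return [props[haplotype_of[s]] for s in sample_order]


haps = {"S1": ("0", "2"), "S2": ("0", "2"), "S3": ("2", "0"), "S4": ("0", "2")}
print(heatmap_row(haps, ["S1", "S2", "S3", "S4"]))  # [0.75, 0.75, 0.25, 0.75]
```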

A few other views which may have some ideas to include :

The graph visualisations above seem to be the most accessible, self-explanatory and informative way of presenting realised haplotypes to the user. Keeping them out of the table will make them simpler to implement. The d3 functionality is fairly simple : they are all trees, the same tree with different layouts.
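Because the views are all trees, a single nested structure could feed each of them. A minimal sketch, assuming the tree is built as a prefix tree over the SNPs of the variant set (each level branches on the genotype value at one SNP, and leaves list the samples). The `{name, children}` shape is what `d3.hierarchy()` reads by default, so the same data could be laid out with `d3.tree()` or `d3.cluster()`. The grouping used in the cited clade graphs may differ, and `haplotype_tree()` is a hypothetical name.

```python
import json
from typing import Dict, Sequence, Tuple


def haplotype_tree(haplotype_of: Dict[str, Tuple[str, ...]], snp_names: Sequence[str]) -> dict:
    """Nest realised haplotypes SNP by SNP into a {name, children} tree.

    Level i branches on the genotype value at snp_names[i]; each leaf lists
    the samples carrying that complete realised haplotype.
    """
    root: dict = {"name": "root", "children": []}
    for sample, haplotype in sorted(haplotype_of.items()):
        node = root
        for snp, value in zip(snp_names, haplotype):
            label = f"{snp}={value}"
            child = next((c for c in node["children"] if c["name"] == label), None)
            if child is None:
                child = {"name": label, "children": []}
                node["children"].append(child)
            node = child
        node.setdefault("samples", []).append(sample)
    return root


haps = {"S1": ("0", "2"), "S2": ("0", "0"), "S3": ("0", "2")}
print(json.dumps(haplotype_tree(haps, ["snpA", "snpB"]), indent=1))
```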


Comparison of realised haplotypes

A visualisation may help in seeing relationships between the realised haplotypes of multiple genotype datasets for the same variant set / variant interval. This might be similar to these alignments of species phylogeny trees : [Screenshot from 2023-09-14 11-39-04] from Fig. 1
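A first step towards such a comparison could be to find which realised haplotypes of a variant set occur in more than one dataset. This sketch assumes each dataset's realised haplotypes are computed over the same variant set with the same genotype coding; the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Haplotype = Tuple[str, ...]


def compare_datasets(haplotypes_by_dataset: Dict[str, Set[Haplotype]]) -> Dict[Haplotype, Set[str]]:
    """For each realised haplotype, the set of datasets in which it occurs."""
    occurs_in: Dict[Haplotype, Set[str]] = {}
    for dataset, haplotypes in haplotypes_by_dataset.items():
        for h in haplotypes:
            occurs_in.setdefault(h, set()).add(dataset)
    return occurs_in


def shared_haplotypes(haplotypes_by_dataset: Dict[str, Set[Haplotype]]) -> Dict[Haplotype, Set[str]]:
    """Realised haplotypes found in more than one dataset: candidate links."""
    return {h: ds for h, ds in compare_datasets(haplotypes_by_dataset).items() if len(ds) > 1}


by_dataset = {"D1": {("0", "2"), ("2", "0")}, "D2": {("0", "2")}, "D3": {("2", "2")}}
print(shared_haplotypes(by_dataset))
```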

Similar :


Don-Isdale commented 11 months ago

2023Nov30 :

Expect the Data Admin to install the SNPList before uploading a VCF worksheet :


This item requires some detail regarding per-chromosome naming :


2023Dec13

cd7e7e03 require that base VCF and SNPLists are installed for VCF worksheets uploaded
ccb69dce add histogram status column and show status in explorer.

2024Jan02

8e2d8ff1 combine MAF threshold exclude into include condition
fcb2a27e omit MAF condition when mafThreshold is undefined
8c4b6d81 close dialog and update genotype table when request returns empty result
b9044c3e accommodate height of dataset-vcf-status
eac9b6c4 re-enable column-dragging by requiring Shift modifier for column selection
7c893a75 reduce overlap between column select and select for dragging columns
893fe23c disable opening genotype controls dialog before genotype datasets are viewed
289b7e41 clear dataset VCF status when dataset selection changes
b97e8ff4 show rows for all dataset block scopes in dataset-vcf-status even if empty
339ab17b pass featureCallRateThreshold in request and use in cacheId
ca66b845 select featuresCountsResults based on genotype user controls SNP Filters
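Commits 8e2d8ff1 and fcb2a27e concern the MAF condition. A sketch of building the `bcftools view -i` argument, assuming the SNP filters are combined into one include expression: an exclude test such as `MAF<t` becomes the include condition `MAF>=t`, ANDed with any other conditions, and left out when no threshold is set. The helper name `include_args()` is hypothetical.

```python
from typing import List, Optional, Sequence


def include_args(maf_threshold: Optional[float], other_conditions: Sequence[str] = ()) -> List[str]:
    """Build `bcftools view` include arguments for the SNP filters.

    The MAF exclude test `MAF<t` is expressed as the include condition
    `MAF>=t`, so it can be ANDed with other conditions in a single `-i`
    expression; it is omitted when maf_threshold is None (undefined).
    """
    conditions = list(other_conditions)
    if maf_threshold is not None:
        conditions.append(f"MAF>={maf_threshold}")
    if not conditions:
        return []
    return ["-i", " && ".join(conditions)]


print(include_args(0.05, ["QUAL>30"]))  # ['-i', 'QUAL>30 && MAF>=0.05']
```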

2024Jan08

9735f2c3 Add minAlleles, maxAlleles, typeSNP to SNP filters
2c842d78 improve request and display of featuresCounts when zooming out
4c248702 ensure .MAF.vcf.gz has INFO/AC_Het
7a81d9b9 move dataset intersection isec. files into a sub-dir
504ebd02 handle featuresCounts undefined
b4dac127 avoid exception when de-selecting after ctrl-click of sample column header
fe0949b3 change genotype control checkbox from All to Selected Samples
7746c8ac add functions to show db indexes and create indexes used by pretzel
c9bcf7b2 remove dataset intersection isec files after use
7e8c34d0 use .name instead of .scope in dnaSequenceLookup
91fdabe0 filter features by minAlleles or maxAlleles in frontend
a6b6f5ce add a button to prepare histograms for the current SNP filters
fe50dfdf disable caching of dynamic results
492b34c7 update histogram display when genotype SNP filters change, show filtered count
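The minAlleles, maxAlleles and typeSNP filters map naturally onto `bcftools view` options (`-m/--min-alleles`, `-M/--max-alleles`, `-v/--types snps`), where the allele count includes REF and each ALT allele. The sketch below builds those arguments and applies the same allele-count bounds to features in the frontend; the `SNPFilters` class and the `ref`/`alt` feature fields are assumptions, not Pretzel's data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SNPFilters:
    """Hypothetical container mirroring the minAlleles / maxAlleles / typeSNP filters."""
    min_alleles: Optional[int] = None
    max_alleles: Optional[int] = None
    type_snp: bool = False


def bcftools_view_args(f: SNPFilters) -> List[str]:
    args: List[str] = []
    if f.min_alleles is not None:
        args += ["-m", str(f.min_alleles)]  # --min-alleles
    if f.max_alleles is not None:
        args += ["-M", str(f.max_alleles)]  # --max-alleles
    if f.type_snp:
        args += ["-v", "snps"]  # --types snps
    return args


def allele_count(ref: str, alt: str) -> int:
    """REF plus each comma-separated ALT allele; '.' means no ALT."""
    return 1 + (0 if alt in ("", ".") else len(alt.split(",")))


def filter_features(features: Sequence[dict], f: SNPFilters) -> List[dict]:
    """Frontend equivalent of -m / -M, on features with assumed 'ref' and 'alt' fields."""
    kept = []
    for feature in features:
        n = allele_count(feature["ref"], feature["alt"])
        if f.min_alleles is not None and n < f.min_alleles:
            continue
        if f.max_alleles is not None and n > f.max_alleles:
            continue
        kept.append(feature)
    return kept


features = [{"name": "a", "ref": "A", "alt": "G"}, {"name": "b", "ref": "A", "alt": "G,T"}, {"name": "c", "ref": "C", "alt": "."}]
print([x["name"] for x in filter_features(features, SNPFilters(min_alleles=2, max_alleles=2))])  # ['a']
```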

2024Jan19

33061dd9 Display result featureCount after preparing dataset histograms.

2024Jan19 - 30

eb077b72 pass genotypeSNPFilters to request for axis features
b7959c53 use INFO/F_MISSING for SNP Call Rate comparison
eb969a33 handle removed space in column heading in current bcftools version. fix passing SNP filter userOptions
0042ff8b Use INFO/F_MISSING for SNP Call Rate filter in frontend
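Call rate is the fraction of samples with a called genotype, so it equals 1 - F_MISSING. A sketch, assuming each SNP carries an INFO/F_MISSING value: a call-rate threshold t becomes the include condition `INFO/F_MISSING<=1-t` for bcftools, and the same comparison can be made in the frontend. The function names are hypothetical.

```python
from typing import Optional


def call_rate_condition(threshold: Optional[float]) -> Optional[str]:
    """bcftools include condition keeping SNPs with call rate >= threshold.

    Call rate is 1 - F_MISSING, so `call rate >= t` becomes `F_MISSING <= 1 - t`.
    Rounding avoids float artefacts such as 1 - 0.9 == 0.09999999999999998.
    """
    if threshold is None:
        return None
    max_missing = round(1 - threshold, 6)
    return f"INFO/F_MISSING<={max_missing}"


def passes_call_rate(f_missing: float, threshold: float) -> bool:
    """Frontend check using the INFO/F_MISSING value carried by each SNP."""
    return 1 - f_missing >= threshold


print(call_rate_condition(0.9))  # INFO/F_MISSING<=0.1
```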

9054085b generate .csi in support of adding F_MISSING

66557e3a Omit builtin rules and variables in make