Restructure html report

cassimons commented 1 year ago

Proposed Changes

This PR is an overhaul of the AIP html report. An example of the new format can be seen here: https://test-web.populationgenomics.org.au/acute-care/reanalysis/no_private/draft-restructured-report.html

Changes include:

Moving to a tabbed interface allowing the summary stats and run metadata to be moved to their own space
Changing to a single large table containing all variants for all samples. This has some tradeoffs, but on balance I think it is more usable than splitting the table per sample.
Simplified data presented in the main result fields of the table.
Added an expanding child row (drawer) that includes additional variant and transcript details.
Swapped to the tablesorter js library. Compared to datatables, tablesorter has better support for child rows and better out-of-the-box type-aware advanced filtering.

Things for discussion:

1. I made many judgement calls on what detail to show where. Overall, I think this is a better balance but we will need to discuss/test with users and revise as needed.
2. Now we can display additional detail, should we pass more info into the AIP results json? For example, it would be great to have individual genotype data available. Displaying patient HPO terms would also be super useful.
3. Where do the "support" only variants go that are supporting a CH? As far as I can see the test json I have been using does not actually contain them as "variants"?
4. I have included single-line descriptions of each category. These probably need wordsmithing. Might also be nice if we can link to the full technical description of each category in the docs?
5. It would be great to include the date the variant was 1st reported in the top-level variant info if it is available. This way we can highlight and/or dynamically filter to just recent variants.

MattWellie commented 1 year ago

I love the tabbed interface, it makes the whole view much cleaner (not having to scroll past a bunch of waffle to get to actual variants...)

The one thing I have concerns about in the single-table interface is knowing when all variants/samples have been reviewed. That might be as simple as making the Individual column filter box a drop-down list of available values in that column instead of free text, so an analyst can track through each participant one at a time (if they want).

Adding a top-line annotation on the variants to capture the number of times this variant appears in this cohort could be a useful prompt for users to drop back to a per-variant search, if they want to look for common features/HPO between multiple families with the same variant.
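A rough sketch of how that cohort count could be computed before the report is rendered. The input shape (sample ID mapped to a list of variant dicts) and the `var_id` and `cohort_count` keys are assumptions for illustration, not the actual AIP results schema:

```python
from collections import Counter


def add_cohort_counts(results: dict) -> dict:
    """Annotate each variant with the number of samples that carry it.

    `results` is assumed to map sample ID -> list of variant dicts,
    each with a 'var_id' key (hypothetical; the real JSON may differ).
    """
    counts = Counter(
        var_id
        for variants in results.values()
        # use a set so a variant is counted once per sample
        for var_id in {v["var_id"] for v in variants}
    )
    for variants in results.values():
        for variant in variants:
            variant["cohort_count"] = counts[variant["var_id"]]
    return results
```

The count could then be shown in the top-level row, so a value above 1 prompts the analyst to search on that variant across families.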

From the discussion points:

1. Nothing to add about user feedback, interested in seeing how they feel.
2. HPO terms are a simple add. There's some redundancy in adding them per-variant instead of per-individual, but we aren't expecting large variant numbers per family so it's probably not worth getting hung up over that.

2a. Genotypes are an interesting one. With this more fully-featured display that's probably the main thing completely missing. Given the arbitrary family sizes, it probably makes sense to do that either through a list of inferred relationships rather than sample ID (Proband: 1/1, Father: 0/1), or by embedding a mini-image of the pedigree structure like seqr has.

3. The variants in the table are only the ones which qualify as a primary variant. 17:1736124 is an example from that test report: it doesn't appear as its own row because it only qualified as supporting, so the MOI logic skips evaluating it as a primary variant. When two eligible variants (e.g. a Cat 2 and a Cat 4) are in a compound-het, both would appear as a separate row, with reciprocal links to the other as a support.
4. I can't see where the single-line category description is? The revised README.md will have link-able sections for the individual categories and the reasoning for each one. I saw that the inheritance patterns have been added much more concisely under the symbol, which is a much better use of space.
5. This logic doesn't exist yet. I think we discussed it a while back, but I've only just added it to the backlog board. The current logic just filters out prior results instead of tagging variants with the date first seen; that should be a simple mod.
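As a sketch of the "tag with first date seen" idea, assuming prior results are kept as a simple mapping from a sample/variant key to an ISO date string. The function name, key format and `first_seen` field are hypothetical, not existing AIP code:

```python
def tag_first_seen(results: dict, first_seen: dict, run_date: str) -> dict:
    """Tag every variant with the date it was first reported.

    `results` maps sample ID -> list of variant dicts with a 'var_id' key,
    `first_seen` maps 'sample::var_id' -> ISO date string from earlier runs.
    Both shapes are assumptions for illustration.
    """
    for sample, variants in results.items():
        for variant in variants:
            key = f"{sample}::{variant['var_id']}"
            # setdefault keeps the earlier date if the variant was seen before,
            # otherwise records this run's date as the first sighting
            variant["first_seen"] = first_seen.setdefault(key, run_date)
    return first_seen
```

Nothing is dropped here: previously seen variants keep their original date, so the report can highlight or filter to recent ones rather than hiding the rest.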

MattWellie commented 1 year ago

Couple of touch-ups:

There's a few typos dotted around
The support variants don't render as hyperlinks
The family/individual links don't work (that's on me, the default config file contains the wrong project ID, so the links are invalid. R0039_acute_care -> R0011_acute_care in the config should fix it)
MutationTaster appears twice in the in silico list, probably something I messed up in the result JSON

cassimons commented 1 year ago

The one thing I have concerns about in the single-table interface is knowing when all variants/samples have been reviewed. That might be as simple as making the Individual column filter box a drop-down list of available values in that column instead of free text, so an analyst can track through each participant one at a time (if they want).

Yea, I went back and forth on this. When I have done similar things in the past I have always found the drop-down selectors to be a pain once you have more than a handful of things to select from. I get frustrated hunting for the right one and the string-based filter is much faster. I hear what you mean about not knowing when you are done. In reality I am not sure how we solve this correctly without moving to a more full-featured web app where we can track review state (which I do not think we want to do at this point). I can try adding the drop-down and we can try it out.

2. HPO terms can just be attached to the samples in the results.json. This would be all HPO terms for that sample, and is useful for the analyst to glance at to see what the patient had without the need to look it up. If we wanted to get fancy, we could also annotate the panel matches with the HPO term that triggered each specific panel match (i.e. which of the patient HPO terms were the ones that triggered a panel to be flagged). This would require you to do some lifting, but I think we probably need to separate the panels from the flags in the report in any case.

2a. Genotypes. My thought would be to keep this simple; remember it is intended to be a shortcut to clicking through to seqr et al., not a replacement. I would just present something similar to what you see in a seqr variant row: each individual has an ID, sex, affected status and the genotype (so no pedigree/relationship etc). This conveys most of the info without us needing to parse the relationships, which gets ugly.

3. Ok I see the logic. I will add a hyperlink to these.
4. You can find them all here. They appear in the right-hand side of each child row.
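For the simple genotype display described in 2a, a minimal sketch of the per-individual line might look like the following. The field names (`id`, `sex`, `affected`, `genotype`) are assumed for illustration and would depend on what gets passed into the results json:

```python
def format_genotypes(members: list[dict]) -> str:
    """Render one line per individual: ID, sex, affected status, genotype.

    Each member dict is assumed to carry 'id', 'sex', 'affected' (bool)
    and 'genotype' keys; no pedigree or relationship parsing is attempted.
    """
    lines = []
    for member in members:
        status = "affected" if member["affected"] else "unaffected"
        lines.append(
            f"{member['id']} ({member['sex']}, {status}): {member['genotype']}"
        )
    return "\n".join(lines)
```

This keeps the child row to plain text per family member, and works for arbitrary family sizes because it never needs to infer who is the proband, mother or father.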

There's a few typos dotted around

Yea that is a cost of letting me have commit access to anything... Can you just fix these as you see them please (or send me a specific list). I will never see them 🤦‍♂️

The support variants don't render as hyperlinks

I will fix this

cassimons commented 1 year ago

I added in grouping the table by individual. I think this is ready to go?

populationgenomics / automated-interpretation-pipeline

Restructure html report #181