Drug index Part II: Implement aggregation on drug tables

ElaineMcA commented 5 years ago

BE requirements:

Aggregate to protein level only across all columns.
Aggregate on target profile and disease profile pages only. N.B. NO aggregation on evidence page where target and disease parameters are already fixed.

FE requirements:

URLs in table: (add badge/# unique entries) and pop-up with links.
Table downloads to remain as is with all records presented.
Additional API endpoints required to feed graphical representations to be confirmed @afaulconbridge @LucaFumis

andrewhercules commented 5 years ago

From an FE perspective, I can see two possible issues with the points above and another potential issue for the summary charts - but if these have already been raised and addressed, please let me know.

2. Table downloads to remain as is with all records presented

Currently, we use the same API response to generate the UI table and the download table. If the data table download is to remain as-is (with all records) while the UI is to be aggregated rows, we will need to store and process both objects in the front-end Angular/React application. This will require changes to the logic in the existing Angular app along with changes in the new React app.

3. Additional API endpoints required to feed graphical representations to be confirmed

Currently, the same drugs table Angular directive and HTML component are used for the target profile, disease profile, and evidence pages. If we introduce new endpoints to allow for aggregated evidence on the target and disease profile pages but non-aggregated evidence on the evidence page, this will require changes to the directive.

Another issue to raise is that in order for the summary charts to work (e.g. the donut showing clinical trial records), we will need access to all of the data to calculate the numbers or the API needs to return these values. However, if we end up implementing point 2 - 'Table downloads to remain as is with all records presented' - then we will have access to both objects and can still process the data to display the charts. But because there are diseases with a large number of records (e.g. type II diabetes mellitus), we may encounter performance issues if we have to load both objects - an aggregated one for the UI and a non-aggregated one for the data table download - into the app. I would suspect this will be more of an issue for the Angular app and less of a problem for the React/GraphQL app, but the FE team can confirm if this is the case.

I flag these issues because they will require changes to the existing Angular app and the React app. The FE team has to assess the relative priority of making changes and maintaining the Angular app versus building the React app.

Perhaps next week, @d0choa and @LucaFumis we can meet to discuss the scope of this work to determine what we can support for the 19.09 release and what changes we can make to the React app to remain on-schedule for completion?

andrewhercules commented 5 years ago

@afaulconbridge, as promised in yesterday's meeting, here is the data that we use to generate the summary charts at the top of the drugs data tables.

Currently, we process the entire API response to generate the data. If the API can return this data, that would be great. However, if it is not possible, we can make adjustments on the front-end by processing the response and expanding each row depending on the number of clinical trial/ATC/FDA records.

EGFR - ENSG00000146648

{
    "data": {
        "unique_drugs": 41,
        "associated_diseases": 140,
        "associated_targets": 1,
        "clinical_trials": {
            "phase_4": 59,
            "phase_3": 304,
            "phase_2": 1438,
            "phase_1": 816,
            "phase_0": 9,
        },
        "drug_type_activity": {
            "small_molecule": {
                "drug_negative_modulator": 29,
            },
            "antibody": {
                "drug_negative_modulator": 10,
                "up_or_down": 2
            }
        }
    }
}

Kidney disease - EFO_0003086

{
    "data": {
        "unique_drugs": 346,
        "associated_diseases": 45,
        "associated_targets": 429,
        "clinical_trials": {
            "phase_4": 1419,
            "phase_3": 1182,
            "phase_2": 3524,
            "phase_1": 1132,
            "phase_0": 47,
        },
        "drug_type_activity": {
            "small_molecule": {
                "drug_positive_modulator": 55,
                "drug_negative_modulator": 190,
                "up_or_down": 6
            },
            "protein": {
                "drug_positive_modulator": 22,
                "drug_negative_modulator": 9,
                "up_or_down": 3
            },
            "oligosaccharide": {
                "up_or_down": 4
            },
            "antibody": {
                "drug_positive_modulator": 1,
                "drug_negative_modulator": 44,
                "up_or_down": 7
            },
            "enzyme": {
                "drug_positive_modulator": 1,
                "up_or_down": 1
            },
            "unknown": {
                "drug_negative_modulator": 3,
            }
        }
    }
}

afaulconbridge commented 5 years ago

In some cases, the aggregated table may have more than 10,000 rows (e.g. neoplasm EFO_0000616). For those, we will keep the existing warning box and order by (numeric not roman) phase, status, drug, disease, target.

We can use server-side "facet" to provide summary stats over all results, not just the 10k returned.

This is not ideal, but its not worse than the current situation. If users want the full set, the download will use the non-aggregated endpoint and paging to get the full set for any size.

afaulconbridge commented 5 years ago

Note, the aggregated table will need to be capped to a lower number so there are some "buckets" avaliable for the summary stats. 9,000 should be sufficient.

afaulconbridge commented 5 years ago

Example of a prototype response for /v3/platform/public/evidence/known_drug?target=ENSG00000146648 https://gist.github.com/afaulconbridge/0d16697dd1cbe2bfadf4ec0fd8d6bb36

andrewhercules commented 5 years ago

Looks great @afaulconbridge! 👍

So that we can recreate the existing data table and functionality (e.g. link to drug profile page), can we please add the following fields to the response?

[ ] Drug ID
[ ] Drug type
[ ] Mechanism of Action
[ ] Target symbol
[ ] Target class
[ ] Numeric trial phase (used to sort table by phase)

Current drugs data table for EGFR Screenshot 2019-08-01 at 14 26 32

afaulconbridge commented 5 years ago

Updated example for EGFR with extra information: https://gist.github.com/afaulconbridge/6c80f4833d5eef47dadd490ab7930505

Note, target_class and mechanisms_of_action are multi-valued fields because that's whats in the data. Sometimes its a bit daft e.g. https://gist.github.com/afaulconbridge/6c80f4833d5eef47dadd490ab7930505#file-response-json-L115-L116

andrewhercules commented 5 years ago

Looks great - all of the data we need to generate the data table is available! Thank you @afaulconbridge! :)

andrewhercules commented 5 years ago

@afaulconbridge, would you be able to upload the samples you generated for EGFR, cancer, and diabetes to the team Google Drive or as another gist? I'm going to build a new spec for the data table and want to make sure I label it correctly. :-)

andrewhercules commented 5 years ago

After a discussion with @LucaFumis and @d0choa about aggregated drug evidence, I have created three basic mockups showing different ways to display the data in the current Angular app.

Option 1 - Expanded Rows

This would follow the design pattern we have established in the rewrite for the similar targets and similar diseases data. It would use the DataTables child rows feature.

aggregation_v1

Option 2 - Popup

This would follow the design pattern we use in the Prioritisation view. For this option, we will have to pay particular attention to the mouse-over interaction styles - perhaps persisting the popup until a user has clicked the close button?

aggregation_v2

Option 3 - Modal

This option is similar to the on-hover and on-click popup option above - the difference being that we display a modal with some contextual information.

aggregation_v3

I would recommend Option 1 - Expanded Rows as it closely aligns with what we are doing in the React rewrite and there may be an opportunity to reuse code. However, we need to investigate whether DataTable's child rows feature is suitable and will work with the current technical setup. We also need to explore which option will scale best as the number of aggregated records can vary from 1 to 400+. For example, PTGS2 has 471 Phase IV N/A records for pain and acetaminophen.

@LucaFumis will investigate further from a technical standpoint and we will finalise a decision and design by 30 August 2019.

LucaFumis commented 5 years ago

Just a thought. The response will be capped at 10k I imagine, same as current endpoint? Currently the API (filter evidence endpoint) returns size and total:

size is the same as data.length so is somewhat redundant
but total is handy: if total>size we only have partial results (but we can just hardcode 10k)

LucaFumis commented 5 years ago

Update on front end progress. I've implemented a working prototype addressing the main points above to help decisions and design for the end of the month:

summary charts
table content
expandable rows (clicking on the 'source' column, e.g. '2 records +')
column filters

Notes:

data is static from a mock endpoint pulling the json generated by Adam for EGFR. So you will see the same table on each target profile page. There is a new API / endpoint but at the moment is not working well with the app due to certificates (will need backend help when I'm back).
'up_or_down' count in the charts has been changed for display test purposes
downloads don't work (these haven't been addressed yet, so they work partially)
fine details haven't been addressed (e.g. styling of buttons/links)

Demo: https://lf-drugagg-682--paratrooper-antelope-70721.netlify.com/target/ENSG00000146648

Thoughts:

expanded rows: it might be tricky to display the details as per design mock, depending on the data (currently it seems we only have clinicaltrials.gov urls)
need what is currently in evidence.target2drug.urls[2] to complete mechanism of action column as per design mock
need access_level field in response

I think that's all for now.

andrewhercules commented 5 years ago

I have taken a look at the full API response and @cmalangone and @afaulconbridge, there is data missing from the QC API endpoint that serves aggregated drug evidence (example API response).

It appears that any trials/records with a phase of Phase IV and a status of N/A are not being returned by the API. These evidence strings are most often from DailyMed, FDA, and ATC.

For example, in the QC API response for ESR1, there is not an entry for acne - Phase IV N/A - ETHINYL ESTRADIOL whereas I would expect an entry with a count of 4 - see below for a screenshot from the current Platform.

Screenshot 2019-09-05 at 10 14 20

Can you please investigate a possible fix for the 19.09 release?

Also, in the current production API response, we use the data in evidence.target2drug.urls[2] to create links in the Mechanism of action column. Can we please reinstate this links and add them to the response for the aggregated data endpoint?

LucaFumis commented 5 years ago

I tried my local instance with the platform-dev API: for "cancer" (EFO_0000311) it worked (a bit over 1.5 minutes). I notice that the call returned all data (67.5K rows) even though the limit in the call was set to 10K.

afaulconbridge commented 4 years ago

@d0choa @andrewhercules as it was decided not to implement this, can this ticket be closed?

d0choa commented 4 years ago

It's not high priority anymore, but the underlying problem is still there. I'm updating priorities and assignees accordingly.

andrewhercules commented 4 years ago

Ticket reassigned to 20.02 release

d0choa commented 4 years ago

This is a long-standing issue. Certain aggregations in the drug tables introduce certain challenges on the way they are computed in our back-end. We attempted to do aggregations in ES but the performance was poor.

Now, we will revisit this problem from a different angle. Dedicated tickets will be opened

opentargets / issues

Drug index Part II: Implement aggregation on drug tables #682