obiba / opal

OBiBa’s core database application for biobanks or epidemiological studies.
http://www.obiba.org/pages/products/opal/
GNU General Public License v3.0

OPAL-2966: entity-reference entity lookup and joins #2141

Open ymarcon opened 6 years ago

ymarcon commented 6 years ago

Jira issue originally created by user abhishek:

Hi,

Please consider this scenario,

tableA - demographics (entityType - patient)

tableB - timepoints (entityType - timepoint)

Each row in tableB is uniquely identified by the combination of the actual patient ID and the timepoint (e.g. pat001_1, pat001_2, pat002_1).

Would it be possible to add some features to

1) show the count of reference entities in summary statistics of variables from the timepoint table (e.g. 900 timepoints with Systolic for 50 participants)?

2) create a view so that users do not have to write the Magma join explicitly whenever they want to pull variables from tables (e.g. demographics) that contain data on reference entities?

3) navigate to tables that contain data on reference entities from opal_id column in the timepoint table (Values tab)?
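
To make point 1 concrete, here is a minimal Python sketch of the requested summary. It is not Opal code: the data is made up, and `participant_of` is a hypothetical helper that assumes every timepoint ID has the form `<participant>_<n>`, as in the examples above.

```python
def participant_of(timepoint_id: str) -> str:
    """Return the participant part of an ID such as 'pat001_2'.

    Assumes the '<participant>_<n>' format; an ID without an
    underscore yields an empty string.
    """
    participant, _, _ = timepoint_id.rpartition("_")
    return participant


# Hypothetical Systolic values keyed by timepoint entity ID.
systolic = {"pat001_1": 120, "pat001_2": 125, "pat002_1": 118}

n_timepoints = len(systolic)
# Distinct referenced entities among the timepoints that have a value.
n_participants = len({participant_of(tp) for tp in systolic})

summary = f"{n_timepoints} timepoints with Systolic for {n_participants} participants"
print(summary)
```

Parsing the ID is only a stand-in here; a real implementation would need Opal to know which entity each timepoint refers to.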

Thank you

ymarcon commented 6 years ago

Comment created by @ymarcon:

Yes, I agree with all your propositions: it is more convenient to have non-repeatable variables (they are easier to filter), but in return Opal should make it easier to merge tables of different entity types that refer to each other, and should give appropriate summary statistics. I also think the search entities page could give, for instance, the number of biomarkers satisfying some criteria AND the count of participants to which these biomarkers apply.

Note: in Elasticsearch this aggregation is called cardinality; it is equivalent to a "SELECT COUNT(DISTINCT ...)" SQL statement, but with an approximate result.
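
For illustration, the request body of such a cardinality aggregation could be built as below. This is only a sketch: the field name `participant_id` and the aggregation name `distinct_participants` are assumptions, not names taken from Opal's actual index mapping.

```python
import json

# Hypothetical index field holding the referenced participant ID.
PARTICIPANT_FIELD = "participant_id"


def distinct_count_query(field: str) -> dict:
    """Build a search body that asks only for an approximate distinct count."""
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            "distinct_participants": {"cardinality": {"field": field}},
        },
    }


body = distinct_count_query(PARTICIPANT_FIELD)
print(json.dumps(body, indent=2))
```

The same body could be combined with a `query` clause so that the count covers only the entities matching the search criteria.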

ymarcon commented 6 years ago

Comment created by abhishek:

Thanks Yannick.

In this case, can Magma (Entity Filter) be made to return reference entities? Another thing to consider is whether the user can filter entities by applying condition(s) on reference entities (e.g. get data for all timepoints from all males; here tableB, with entity type timepoint, refers to tableA, with entity type participant). The idea is that a join of the entity and reference entity tables should produce a view with entities from tableB but columns from both tableA and tableB.
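
The intended join can be sketched in plain Python as follows. None of this is Opal or Magma API: the tables are hypothetical in-memory dictionaries, and `ref` is an assumed column holding the referenced participant ID.

```python
# Hypothetical data: tableA (participants) and tableB (timepoints).
table_a = {
    "pat001": {"sex": "male"},
    "pat002": {"sex": "female"},
}
table_b = {
    "pat001_1": {"ref": "pat001", "systolic": 120},
    "pat001_2": {"ref": "pat001", "systolic": 125},
    "pat002_1": {"ref": "pat002", "systolic": 118},
}


def join_view(entities, referenced, ref_key="ref"):
    """Left join: one row per tableB entity, with tableA columns merged in."""
    view = {}
    for entity_id, row in entities.items():
        # A missing reference contributes no extra columns.
        ref_row = referenced.get(row[ref_key], {})
        view[entity_id] = {**ref_row, **row}
    return view


view = join_view(table_b, table_a)

# Filter timepoints by a condition on the referenced participant.
male_timepoints = {k: v for k, v in view.items() if v.get("sex") == "male"}
print(sorted(male_timepoints))
```

The view keeps tableB's entities as rows, so a condition on a tableA column (here `sex`) selects every timepoint of the matching participants.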

To implement this functionality, Opal should know in advance which column in tableB is the foreign key (i.e. contains the reference entities). One way to do this is to have a reserved column (e.g. 'RefID') in the CSV file to be imported. The summary statistics of variables could then also contain a count or breakdown by referenced entity.
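
As a rough sketch of the reserved-column idea, the snippet below reads a CSV with a 'RefID' column and computes a breakdown per referenced entity. The CSV content, the `id` and `systolic` columns, and the function name are all made up for illustration; this is not how Opal's importer works today.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export of tableB with a reserved 'RefID' column.
CSV_TEXT = """id,RefID,systolic
pat001_1,pat001,120
pat001_2,pat001,125
pat002_1,pat002,118
"""


def breakdown_by_reference(csv_text, ref_column="RefID"):
    """Count rows per referenced entity, using the reserved column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if ref_column not in (reader.fieldnames or []):
        raise ValueError(f"missing reserved column {ref_column!r}")
    return Counter(row[ref_column] for row in reader)


counts = breakdown_by_reference(CSV_TEXT)
print(dict(counts))
```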

This would be a cool thing to implement; however, allowing the filtering of repeated measures at row level (search UI) is still recommended. I personally think it would be a massive boost to Opal's search UI.

Thanks again.