legumeinfo / tripal_phylotree

LIS project: Tripal module for Chado phylogeny and gene families
GNU General Public License v2.0

Removing Hard-coding of organisms #32

Open laceysanderson opened 6 years ago

laceysanderson commented 6 years ago

Starting an issue for discussion on removing hard-coding of organism from Tripal Phylotree :-)

Quite a lot of work on this has already been done by @abretaud in #28, which changes the materialized view and Drupal view to be more dynamic. That PR stores the counts for all organisms as a JSON array rather than having a column per organism in the phylotree_count mview. It also adds a custom field handler to display the counts on the default phylotree view.

During the last Tripal Help Desk, @adf-ncgr asked for my help with count filters specific to each organism and with more control over the columns that get added.

More control over the count columns added to the default view

Currently in the PR the columns are being added via hook_views_pre_view(). This is to ensure that the machine name of the field includes the organism_id, which is then used when the views field handler renders. Instead, I would suggest defining the organism in the handler's option_definition() and options_form(). This would then allow the field to be added through the Views UI. Furthermore, you could allow multiple organisms to be selected for a given field using this method, which would provide a generic way to group organisms :-) I would also suggest adding the fields to the default view programmatically rather than using hook_views_pre_view(), since this would allow the admin to override which columns are shown while still providing a better default than no fields.

Filter by count per organism

I can see why @abretaud is running into difficulties with the filtering. The current structure of the JSON makes it difficult: [{"1":5}, {"2":5}, {"3":5}], where 1, 2, 3 are organism_ids and each organism has a count of 5. Filtering would be much easier if this structure were {"1":5, "2":5, "3":5}, because then you could use (counts_by_org ->> '1')::int > 3 to grab the records where organism_id=1 has a count > 3. The cast matters: ->> returns text, and a text comparison would treat '10' as less than '3'. However, without a database to play with I don't have specific suggestions on how to change the query to populate the mview in this manner.
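
To make the difference between the two shapes concrete, here is a minimal Python sketch. It is only an illustration, not code from the module or the PR: the sample counts and the helper names flatten_counts and has_more_than are made up for this example. It merges the list-of-objects form into a single object keyed by organism_id, then filters on one organism with a numeric comparison.

```python
import json

# Hypothetical value in the current shape: a list of single-key objects,
# one per organism_id (the counts here are invented for illustration).
current = json.loads('[{"1": 5}, {"2": 12}, {"3": 0}]')


def flatten_counts(pairs):
    """Merge [{org_id: count}, ...] into one {org_id: count} object."""
    flat = {}
    for pair in pairs:
        flat.update(pair)
    return flat


def has_more_than(counts, organism_id, threshold):
    """True when the organism's count exceeds threshold, compared as numbers."""
    return int(counts.get(str(organism_id), 0)) > threshold


flat = flatten_counts(current)
print(json.dumps(flat))           # {"1": 5, "2": 12, "3": 0}
print(has_more_than(flat, 2, 3))  # True: 12 > 3

# Comparing the same values as text gives the wrong answer,
# which is why the count should be cast to a number before filtering.
print("12" > "3")                 # False
```

With the flat shape, a single key lookup finds an organism's count. With the list shape, every element has to be scanned first.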

Additional note: currently the field containing the counts is of type text. You can make it of type JSONB, using a combination of pgsql_type and views_type, which should make querying the field much faster.

abretaud commented 6 years ago

Cool suggestions! I don't know if I'll have time to work on this very soon, but that's precious help.

FYI I've made some other small modifications to this module to be able to use it on our data in production. I'm using the bipaa branch from my fork: https://github.com/abretaud/tripal_phylotree/tree/bipaa

It includes:

I think you can view a huge diff there: https://github.com/legumeinfo/tripal_phylotree/compare/lis_master...abretaud:bipaa?expand=1