Open nbokulich opened 7 years ago
Below is an implementation of the scatterplot. I am not sure which file I should save the function in, so I am keeping the code here :)
One thing I am not sure about is how QIIME 2 exports figures. @nbokulich could you help me with that? Thanks!
def design_plot(metadata: qiime2.Metadata,
                individual_id_column: str,
                individual_time_column: str,
                individual_group_column: str,
                fig_width: int,
                fig_height: int):
    # load and prep metadata
    metadata = _load_metadata(metadata)
    # NOTE: `table` is not a parameter of this function, so the two lines
    # below stay commented out until a feature table is passed in.
    # _validate_metadata_is_superset(metadata, table)
    # metadata = metadata[metadata.index.isin(table.index)]
    # validate the id, time, and group columns
    # (open question: how can I ensure the time column is int/numeric?)
    _validate_input_columns(metadata, individual_id_column, None, None, None)
    _validate_input_columns(metadata, individual_time_column, None, None, None)
    _validate_input_columns(metadata, individual_group_column, None, None, None)
    # draw the plot from the prepared metadata
    _design_plot(metadata, individual_id_column, individual_time_column,
                 individual_group_column, fig_width, fig_height)


def _design_plot(sample_md,
                 individual_id_column,
                 individual_time_column,
                 individual_group_column,
                 fig_width,
                 fig_height):
    '''Function to create study design plot.

    sample_md: pd.DataFrame
        Sample metadata
    individual_id_column: str
        Metadata column containing IDs for individual subjects
    individual_time_column: str
        Metadata column containing sample collection time for individual
        subjects
    individual_group_column: str
        Metadata column containing group indicator of individual subjects
    fig_width: int
        Figure width
    fig_height: int
        Figure height
    '''
    sample_md = sample_md.rename(columns={individual_id_column: 'id',
                                          individual_time_column: 'time',
                                          individual_group_column: 'group'})
    sample_md["id_loc"] = sample_md["id"].astype('category').cat.codes
    # Keep for potential operation of the label
    sample_md["id_label"] = sample_md["id"]
    u_group = sample_md["group"].unique()
    n_group = len(u_group)
    sample_md_meta = sample_md[["id", "id_loc", "id_label"]]
    sample_md_meta = sample_md_meta.drop_duplicates().reset_index(drop=True)
    plt.figure(figsize=(fig_width, fig_height))
    for grp in u_group:
        _md = sample_md[sample_md.group == grp]
        plt.scatter(_md.time, _md.id_loc, label=grp)
    plt.xlabel(individual_time_column)
    plt.yticks(sample_md_meta["id_loc"], sample_md_meta["id_label"])
    plt.ylabel(individual_id_column)
    plt.legend(loc=9, bbox_to_anchor=(0.5, -0.1), ncol=n_group)


# Test
from matplotlib import pyplot as plt
import pandas as pd

sample_md_fp = "ecam_map_maturity.txt"
# read the tab-separated sample metadata, using the first column as the index
sample_md = pd.read_csv(sample_md_fp, sep='\t', index_col=0)
_design_plot(sample_md, "studyid", "month", "diet_3", 6, 8)
plt.show()
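On the open question in the code comment (how to make sure the time column is int/numeric), here is a minimal sketch using plain pandas. The helper name `validate_numeric_column` is hypothetical and is not part of q2-longitudinal or q2-metadata; it only assumes the metadata has already been loaded into a `pd.DataFrame`.

```python
import pandas as pd


def validate_numeric_column(metadata: pd.DataFrame, column: str) -> pd.Series:
    """Return `column` converted to numbers, or raise ValueError.

    Hypothetical helper for illustration; not an existing QIIME 2 function.
    """
    if column not in metadata.columns:
        raise ValueError(f"{column!r} is not a column in the metadata.")
    # errors='coerce' turns entries that cannot be parsed into NaN
    converted = pd.to_numeric(metadata[column], errors='coerce')
    # entries that were present originally but became NaN are non-numeric
    bad = metadata.loc[converted.isna() & metadata[column].notna(), column]
    if not bad.empty:
        raise ValueError(
            f"{column!r} contains non-numeric values: "
            f"{sorted(set(map(str, bad)))}")
    return converted
```

Something like this could run right after the existing column checks, e.g. `metadata[individual_time_column] = validate_numeric_column(metadata, individual_time_column)`.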
Thanks @elong0527! I think for now the best thing to do is add these functions to my fork of q2-metadata; that way we can work together on this (e.g., I can review what you have put together and add a visualization template that displays the plots) before making a pull request into the main repository. @jairideout does this sound like a good plan?
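As a rough illustration of the export step asked about earlier, the sketch below saves a matplotlib figure into an output directory next to a bare `index.html` that displays it. It assumes the usual QIIME 2 visualizer convention of writing files into an `output_dir`; the function name `save_plot_with_index` and the file name `design.png` are made up for this example, and a real visualization would more likely render a proper template than hand-write the HTML.

```python
import os

import matplotlib
matplotlib.use('Agg')  # headless backend, so no display is needed
from matplotlib import pyplot as plt


def save_plot_with_index(fig, output_dir: str,
                         filename: str = 'design.png') -> str:
    """Write `fig` and a minimal index.html into `output_dir`.

    Hypothetical helper for illustration only.
    """
    os.makedirs(output_dir, exist_ok=True)
    fig.savefig(os.path.join(output_dir, filename), bbox_inches='tight')
    plt.close(fig)  # release the figure once it is on disk
    index_fp = os.path.join(output_dir, 'index.html')
    with open(index_fp, 'w') as fh:
        fh.write(f'<html><body><img src="{filename}"></body></html>\n')
    return index_fp
```

Since `_design_plot` draws on the current pyplot figure, it could be followed by `save_plot_with_index(plt.gcf(), output_dir)` instead of `plt.show()`.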
@elong0527 could you please add these functions to a new file named _explore.py in this directory and make a pull request into my branch? Do not add the test that you wrote; we will work on tests later, after we figure out which test data, etc., we will use.
Also, now that this action is in q2-metadata instead of q2-longitudinal, we will probably want to make it usable on categorical data as well as numerical data (see the notes that I made in the first post in this thread), and we should test whether these scatter plots can still be made with categorical data on the x-axis.
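One way to try the categorical x-axis case is to map categories to integer positions and label the ticks, the same trick the posted code already uses for subject IDs on the y-axis. This is only a sketch: `scatter_categorical_x` and the column names in the usage note are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, so no display is needed
from matplotlib import pyplot as plt
import pandas as pd


def scatter_categorical_x(sample_md: pd.DataFrame, x_column: str,
                          id_column: str):
    """Scatter subject IDs (y) against a categorical column (x)."""
    x = sample_md[x_column].astype('category')
    y = sample_md[id_column].astype('category')
    fig, ax = plt.subplots()
    # plot integer category codes, then relabel the ticks with the names
    ax.scatter(x.cat.codes, y.cat.codes)
    ax.set_xticks(range(len(x.cat.categories)))
    ax.set_xticklabels(x.cat.categories)
    ax.set_yticks(range(len(y.cat.categories)))
    ax.set_yticklabels(y.cat.categories)
    ax.set_xlabel(x_column)
    ax.set_ylabel(id_column)
    return fig, ax
```

For example, `scatter_categorical_x(sample_md, 'delivery', 'studyid')` would place one tick per delivery mode on the x-axis, assuming the metadata has such columns.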
If you have any questions on how to make a pull request into my fork, etc, please just email me directly.
@jairideout does this sound like a good plan?
Sounds perfect!
Proposed Behavior
Example and idea provided by @elong0527 and issue moved from q2-longitudinal:
X-axis = time (or other continuous metadata column) (possibly also support categorical columns?)
y-axis = subject ID (e.g., to support plotting individuals that are plotted repeatedly over time). This was originally planned for q2-longitudinal but should be generalized for non-longitudinal sampling designs; perhaps y-axis should be an optional parameter (if True, plot as scatter plot; if False, plot barplot?)
points colored by group category (should accept categorical or continuous metadata, infer type, and color-code accordingly)
Questions
Could also add a parameter to change size or shape of points based on other optional metadata category inputs?
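The "infer type, and color-code accordingly" idea could look roughly like the sketch below: numeric columns get a continuous colormap with a colorbar, and anything else gets one scatter series per category with a legend. The function name `scatter_colored_by` is hypothetical, and the y column is assumed to hold numeric positions already (such as the `id_loc` codes in the posted code).

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, so no display is needed
from matplotlib import pyplot as plt
import pandas as pd


def scatter_colored_by(sample_md: pd.DataFrame, x_column: str,
                       y_column: str, color_column: str):
    """Scatter plot colored by a column whose type is inferred."""
    fig, ax = plt.subplots()
    colors = sample_md[color_column]
    if pd.api.types.is_numeric_dtype(colors):
        # continuous metadata: map values onto a colormap
        points = ax.scatter(sample_md[x_column], sample_md[y_column],
                            c=colors, cmap='viridis')
        fig.colorbar(points, ax=ax, label=color_column)
        kind = 'continuous'
    else:
        # categorical metadata: one series (and legend entry) per category
        for name, grp in sample_md.groupby(color_column):
            ax.scatter(grp[x_column], grp[y_column], label=str(name))
        ax.legend(title=color_column)
        kind = 'categorical'
    ax.set_xlabel(x_column)
    ax.set_ylabel(y_column)
    return fig, ax, kind
```

Inferring the type from the pandas dtype is the simplest rule; integer-coded groups would be treated as continuous under it, so an explicit override parameter may still be needed.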