motleystate / moonstone

Library to perform Metagenomics data analysis with Python
https://moonstone.readthedocs.io/en/latest/?badge=latest
MIT License

PCoA Plotly Plotting from combined counts-metadata DF #79

Open skennedy8 opened 3 years ago

skennedy8 commented 3 years ago

Description

A combined DataFrame (DF) that includes both normalized counts and metadata is a convenient format for data analysis. Such a DF is usually validated so that every sample has both types of data, and it makes sorting samples easier.

This issue involves modifying or extending the existing PCoA plotting in moonstone to handle this format.

Additional information

To preserve existing code and promote stability, an attempt will be made to use the metadata columns to filter any combined DF wherever only counts are required in the pipeline.
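As a rough illustration of that filtering idea, here is a minimal pandas sketch. `split_combined_df` is a hypothetical helper, not an existing moonstone function, and it assumes the combined DF has samples as rows with count and metadata columns side by side.

```python
import pandas as pd


def split_combined_df(combined_df, metadata_columns):
    """Split a combined DF into (counts_df, metadata_df).

    Hypothetical helper, not part of moonstone. Assumes one row per
    sample and that every non-metadata column holds counts.
    """
    metadata_columns = list(metadata_columns)
    missing = [c for c in metadata_columns if c not in combined_df.columns]
    if missing:
        raise KeyError(f"Metadata columns not found: {missing}")
    metadata_df = combined_df[metadata_columns]
    # Everything that is not metadata is treated as counts.
    counts_df = combined_df.drop(columns=metadata_columns)
    return counts_df, metadata_df


# Toy data, invented for illustration only.
combined = pd.DataFrame(
    {
        "taxon_a": [0.2, 0.5, 0.1],
        "taxon_b": [0.8, 0.5, 0.9],
        "group": ["case", "control", "case"],
    },
    index=["s1", "s2", "s3"],
)
counts_df, metadata_df = split_combined_df(combined, ["group"])
```

With a split like this, the counts-only part could be passed to the existing pipeline unchanged, while the metadata part stays available for coloring or grouping the plot.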

skennedy8 commented 3 years ago

The goal here has changed to reflect experience gained while performing the analysis. There is no easy, general way to combine COUNT and METADATA into a single DataFrame: samples first need to be validated as having both counts and clinical data. Integrating clinical data also means dealing with missing values and data types, and selecting the variables of interest. This seems best handled on a case-by-case basis in individual notebooks, using the available codebase to generate the distance matrices and perform the visualizations.
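The per-notebook validation described above might look roughly like the following sketch. `validate_metadata` and its arguments are hypothetical names, and the rule used here (drop any sample with a missing value in a selected variable) is only one possible policy.

```python
import pandas as pd


def validate_metadata(metadata_df, sample_ids, variables):
    """Return (cleaned_metadata, dropped_sample_ids).

    Hypothetical helper, not part of moonstone. Keeps only samples that
    appear in both `sample_ids` (e.g. the samples with counts) and the
    metadata index, restricted to the variables of interest, and drops
    samples with missing values in those variables.
    """
    shared = [s for s in sample_ids if s in metadata_df.index]
    cleaned = metadata_df.loc[shared, list(variables)].dropna()
    dropped = sorted(set(sample_ids) - set(cleaned.index))
    return cleaned, dropped


# Toy data, invented for illustration only.
metadata = pd.DataFrame(
    {"group": ["case", "control", None], "age": [34, 51, 47]},
    index=["s1", "s2", "s3"],
)
# "s4" has counts but no metadata; "s3" has a missing group.
cleaned, dropped = validate_metadata(metadata, ["s1", "s2", "s3", "s4"], ["group"])
```

Returning the dropped sample IDs alongside the cleaned table makes it easy to report which samples were excluded and to filter the counts or distance matrix to match.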

A useful addition would be a DataFrame 'cleaner/validator' for the metadata DataFrame used in the visualize_pcoa function. A second objective is to add a function that performs PERMANOVA using the distance matrix and metadata_df. Is this the right place for it?
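To make the PERMANOVA proposal concrete, here is a minimal NumPy/pandas sketch of a one-way test based on the usual pseudo-F statistic with label permutations. The function names and arguments are hypothetical, it assumes the distance matrix is a square DataFrame indexed by sample ID, and it assumes the grouping column has no missing values. An established implementation such as `skbio.stats.distance.permanova` in scikit-bio would likely be preferable in practice.

```python
import numpy as np
import pandas as pd


def pseudo_f(dist, codes):
    """Pseudo-F from a square distance array and integer group codes."""
    n = len(codes)
    groups = np.unique(codes)
    sq = dist ** 2
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(codes == g)
        sub = sq[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova_sketch(distance_df, metadata_df, column, permutations=999, seed=0):
    """Return (pseudo_F, p_value). Hypothetical helper, not part of moonstone."""
    # Align the metadata rows to the sample order of the distance matrix.
    labels = metadata_df.loc[distance_df.index, column]
    codes, _ = pd.factorize(labels)
    dist = distance_df.to_numpy(dtype=float)
    rng = np.random.default_rng(seed)
    observed = pseudo_f(dist, codes)
    hits = sum(
        pseudo_f(dist, rng.permutation(codes)) >= observed
        for _ in range(permutations)
    )
    return observed, (hits + 1) / (permutations + 1)


# Toy data, invented for illustration only.
ids = ["s1", "s2", "s3", "s4"]
distance_df = pd.DataFrame(
    [[0.0, 0.1, 0.9, 0.9],
     [0.1, 0.0, 0.9, 0.9],
     [0.9, 0.9, 0.0, 0.2],
     [0.9, 0.9, 0.2, 0.0]],
    index=ids, columns=ids,
)
metadata_df = pd.DataFrame({"group": ["a", "a", "b", "b"]}, index=ids)
f_stat, p_value = permanova_sketch(distance_df, metadata_df, "group", permutations=99)
```

Aligning `metadata_df` to the distance matrix index inside the function means a mismatch in sample IDs raises a `KeyError` early, which is one reason a metadata validator would pair naturally with this kind of test.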