greenelab / core-accessory-interactome

Investigating the functional relationship between P. aeruginosa core and accessory genes.
BSD 3-Clause "New" or "Revised" License

Compare SPELL vs counts correlation #48

Closed ajlee21 closed 2 years ago

ajlee21 commented 2 years ago

This PR adds an experiment to determine which correlation matrix to use: the one computed from counts or the one computed from SPELL-processed counts.

This PR contains the following major changes:

  1. Add spell_vs_counts_experiment/README.md, which describes the experiment
  2. Add spell_vs_counts_experiment/1a_compare_SPELL_vs_counts_correlation.ipynb, which performs the experiment
  3. Pull out the previous analysis comparing the most stable core genes using SPELL vs counts into its own notebook, 3_core_core_analysis/compare_most_stable_genes.ipynb
  4. This PR also fixes a bug in my code. Previously we generated a single correlation matrix using all the genes, but now we generate two different correlation matrices: one using only core genes, another using only accessory genes. When I added this feature, I computed the correlation first and then used pandas to split the correlation matrix into the different gene subsets. However, this meant that the correlation scores in both the core-core and acc-acc matrices were based on all genes, not just core or just accessory genes. In other words, any core-core or acc-acc relationships read from those matrices were based on SVs that represent linear combinations of all genes. Since we're examining the gene groups separately and then together, I think it makes sense to split the genes before the correlation is calculated. This change is added to 1_correlation_analysis.ipynb
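
To illustrate the ordering issue in item 4, here is a minimal sketch on random toy data. It is not the notebook code. It assumes SPELL-like processing can be approximated by a truncated SVD followed by Pearson correlation of the gene loadings. The function `svd_gene_correlation`, the `n_components` value, and the gene IDs are all hypothetical.

```python
import numpy as np
import pandas as pd


def svd_gene_correlation(expression: pd.DataFrame, n_components: int = 5) -> pd.DataFrame:
    """Gene x gene Pearson correlation computed on truncated-SVD gene loadings.

    `expression` is samples x genes. Hypothetical stand-in for SPELL-style
    processing, not the function used in the notebooks.
    """
    centered = expression - expression.mean(axis=0)
    # With full_matrices=False, vt has shape (min(n_samples, n_genes), n_genes).
    _, _, vt = np.linalg.svd(centered.to_numpy(), full_matrices=False)
    gene_loadings = vt[:n_components].T  # genes x components
    corr = np.corrcoef(gene_loadings)  # correlates rows, i.e. genes
    return pd.DataFrame(corr, index=expression.columns, columns=expression.columns)


# Toy data: 20 samples, 6 "core" genes and 4 "accessory" genes (made-up IDs).
rng = np.random.default_rng(0)
core_ids = [f"core_{i}" for i in range(6)]
acc_ids = [f"acc_{i}" for i in range(4)]
expression = pd.DataFrame(rng.normal(size=(20, 10)), columns=core_ids + acc_ids)

# Buggy order: correlate using all genes, then slice out the core-core block.
# The singular vectors here are linear combinations of core AND accessory genes.
core_corr_sliced = svd_gene_correlation(expression).loc[core_ids, core_ids]

# Fixed order: subset to core genes first, then correlate.
core_corr_split = svd_gene_correlation(expression[core_ids])

max_diff = np.abs(core_corr_sliced.to_numpy() - core_corr_split.to_numpy()).max()
print(f"max |sliced - split| = {max_diff:.3f}")

# For contrast: plain Pearson on the counts is unaffected by the ordering,
# because each pairwise correlation depends only on the two genes involved.
plain_sliced = expression.corr().loc[core_ids, core_ids]
plain_split = expression[core_ids].corr()
print(np.allclose(plain_sliced, plain_split))
```

In this sketch the two core-core matrices differ only when the SVD step is involved, which is why splitting the genes before the correlation matters for the processed version.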

The remaining changed files are newly created data files and files that were reorganized.