This PR adds an experiment to determine which correlation matrix to use: the one computed from counts or the one computed from SPELL-processed counts.
This PR contains the following major changes:
Add spell_vs_counts_experiment/README.md, which describes the experiment
Add spell_vs_counts_experiment/1a_compare_SPELL_vs_counts_correlation.ipynb, which performs the experiment
Pull out the previous analysis comparing the most stable core genes using SPELL vs counts into its own notebook, 3_core_core_analysis/compare_most_stable_genes.ipynb
This PR also fixes a bug I found in my code. Previously we generated a single correlation matrix using all the genes, but now we make two different correlation matrices: one using only core genes, another using only accessory genes. When I added this feature, I created the correlation matrix first and then used pandas to split it into the different gene subsets. However, this meant that the correlation scores in both the core-core and acc-acc matrices were based on all genes, not just core or accessory genes. As a result, any core-core or acc-acc relationships read from the correlation matrix were based on SVs that represent linear combinations of all genes. Since we're examining the gene groups separately and then together, I think it makes sense to split the genes before the correlation is calculated. This change is added to 1_correlation_analysis.ipynb
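To illustrate why the order matters, here is a minimal sketch, not the notebook's actual code. It assumes a samples x genes DataFrame and uses a plain truncated SVD as a stand-in for SPELL processing. The function `spell_like_corr`, the parameter `n_sv`, the gene ids, and the random data are all hypothetical.

```python
import numpy as np
import pandas as pd


def spell_like_corr(expr: pd.DataFrame, n_sv: int = 5) -> pd.DataFrame:
    """Gene-gene correlation in SVD space (hypothetical stand-in for SPELL).

    expr is samples x genes. Each gene is represented by its weights on the
    top n_sv singular vectors, and genes are correlated on those weights.
    """
    centered = expr - expr.mean(axis=0)
    # SVD of the genes x samples matrix; rows of u correspond to genes
    u, _, _ = np.linalg.svd(centered.T.to_numpy(), full_matrices=False)
    gene_coords = u[:, :n_sv]
    corr = np.corrcoef(gene_coords)  # rows are treated as variables
    return pd.DataFrame(corr, index=expr.columns, columns=expr.columns)


# Made-up data: 20 samples, 6 "core" genes and 4 "accessory" genes
rng = np.random.default_rng(0)
core_ids = [f"core_{i}" for i in range(6)]
acc_ids = [f"acc_{i}" for i in range(4)]
expr = pd.DataFrame(rng.normal(size=(20, 10)), columns=core_ids + acc_ids)

# Old order: correlate all genes, then subset the matrix with pandas.
# The singular vectors here are linear combinations of ALL genes.
corr_then_split = spell_like_corr(expr).loc[core_ids, core_ids]

# New order: subset to core genes first, then correlate.
# The singular vectors now depend on core genes only.
split_then_corr = spell_like_corr(expr[core_ids])

svd_results_differ = not np.allclose(corr_then_split, split_then_corr)

# For comparison, plain Pearson correlation on the raw values is pairwise,
# so the order of subsetting and correlating does not change the result.
raw_results_match = np.allclose(
    expr.corr().loc[core_ids, core_ids], expr[core_ids].corr()
)
```

In this sketch the two orderings disagree only when the correlation is computed in SVD space, which is the case described above.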
The remaining changed files are newly created data files and existing files that were reorganized