greenelab / core-accessory-interactome

Investigating the functional relationship between P. aeruginosa core and accessory genes.
BSD 3-Clause "New" or "Revised" License

Compare relationships using genome distance vs expression distance #33

Closed ajlee21 closed 3 years ago

ajlee21 commented 3 years ago

Previously we attempted to label modules as "mostly core", "mostly accessory" or "mixed". We found that most modules were "mixed" and some were "mostly accessory". We noticed that there were many modules that had only core genes, yet were not found to be significantly "mostly core" based on our Fisher's exact test due to the small size of the modules as well as the large imbalance in the number of core:accessory genes.
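To illustrate why a small all-core module can fail to reach significance, here is a minimal sketch of a one-sided Fisher's exact test written as a hypergeometric tail. The gene counts (5,000 core, 500 accessory) and the function name `core_enrichment_p_value` are hypothetical, chosen only to mimic a strong core:accessory imbalance. They are not the counts or the code used in this analysis, and the one-sided form of the test is an assumption.

```python
from math import comb


def core_enrichment_p_value(core_in_module, module_size, core_total, gene_total):
    """One-sided Fisher's exact test as a hypergeometric upper tail.

    Returns the probability of drawing at least `core_in_module` core genes
    when `module_size` genes are sampled without replacement from a genome
    with `core_total` core genes out of `gene_total` genes.
    """
    accessory_total = gene_total - core_total
    max_core = min(module_size, core_total)
    tail = sum(
        comb(core_total, k) * comb(accessory_total, module_size - k)
        for k in range(core_in_module, max_core + 1)
    )
    return tail / comb(gene_total, module_size)


# Hypothetical genome composition (assumption, not taken from the analysis).
CORE_TOTAL = 5000
GENE_TOTAL = 5500

# p-value for modules made up entirely of core genes, at several sizes.
for size in (3, 10, 50):
    p = core_enrichment_p_value(size, size, CORE_TOTAL, GENE_TOTAL)
    print(f"all-core module of size {size}: p = {p:.4f}")
```

Under an imbalance like this, an operon-sized module of only core genes is what we would expect by chance most of the time, so its p-value stays well above 0.05. Only much larger all-core modules become significant.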

These small modules, which are due to operons, are biologically sensible, but it is hard for us to apply statistics to them. We want to try to tease apart the co-expression relationships that are due to location (i.e. being in the same operon) versus other functional reasons.

Our strategy is the following:

image

image

The main takeaway is: accessory genes are more likely to be highly co-expressed with other accessory genes, even with accessory genes that are farther away on the genome, which suggests some coordination beyond location. This relationship is stronger in PA14 than in PAO1 (i.e. accessory genes are more highly correlated with other accessory genes at farther distances in PA14). I wonder why this is.

nrosed commented 3 years ago

Looks good! I think I get the analysis, but I was a bit confused about two things. What does the x-axis mean in the plots above, and how are nearest neighbors defined?

In thinking about PAO1 and PA14 differences, do they have different numbers of genes or different genome lengths?

ajlee21 commented 3 years ago

> Looks good! I think I get the analysis, but I was a bit confused about two things. What does the x-axis mean in the plots above, and how are nearest neighbors defined?

The x-axis on the left plot is the number of nearest neighbors (NN), where neighbors are determined by sorting the gene IDs. The x-axis on the right plot is the rank of the correlated gene (i.e. 1 = most correlated gene, 2 = 2nd most correlated gene).

I'll add a comment and try to update the axis labels to explain this.
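As a rough illustration of the two x-axis definitions, the sketch below computes genome neighbors from sorted gene IDs and ranks genes by correlation. The function names, the toy gene IDs, and the correlation values are all hypothetical. It also assumes a linear gene order (no wrap-around on the circular chromosome) and counts neighbors on both sides, which may differ from what the notebook actually does.

```python
def genome_neighbors(gene_ids, gene, n):
    """Genes within n positions of `gene` in sorted gene-ID order, nearest first."""
    ordered = sorted(gene_ids)
    i = ordered.index(gene)
    neighbors = []
    for offset in range(1, n + 1):
        # Look one step further out on each side; skip positions past either end.
        for j in (i - offset, i + offset):
            if 0 <= j < len(ordered):
                neighbors.append(ordered[j])
    return neighbors


def top_correlated(corr, gene, n):
    """The n genes most correlated with `gene`, most correlated first."""
    others = [g for g in corr[gene] if g != gene]
    others.sort(key=lambda g: corr[gene][g], reverse=True)
    return others[:n]


# Toy data (hypothetical IDs and correlation values).
genes = ["PA0004", "PA0001", "PA0003", "PA0002", "PA0005"]
corr = {
    "PA0001": {"PA0002": 0.9, "PA0003": 0.1, "PA0004": 0.7, "PA0005": 0.3},
}

print(genome_neighbors(genes, "PA0003", 1))  # neighbors by genome position
print(top_correlated(corr, "PA0001", 2))     # neighbors by expression
```

Comparing the core/accessory labels of the genes returned by each function, at each value of `n`, is one way to contrast genome distance with expression distance.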

> In thinking about PAO1 and PA14 differences, do they have different numbers of genes or different genome lengths?

The PA14 genome is much larger. I'll need to think a bit more about what might be causing the difference. Thank you for the questions!