Add clustering analysis of accessory genes for PAO1 and PA14 compendia

ajlee21 commented 3 years ago

This PR performs a quick exploratory analysis of how accessory genes cluster in our newly created PAO1 and PA14 compendia (i.e. two gene expression matrices, one aligned to the PAO1 reference and the other aligned to the PA14 reference).

Takeaway:

This serves as a positive control: PAO1-annotated samples have higher median expression of PAO1-only genes than of PA14-only genes, and, similarly, PA14-annotated samples have higher median expression of PA14-only genes than of PAO1-only genes. In other words, we expect PA14-only genes to have zero or very low values in PAO1 samples, and vice versa.
This result also shows that we can anticipate a very clear binning of our samples into PAO1 and PA14 if we use mapping rates.
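
For readers who want to see the shape of the comparison, here is a minimal sketch of the per-sample median calculation described above. It is not the notebook code from this PR: the function name, the gene ids and the toy values are all hypothetical, and it assumes expression is available as a mapping from sample to per-gene values.

```python
from statistics import median


def median_by_gene_set(expression, pao1_only, pa14_only):
    """Return {sample: (median of PAO1-only genes, median of PA14-only genes)}.

    `expression` maps a sample id to a {gene id: expression value} dict.
    All names here are illustrative, not taken from the repository.
    """
    result = {}
    for sample, values in expression.items():
        result[sample] = (
            median(values[g] for g in pao1_only),
            median(values[g] for g in pa14_only),
        )
    return result


# Toy data (made up): geneA/geneB stand in for PAO1-only genes,
# geneC/geneD for PA14-only genes.
expression = {
    "pao1_sample": {"geneA": 10.0, "geneB": 8.0, "geneC": 0.0, "geneD": 0.0},
    "pa14_sample": {"geneA": 0.0, "geneB": 0.5, "geneC": 7.0, "geneD": 11.0},
}

medians = median_by_gene_set(
    expression,
    pao1_only=["geneA", "geneB"],
    pa14_only=["geneC", "geneD"],
)

for sample, (pao1_median, pa14_median) in medians.items():
    print(sample, pao1_median, pa14_median)
```

Plotting the two medians against each other, one point per sample, gives the kind of scatter where PAO1 samples sit along one axis and PA14 samples along the other.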

nrosed commented 3 years ago

Looks good to me! One small comment: I can't really see the density of points in the plot. If you zoom in closer to (x=0, y=0), what proportion of the points are there? Also, it looks like the expression is exactly 0 for many strain-specific genes. Is that true, or are the values just in some region around zero?

ajlee21 commented 3 years ago

Great questions!

I definitely plan to do some more digging into the data after my committee meeting next week :) and I'll add this to my list of notes.

I will say that I would expect these strain-specific genes to be very lowly expressed outside their respective strain (i.e. PAO1-only genes should be 0 or lowly expressed in PA14 strains), which is why you see the buildup along the axes.
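
A rough sketch of how the zero-versus-near-zero question could be checked, assuming each plotted point is a pair of median values (PAO1-only, PA14-only) for one sample. The function name, the tolerance and the toy points are hypothetical and not taken from this PR.

```python
def zero_summary(points, tol=0.5):
    """Summarize how points sit relative to zero.

    `points` is a list of (x, y) pairs. `tol` is an arbitrary cutoff for
    "close to zero" and would need tuning for real data.
    """
    n = len(points)
    # Both coordinates within tol of zero, i.e. close to (x=0, y=0).
    near_origin = sum(1 for x, y in points if abs(x) <= tol and abs(y) <= tol)
    # At least one coordinate is exactly zero.
    exact_zero = sum(1 for x, y in points if x == 0 or y == 0)
    # At least one coordinate within tol of zero, but none exactly zero.
    near_not_exact = sum(
        1
        for x, y in points
        if x != 0 and y != 0 and min(abs(x), abs(y)) <= tol
    )
    return {
        "near_origin": near_origin / n,
        "exact_zero_on_axis": exact_zero / n,
        "near_zero_not_exact": near_not_exact / n,
    }


# Toy points (made up) purely to show the bookkeeping.
points = [(9.0, 0.0), (10.5, 0.1), (0.25, 9.0), (0.0, 0.0)]
summary = zero_summary(points)
print(summary)
```

Separating exact zeros from small nonzero values is the part that speaks to the second question; the `near_origin` fraction speaks to the first.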

greenelab / core-accessory-interactome

Add clustering analysis of accessory genes for PAO1 and PA14 compendia #16