neurogenomics / MAGMA_Celltyping

Find causal cell-types underlying complex trait genetics
https://neurogenomics.github.io/MAGMA_Celltyping

Add an error catch to prepare.quantile.groups #35

Closed NathanSkene closed 2 years ago

NathanSkene commented 3 years ago

Add an error to catch instances where the specificity_quantiles are not proper quantiles (e.g. if you asked for 40 bins, did you actually get 40?). This can occur when the mean expression matrix is far too sparse. E.g. this is what the quantiles should look like:

image

This is what they shouldn't look like:

image

These plots are basically generated with:

hist(newCTD2[[1]]$specificity_quantiles[,"oligondendrocyte"])
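The same check can be made numerically instead of by eye. Below is a minimal sketch in base R, assuming that bin 0 marks non-expressed genes and should not count as a quantile; `count_nonzero_bins` and `assert_expected_bins` are hypothetical helper names, not functions from MAGMA.Celltyping or EWCE.

```r
# Hypothetical helpers, not part of MAGMA.Celltyping or EWCE.
# Assumption: bin 0 marks genes with no expression and is not a real quantile.
count_nonzero_bins <- function(quantile_matrix) {
  # Number of distinct non-zero bin labels in each column (cell type)
  apply(quantile_matrix, 2, function(x) length(unique(x[x > 0])))
}

assert_expected_bins <- function(quantile_matrix, expected_bins) {
  n_bins <- count_nonzero_bins(quantile_matrix)
  bad <- n_bins[n_bins != expected_bins]
  if (length(bad) > 0) {
    stop(
      "Expected ", expected_bins, " quantile bins but found: ",
      paste0(names(bad), "=", bad, collapse = ", ")
    )
  }
  invisible(TRUE)
}

# Toy matrix: 20 genes x 2 cell types, with 4 bins requested.
toy <- cbind(
  good   = rep(0:4, times = 4),                 # bins 1..4 all present
  sparse = c(rep(0, 12), rep(1:2, times = 4))   # only bins 1..2 present
)

bins <- count_nonzero_bins(toy)

# Capture the error message rather than halting the script
result <- tryCatch(
  assert_expected_bins(toy, expected_bins = 4),
  error = function(e) conditionMessage(e)
)
print(bins)
print(result)
```

On a real CTD the call would look something like `assert_expected_bins(newCTD2[[1]]$specificity_quantiles, 40)`, with the expected count adjusted to whatever number of bins was requested.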

bschilder commented 2 years ago

Providing a warning message with some relevant info for the user:

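Screenshot 2021-11-20 at 00 29 26: https://user-images.githubusercontent.com/34280215/142715622-8a665f32-e36a-410d-add1-5f31ffb6fef3.png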

☝️ This is using the Zeisel2015 CTD distributed via ewceData::ctd(). I thought perhaps it had something to do with the dropping of non-orthologs, but this shouldn't matter since mean expression is normalized first and then spec quantiles are recomputed.

Furthermore, I checked the CTD before dropping any genes or converting the genes to human. The issue is slightly less pronounced in level 2 (40 vs. 45 columns) but still pervasive.

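Screenshot 2021-11-20 at 00 31 19: https://user-images.githubusercontent.com/34280215/142715667-cd7d253b-0516-4afc-95b8-4bedf5964092.png
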
NathanSkene commented 2 years ago

Weird, I hadn’t noticed that before. This would mean that different cell types have different levels of power, as there are different numbers of genes in the top bins. Any idea why it happens?
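To make the point about unequal top bins concrete, here is a rough base R sketch that tabulates how many genes fall in each bin for each cell type. The helper name `genes_per_bin` and the toy matrix are illustrative assumptions, not package code or real data.

```r
# Hypothetical helper: genes per bin (rows) for each cell type (columns).
# tabulate() ignores zeros, so a 0 bin for non-expressed genes is not counted.
genes_per_bin <- function(quantile_matrix, n_bins) {
  counts <- vapply(
    seq_len(ncol(quantile_matrix)),
    function(j) tabulate(quantile_matrix[, j], nbins = n_bins),
    integer(n_bins)
  )
  dimnames(counts) <- list(
    bin = seq_len(n_bins),
    cell_type = colnames(quantile_matrix)
  )
  counts
}

# Toy matrix: 20 genes, 4 bins requested.
toy <- cbind(
  celltype_a = rep(1:4, each = 5),        # 5 genes in each of the 4 bins
  celltype_b = c(rep(1, 18), rep(2, 2))   # collapsed to 2 bins, 2 genes in the highest
)

counts <- genes_per_bin(toy, n_bins = 4)
print(counts)
```

In this toy case `celltype_a` has 5 genes in bin 4 while `celltype_b` has none, so any analysis restricted to the top bin would be working with very different gene sets across the two cell types.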

bschilder commented 2 years ago

Fixed this so that prepare_quantile_groups makes sure the following matrices are in each CTD level (and computes them if not). Both are computed using EWCE::bin_specificity_into_quantiles, which now has a new argument that generates matrices with different names (matrix_name="specificity_quantiles"), to help distinguish the following:

check_quantiles then ensures that within each of these matrices, every column has the same number of quantiles.

I've deleted old functions that were attempting to replicate the EWCE functions, but introduced inconsistencies instead (e.g. the non-equal number of quantiles across columns). This ensures that MAGMA.Celltyping is using the exact same methodology to compute specificity [quantiles] as EWCE.
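For readers who want to audit an existing CTD by hand, the consistency check described above (every column within a matrix having the same number of quantiles) could be sketched roughly as follows. This is not the package's check_quantiles implementation; `summarise_bins` is a hypothetical name, and the nested-list layout (`ctd[[level]][[matrix_name]]`) is assumed from the `newCTD2[[1]]$specificity_quantiles` usage earlier in the thread.

```r
# Hypothetical sketch; NOT the package's check_quantiles implementation.
# Assumes each CTD level is a list holding a genes x cell-types matrix
# under the name given by `matrix_name`.
summarise_bins <- function(ctd, matrix_name = "specificity_quantiles") {
  do.call(rbind, lapply(seq_along(ctd), function(lvl) {
    m <- ctd[[lvl]][[matrix_name]]
    # Distinct bin labels per column (cell type)
    n_bins <- apply(m, 2, function(x) length(unique(x)))
    data.frame(
      level = lvl,
      matrix_name = matrix_name,
      min_bins = min(n_bins),
      max_bins = max(n_bins),
      consistent = length(unique(n_bins)) == 1
    )
  }))
}

# Toy CTD with two levels; level 2 has a column that collapsed to 2 bins.
toy_ctd <- list(
  list(specificity_quantiles = cbind(a = rep(1:4, 5), b = rep(1:4, 5))),
  list(specificity_quantiles = cbind(a = rep(1:4, 5), b = rep(1:2, 10)))
)

report <- summarise_bins(toy_ctd)
inconsistent_levels <- report$level[!report$consistent]
print(report)
```

Any level listed in `inconsistent_levels` has at least one cell type whose bins differ in number from the others and may be worth recomputing.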