Add an error catch to prepare.quantile.groups

NathanSkene commented 3 years ago

Add an error to catch cases where the specificity_quantiles are not proper quantiles (e.g. you asked for 40 bins; did you get 40?). This can occur when the mean expression matrix is far too sparse. For example, this is what the quantiles should look like:

This is what they shouldn't look like:

These plots are basically generated with:

hist(newCTD2[[1]]$specificity_quantiles[,"oligondendrocyte"])
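
As a rough illustration of why sparsity can break the binning, here is a toy sketch in base R. It is not EWCE or MAGMA.Celltyping code, and the vectors `dense` and `sparse` are made-up data. It assumes bins are derived from the distinct quantile break points of a column, so tied values (e.g. many zeros) collapse several breaks into one.

```r
# Toy illustration (not package code): tied values collapse quantile breaks.
n_bins <- 40
probs <- seq(0, 1, length.out = n_bins + 1)

# Dense column: 1000 distinct values, so every break point is distinct.
dense <- seq_len(1000) / 1000
dense_breaks <- unique(quantile(dense, probs = probs, names = FALSE))

# Sparse column: 95% zeros, so most break points are tied at zero.
sparse <- c(rep(0, 950), seq_len(50) / 50)
sparse_breaks <- unique(quantile(sparse, probs = probs, names = FALSE))

# The number of usable bins is one less than the number of distinct breaks.
n_dense_bins <- length(dense_breaks) - 1
n_sparse_bins <- length(sparse_breaks) - 1
print(c(dense = n_dense_bins, sparse = n_sparse_bins))
```

In this toy case the dense column yields all 40 bins, while the sparse column yields far fewer than requested.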

bschilder commented 2 years ago

Providing a warning message with some relevant info for the user:

[Screenshot 2021-11-20 at 00 29 26] https://user-images.githubusercontent.com/34280215/142715622-8a665f32-e36a-410d-add1-5f31ffb6fef3.png

☝️ This is using the Zeisel2015 CTD distributed via ewceData::ctd(). I thought perhaps it had something to do with the dropping of non-orthologs, but this shouldn't matter since mean expression is normalized first and then spec quantiles are recomputed.

Furthermore, I checked the CTD before dropping any genes or converting the genes to human. The issue is slightly less pronounced in level 2 (40 vs. 45 columns) but still pervasive.

[Screenshot 2021-11-20 at 00 31 19] https://user-images.githubusercontent.com/34280215/142715667-cd7d253b-0516-4afc-95b8-4bedf5964092.png

NathanSkene commented 2 years ago

Weird, I hadn’t noticed that before. This would mean that different cell types have different levels of power, as there are different numbers of genes in the top bins. Any idea why it happens?

bschilder commented 2 years ago

Fixed this so that prepare_quantile_groups makes sure the following matrices are in each CTD level (and if not, computes them). Both are computed using EWCE::bin_specificity_into_quantiles, which now has a new argument that generates matrices with different names (matrix_name="specificity_quantiles"), to help distinguish the following:

"specificity_quantiles": By default quantizes each column into 40 bins (unless otherwise specified by the user).
"specificity_deciles": Quantizes each column into 10 bins.
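
The two matrices described above could be sketched in base R roughly as follows. This is a minimal sketch and not the EWCE implementation: `bin_column`, the toy `spec` matrix, and the cell type names `typeA` and `typeB` are hypothetical, and keeping zero-specificity genes in bin 0 is an assumption made for illustration.

```r
# Hypothetical helper (not the EWCE implementation): bin one column into
# up to `n_bins` quantile groups, leaving zero-specificity genes in bin 0.
bin_column <- function(x, n_bins) {
  out <- rep(0, length(x))
  pos <- x > 0
  breaks <- unique(quantile(x[pos],
                            probs = seq(0, 1, length.out = n_bins + 1),
                            names = FALSE))
  out[pos] <- as.numeric(cut(x[pos], breaks = breaks, include.lowest = TRUE))
  out
}

set.seed(1)
# Toy specificity matrix: 400 genes x 2 made-up cell types.
spec <- matrix(runif(800), ncol = 2,
               dimnames = list(paste0("gene", 1:400), c("typeA", "typeB")))

# Store both binned matrices next to the specificity matrix in one CTD level.
ctd_level <- list(specificity = spec)
ctd_level$specificity_quantiles <- apply(spec, 2, bin_column, n_bins = 40)
ctd_level$specificity_deciles <- apply(spec, 2, bin_column, n_bins = 10)
```

Storing the two resolutions under different names keeps downstream code from confusing a 40-bin matrix with a 10-bin one.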

check_quantiles then ensures that within each of these matrices, every column has the same number of quantiles.
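
A check of this kind might look something like the sketch below. It is only a stand-in under stated assumptions, not the actual check_quantiles code: the function names `count_bins` and `check_equal_bins`, the warning text, and the toy matrices `good` and `bad` are all hypothetical.

```r
# Hypothetical stand-in for a check_quantiles-style validation.
# Count the distinct quantile values found in each column.
count_bins <- function(mat) apply(mat, 2, function(x) length(unique(x)))

# Warn when columns disagree on the number of quantiles; return TRUE/FALSE.
check_equal_bins <- function(mat, matrix_name = "specificity_quantiles") {
  n <- count_bins(mat)
  ok <- length(unique(n)) == 1
  if (!ok) {
    warning(matrix_name, ": columns differ in number of quantiles (range ",
            min(n), "-", max(n), ").", call. = FALSE)
  }
  invisible(ok)
}

# Toy matrices: `good` has 10 bins in both columns, `bad` has 10 and 2.
good <- cbind(a = rep(1:10, 4), b = rep(10:1, 4))
bad <- cbind(a = rep(1:10, 4), b = rep(c(0, 1), 20))

ok_good <- check_equal_bins(good)
ok_bad <- suppressWarnings(check_equal_bins(bad))
```

Returning a logical as well as warning lets a caller decide whether to recompute the matrix or stop.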

I've deleted the old functions listed below, which attempted to replicate the EWCE functions but introduced inconsistencies instead (e.g. unequal numbers of quantiles across columns). This ensures that MAGMA.Celltyping uses exactly the same methodology to compute specificity quantiles as EWCE.

normalise_mean_exp
bin_specificityDistance_into_quantiles
bin_expression_into_quantiles

neurogenomics / MAGMA_Celltyping

Add an error catch to prepare.quantile.groups #35