Error : The X portion of your clustering column names could not be converted to a number.

chitopia commented 3 years ago

Hello, first of all, thanks for making this useful package. I am analyzing scRNA-seq data with the SC3 package, so I tried to draw a clustering tree from my SCE object with clustree(), but the following error occurred.

> clustree(mm.sce, prefix = "sc3_", suffix = "_clusters")
Error: The X portion of your clustering column names could not be converted to a number. Please check that your prefix and suffix are correct: prefix = 'sc3_', suffix = '_clusters'

My metadata column names are as follows.

DataFrame with 6 rows and 37 columns

> colnames(colData(mm.sce))
[1] "orig.ident" "nCount_RNA" "nFeature_RNA" "percent.mt" "ident"
[6] "Cell_Position" "Patient_Info" "nCount_SCT" "nFeature_SCT" "integrated_snn_res.0.5"
[11] "seurat_clusters" "CellAnnotation" "SingleR" "SimpleSingleR" "SCT_snn_res.0.2"
[16] "SCT_snn_res.0.3" "SCT_snn_res.0.4" "SCT_snn_res.0.5" "SCT_snn_res.0.6" "SCT_snn_res.0.7"
[21] "SCT_snn_res.0.8" "SCT_snn_res.0.9" "SCT_snn_res.1" "sc3_4_clusters" "sc3_5_clusters"
[26] "sc3_6_clusters" "sc3_7_clusters" "sc3_8_clusters" "sc3_9_clusters" "sc3_10_clusters"
[31] "sc3_4_log2_outlier_score" "sc3_5_log2_outlier_score" "sc3_6_log2_outlier_score" "sc3_7_log2_outlier_score" "sc3_8_log2_outlier_score"
[36] "sc3_9_log2_outlier_score" "sc3_10_log2_outlier_score"

Can you let me know under what circumstances this error occurs? (T_T)

lazappi commented 3 years ago

Hi @chitopia

Thanks for giving {clustree} a go! I think this error is coming from the less than ideal way in which column names are matched. What happens is that columns are matched using prefix, and then later on suffix is stripped from the matched column names. So in this case the other columns that start with "sc3_" (like "sc3_4_log2_outlier_score") are being matched, and then it fails to convert "4_log2_outlier_score" to a number.
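
To make the failure mode concrete, here is a minimal base R sketch that mimics the behaviour described above. It is an illustration only and not {clustree}'s actual source; the variable names (cols, matched, stripped, values) are made up for the example.

```r
# Illustration only: mimics the matching behaviour described in this comment,
# it is NOT copied from the clustree source code.
cols <- c("sc3_4_clusters", "sc3_5_clusters", "sc3_4_log2_outlier_score")
prefix <- "sc3_"
suffix <- "_clusters"

# Step 1: match on the prefix alone, so the outlier score column is included too
matched <- cols[startsWith(cols, prefix)]

# Step 2: strip the prefix and the suffix from every matched name
stripped <- sub(suffix, "", sub(prefix, "", matched, fixed = TRUE), fixed = TRUE)

# Step 3: convert what is left to a number; any leftover text becomes NA
values <- suppressWarnings(as.numeric(stripped))

print(stripped)
print(values)
```

The third name keeps its "_log2_outlier_score" tail after stripping, so it cannot be converted to a number, which is the situation the error message reports.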

What should happen is a match to the exact pattern "sc3_(.*)_clusters". I have been working on a rewrite of {clustree} that does this, but it's still a way off being ready for users. It should be possible to make this change to the current release, but it might take a little while to get to it (unless you want to help with a PR?). In the meantime my only suggestion is to remove the other columns starting with "sc3_" (or extract colData, remove the columns and pass that to clustree()). I realise that's not ideal but I think it's the only option for now.
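
A rough sketch of that workaround, assuming the cell metadata has been pulled out as a plain data frame (for example with as.data.frame(colData(mm.sce))). The toy cell_data below is a hypothetical stand-in with made-up values, and the regular expression is just one way to keep only the columns that match the full prefix + number + suffix pattern.

```r
# Hypothetical stand-in for as.data.frame(colData(mm.sce)); values are made up
cell_data <- data.frame(
  nCount_RNA               = c(100, 200, 300),
  sc3_4_clusters           = c(1, 2, 3),
  sc3_5_clusters           = c(1, 2, 3),
  sc3_4_log2_outlier_score = c(0, 0.5, 1.2),
  seurat_clusters          = c(0, 1, 1)
)

# Keep only columns matching the whole pattern: prefix, then digits, then suffix
is_clustering <- grepl("^sc3_[0-9]+_clusters$", colnames(cell_data))
clustering_only <- cell_data[, is_clustering, drop = FALSE]

print(colnames(clustering_only))

# With clustree installed, the filtered data frame could then be passed on:
# clustree::clustree(clustering_only, prefix = "sc3_", suffix = "_clusters")
```

Filtering a copy of the metadata leaves the original object untouched, so the outlier score columns are still available for other analyses.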

chitopia commented 3 years ago

Thanks to your response, I found the cause of the problem: the 'seurat_clusters' column in my metadata was also being picked up when the column names were matched.

Thank you so much. Have a good day!

lazappi / clustree

Error : The X portion of your clustering column names could not be converted to a number. #72