raymondlouie / MiniMarS


Errors with sc2marker and geneBasis #15

Closed HsiaoChiLiao closed 1 year ago

HsiaoChiLiao commented 1 year ago

I tested on the subsamples from the two datasets - "dataset4_pbmc_human_all_7865cells_14proteinCLRnorm.RDS" and "dataset2_malt_human_all_8412cells_14proteinCLRnorm.RDS"

Testing the function findClusterMarkers:

Ran citefuse and xgboost successfully (no errors or warnings).

Issues with sc2marker:

Error:
Error in curr_df$gene[[1]] : subscript out of bounds

Warnings:
In addition: Warning messages:
1: In CreateSeuratObject.default(input_matrix, meta.data = data.frame(cell_type = clusters)) :
  Some cells in meta.data not present in provided counts matrix
2: In mean.fxn(object[features, cells.1, drop = FALSE]) : NaNs produced
3: In mean.fxn(object[features, cells.1, drop = FALSE]) : NaNs produced
4: In mean.fxn(object[features, cells.1, drop = FALSE]) : NaNs produced
5: In mean.fxn(object[features, cells.1, drop = FALSE]) : NaNs produced
6: In mean.fxn(object[features, cells.1, drop = FALSE]) : NaNs produced

Issues with geneBasis:

Error in value[[3L]](cond) :
  Can not perform modelGeneVar on this counts matrix - check your input data.

Arguments for subsampling and splitting data into training and testing sets:

processSubsampling(cluster_selection_out,
                   clusters_sel = "all_clusters",
                   subsample_num = 1000,
                   train_test_ratio = 0.9,
                   cluster_proportion = "proportional",
                   verbose = TRUE)

raymondlouie commented 1 year ago

Thanks Hsiao Chi. Regarding the geneBasis error, I suspect geneBasis expects non-negative values, possibly because it assumes log(counts+1) input, which is always >= 0, whereas the input here is CLR-normalized and can contain negative values.

For now, I've modified the code to explicitly enforce positive values by adding a pseudocount to each value, with the pseudocount taken from the minimum value of the matrix. This is applied only for geneBasis. An alternative would be to also offer a raw count input option, so that log normalization can be performed just for geneBasis.
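
A minimal sketch of the kind of shift described above, in case it helps anyone reading along. This is not the code in the package: the helper name `shift_to_nonnegative` and the toy matrix are hypothetical, and it assumes the shift subtracts the matrix minimum so that the smallest entry becomes zero.

```r
# Toy CLR-like matrix (proteins x cells) with negative entries.
# The values are random and purely illustrative.
set.seed(1)
clr_mat <- matrix(
  rnorm(20),
  nrow = 4,
  dimnames = list(paste0("protein", 1:4), paste0("cell", 1:5))
)

# Hypothetical helper (not part of the package): shift the whole matrix
# so that its smallest entry becomes zero. Matrices that are already
# non-negative are returned unchanged.
shift_to_nonnegative <- function(mat) {
  min_val <- min(mat, na.rm = TRUE)
  if (min_val < 0) {
    mat <- mat - min_val
  }
  mat
}

shifted <- shift_to_nonnegative(clr_mat)
```

A constant shift like this keeps the differences between entries unchanged, but the shifted values are no longer on the CLR scale, which is one reason a raw count input may be the cleaner long-term option.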

HsiaoChiLiao commented 1 year ago

Thanks, Ray. I can prepare "raw ADT count matrix" for each dataset if that will help us test the package.

raymondlouie commented 1 year ago

Regarding the sc2marker issue, I believe the cause was that the matrix had no cell names. I've now manually created cell names, so the issue should be fixed. I've run the datasets you had issues with, and I'm getting no errors.
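
For illustration, a small sketch of how missing cell names could be filled in so that the metadata rows line up with the matrix columns. This is an assumption about the general approach, not the actual change made in the package; `ensure_cell_names`, the `cell` prefix, and the toy data are hypothetical.

```r
# Toy input: a proteins x cells matrix with no column (cell) names,
# plus one cluster label per cell.
input_matrix <- matrix(
  1:12,
  nrow = 3,
  dimnames = list(paste0("protein", 1:3), NULL)
)
clusters <- c("A", "A", "B", "B")

# Hypothetical helper: generate placeholder cell names only when the
# matrix has none, leaving existing names untouched.
ensure_cell_names <- function(mat, prefix = "cell") {
  if (is.null(colnames(mat))) {
    colnames(mat) <- paste0(prefix, seq_len(ncol(mat)))
  }
  mat
}

input_matrix <- ensure_cell_names(input_matrix)

# Build the metadata with row names that match the matrix columns, so a
# downstream object constructor can pair each cell with its cluster label.
meta <- data.frame(cell_type = clusters, row.names = colnames(input_matrix))
```

With matching names on both sides, the "Some cells in meta.data not present in provided counts matrix" warning should no longer have a mismatch to report, although that alone may not explain the subscript error.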

HsiaoChiLiao commented 1 year ago

Just installed the newest version of the ClusterMarkers package with the command: devtools::install_github("https://github.com/raymondlouie/ClusterMarkers/tree/Dev")

For the same datasets, I am still getting the same error message from sc2marker, **Error in curr_df$gene[[1]] : subscript out of bounds** (the same as Dhruti's issue #18).

I ran geneBasis without any errors. 🥳

raymondlouie commented 1 year ago

Thanks Hsiao-Chi, do you mind sending me the final_out variable (input to findClusterMarkers), so I can replicate the error? I'm having trouble replicating it on my end. Thanks!

HsiaoChiLiao commented 1 year ago

Just sent it to your email! 😊

26 Mar: Re-installed the package and ran it in a new RStudio window; no errors appeared.