yasin-uzun / MAPLE.1.0

An R package for predicting gene activity levels from single-cell DNA methylation data

[compute_binned_met_counts] Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match #3

Open JiayiLi21 opened 1 month ago

JiayiLi21 commented 1 month ago

Hi Yasin,

Although I was able to go through the workflow on a subset of 100 cells of our data (link to previous issue), an rbind-related error came up when I ran compute_binned_met_counts on all ~14k cells of our data. I checked the source code, but I'm confused about why this error appears only when running the full dataset.

p1: binned list previously returned for the 100-cell subset of our data (screenshot: Binned_list_subset100)

p2: running the process on all 14453 cells (screenshot: processing)

p3: Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match (screenshot: rbind_error_allData)

yasin-uzun commented 1 month ago

Hi Jiayi,

I think this is caused by extreme sparsity in some of the cells. MAPLE fails on cells whose data are very sparse (very few aligned reads). Filtering out cells with fewer than a certain number of aligned reads may solve the problem; this is common practice in single-cell methylation data processing. You can try setting the threshold (minimum number of aligned reads) to 500K, 1M, or 2M, check how many cells are filtered out or retained at each, and then run again.
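A minimal Python sketch of the threshold sweep described above, assuming the aligned-read count for each cell has already been collected into a dictionary (for example, from the aligner's per-cell reports). MAPLE does not compute this; the helper names `cells_passing` and `threshold_report` are hypothetical, and the counts in the usage example are made-up values.

```python
from typing import Dict, Iterable, List, Tuple


def cells_passing(read_counts: Dict[str, int], threshold: int) -> List[str]:
    """Return the sorted IDs of cells with at least `threshold` aligned reads."""
    return sorted(cell for cell, n in read_counts.items() if n >= threshold)


def threshold_report(
    read_counts: Dict[str, int], thresholds: Iterable[int]
) -> Dict[int, Tuple[int, int]]:
    """For each threshold, report (cells kept, cells filtered out)."""
    total = len(read_counts)
    report = {}
    for t in thresholds:
        kept = len(cells_passing(read_counts, t))
        report[t] = (kept, total - kept)
    return report


if __name__ == "__main__":
    # Hypothetical per-cell aligned-read counts (made-up values for illustration).
    counts = {"cell_01": 2_500_000, "cell_02": 800_000,
              "cell_03": 1_200_000, "cell_04": 9_000}
    for t, (kept, dropped) in threshold_report(
            counts, [500_000, 1_000_000, 2_000_000]).items():
        print(f"threshold {t:>9,}: {kept} kept, {dropped} filtered out")
```

Running a sweep like this before re-running compute_binned_met_counts shows how many cells each cutoff would retain, which makes it easier to pick a threshold that drops the sparsest cells without discarding most of the data.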

Also, is your protocol genome-wide (such as scWGBS) or enriched for only a few genes? MAPLE works with data generated by genome-wide protocols.

JiayiLi21 commented 4 weeks ago

Hi Yasin,

Thank you! Our protocol is genome-wide. I checked the distribution of aligned reads across cells and set the threshold to 10k reads (around 6k cells retained); I will try more threshold settings. Just wondering: did you set a hard threshold in your code? (screenshot: n_obs)

yasin-uzun commented 4 weeks ago

Hi Jiayi,

I didn't put any cell-filtering threshold in MAPLE. I assumed that the cells (i.e., the cov files) are already cleaned and filtered before running MAPLE.
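Since MAPLE expects the cov files to be filtered beforehand, one possible pre-filter is sketched below. It assumes Bismark-style coverage files (tab-separated: chromosome, start, end, methylation percentage, methylated count, unmethylated count), optionally gzipped, and it uses the number of covered sites per cell as a sparsity proxy. That proxy is a substitute for, not the same as, the aligned-read count discussed in this thread. The function names and any cutoff value are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable, List


def count_covered_sites(path: Path) -> int:
    """Count sites with non-zero coverage in a Bismark-style .cov file.

    Assumed columns (tab-separated): chrom, start, end, meth_pct,
    count_methylated, count_unmethylated. Files ending in .gz are read
    with gzip; all others are read as plain text.
    """
    opener = gzip.open if path.suffix == ".gz" else open
    n = 0
    with opener(path, "rt") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # skip empty or malformed lines
            if int(fields[4]) + int(fields[5]) > 0:
                n += 1
    return n


def filter_cov_files(paths: Iterable[Path], min_sites: int) -> List[Path]:
    """Keep only the cov files (cells) covering at least `min_sites` sites."""
    return [p for p in paths if count_covered_sites(p) >= min_sites]
```

For example, `filter_cov_files(sorted(Path("cov_dir").glob("*.cov*")), min_sites=100_000)` would return the cov files to pass on to MAPLE; the directory name and the 100K cutoff are placeholders, not recommended values.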

I don't know the specifics of your experiment, but a 10k limit might be too low to filter out low-quality cells. In the snmC-Seq paper (PMID: 28798132), the authors required a minimum of 400K (mouse) and 500K (human) non-clonal mapped reads.

"Data were cleaned by excluding low-quality cells using the following set of conservative criteria, ultimately yielding 3376 cells in mouse and 2784 cells in human for analysis. First, non-conversion rate was required to be low (≤1% in mouse and ≤2% in human). We set a minimum on the number of non-clonal mapped reads to eliminate contaminated samples (400K in mouse; 500K in human)."

By "samples", I assume they mean "cells". If possible, you may also consider sequencing deeper.
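The two snmC-Seq criteria quoted above can be written as a simple per-cell filter. This is a sketch assuming the non-conversion rate and the number of non-clonal (deduplicated) mapped reads have already been computed for each cell; `CellQC`, `passes_qc`, and `filter_cells` are hypothetical names, and only the two quoted criteria are implemented.

```python
from dataclasses import dataclass
from typing import List

# Cutoffs taken from the passage quoted above (snmC-Seq, PMID: 28798132).
CRITERIA = {
    "mouse": {"max_nonconversion": 0.01, "min_nonclonal_reads": 400_000},
    "human": {"max_nonconversion": 0.02, "min_nonclonal_reads": 500_000},
}


@dataclass
class CellQC:
    cell_id: str
    nonconversion_rate: float    # as a fraction, e.g. 0.008 for 0.8%
    nonclonal_mapped_reads: int  # mapped reads after removing clonal duplicates


def passes_qc(cell: CellQC, species: str) -> bool:
    """True if the cell meets both quoted criteria for the given species."""
    c = CRITERIA[species]
    return (cell.nonconversion_rate <= c["max_nonconversion"]
            and cell.nonclonal_mapped_reads >= c["min_nonclonal_reads"])


def filter_cells(cells: List[CellQC], species: str) -> List[str]:
    """Return the IDs of cells that pass QC, in input order."""
    return [cell.cell_id for cell in cells if passes_qc(cell, species)]
```

Keeping the cutoffs in one table makes it easy to compare the species-specific values from the paper against the 10k threshold used here.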