Closed: shunyasanuma closed this issue 1 month ago.
It is difficult to give a general recommendation because datasets vary in complexity. Below are some thoughts that apply to PWM + graph modeling in single-cell data. Happy to expand on them for your use case.
`n_sample_cells`: we are benchmarking increasing values on four single-cell datasets against scBasset. The current preprint shows results for a fixed number of cells and peaks (5,000 and 15,000, respectively). Results for additional values will be uploaded as TSV files by mid-September at https://github.com/theislab/mubind-benchmark

`n_sample_peaks`: for best practices, we refer to this section on feature filtering: https://www.sc-best-practices.org/chromatin_accessibility/quality_control.html#filtering-features
In the preprint, we tested feature selection using either random selection or episcanpy. I would recommend random feature selection and, for exploration, a number of peaks that is at least a multiple of the number of cells, e.g. 2x to 3x.

Thank you!
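A minimal sketch of the random-selection recommendation above, assuming a dense NumPy cells x peaks count matrix. The helper `random_subsample` and its `peak_multiplier`, `min_cells`, and `seed` parameters are hypothetical and not part of mubind's API; this does not reproduce how the tutorial uses `n_sample_cells`/`n_sample_peaks` internally. The optional `min_cells` pre-filter is likewise an assumption, loosely in the spirit of the feature-filtering page linked above.

```python
import numpy as np


def random_subsample(counts, n_sample_cells, peak_multiplier=3, min_cells=0, seed=0):
    """Randomly subsample cells and peaks from a dense cells x peaks matrix.

    Hypothetical helper for illustration only; it is not part of mubind.
    n_sample_peaks is set to peak_multiplier * n_sample_cells (e.g. 2-3x),
    capped at the number of peaks that pass the optional min_cells filter.
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]

    # 1) Random cells, sampled without replacement (sorted to keep original order).
    n_sample_cells = min(n_sample_cells, n_cells)
    cell_idx = np.sort(rng.choice(n_cells, size=n_sample_cells, replace=False))
    sampled = counts[cell_idx]

    # 2) Optional feature filter (an assumption, not from the thread): keep peaks
    #    accessible (count > 0) in at least `min_cells` of the sampled cells.
    detected = (sampled > 0).sum(axis=0)
    candidates = np.flatnonzero(detected >= min_cells)

    # 3) Random peaks, sized as a multiple of the number of sampled cells.
    n_sample_peaks = min(peak_multiplier * n_sample_cells, candidates.size)
    peak_idx = np.sort(rng.choice(candidates, size=n_sample_peaks, replace=False))
    return sampled[:, peak_idx], cell_idx, peak_idx


# Toy data standing in for a scATAC-seq count matrix (200 cells x 2,000 peaks).
counts = np.random.default_rng(1).poisson(0.1, size=(200, 2000))
sub, cells, peaks = random_subsample(counts, n_sample_cells=100, peak_multiplier=3)
print(sub.shape)  # (100, 300)
```

Calling the helper over a grid of `n_sample_cells` values with a fixed `seed` gives an increasing-size sweep similar in spirit to the benchmark described above, and `peak_multiplier=2` or `3` mirrors the 2x to 3x suggestion. Sparse matrices (e.g. from AnnData) would need sparse-aware indexing instead of the dense slicing used here.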
Hello,
How do you decide on `n_sample_cells` and `n_sample_peaks` in the Mouse pancreatic endocrinogenesis (scATAC-seq) Tutorial? Do you have any benchmarks?