hukai916 / scATACpipe

MIT License

barcode whitelist #4

Open priscilla-glenn opened 1 year ago

priscilla-glenn commented 1 year ago

Hi, I am trying to analyze some newly generated scATAC data from Illumina. The data may contain more than 20,000 cells, which cellranger-atac count is unable to analyze properly. I would like to use the default preprocess option in scATACpipe, but the config generator has an option for a whitelist barcode folder. How should I go about generating a proper barcode folder/file to use with scATACpipe? If I instead use the 10x genomics preprocess, which does not require this input, will I still run into the max cell issue?

Thank you.

hukai916 commented 1 year ago

Hi @priscilla-glenn,

There is a whitelist barcode folder located at assets/whitelist_barcode. This folder contains commonly used whitelist barcode files, and scATACpipe automatically scans each file in the folder to determine which one to use. If your whitelist file is not in that folder, you can add it to the folder. Typically, the whitelist file can be obtained from your sequencing provider.
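To make the folder-scanning idea concrete, here is a minimal Python sketch of one way a whitelist could be chosen from a folder. It is not scATACpipe's actual code. It assumes each whitelist is a plain-text (optionally gzipped) file with one barcode per line, and that the best match is the file containing the largest fraction of the barcodes observed in your reads. The names `read_whitelist` and `pick_whitelist` are hypothetical.

```python
import gzip
from pathlib import Path


def read_whitelist(path):
    """Return the set of barcodes in a whitelist file (one barcode per line).

    Assumption: files are plain text, optionally gzip-compressed (.gz).
    """
    opener = gzip.open if Path(path).suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}


def pick_whitelist(whitelist_dir, observed_barcodes):
    """Return (path, fraction) for the whitelist matching the most observed barcodes.

    Hypothetical helper: scoring by overlap fraction is an assumption for
    illustration, not necessarily how scATACpipe selects a whitelist.
    """
    observed = list(observed_barcodes)
    if not observed:
        raise ValueError("no observed barcodes supplied")

    best_path, best_fraction = None, -1.0
    for path in sorted(Path(whitelist_dir).iterdir()):
        if not path.is_file():
            continue
        whitelist = read_whitelist(path)
        # Fraction of observed barcodes that appear in this whitelist.
        hits = sum(1 for barcode in observed if barcode in whitelist)
        fraction = hits / len(observed)
        if fraction > best_fraction:
            best_path, best_fraction = path, fraction
    return best_path, best_fraction
```

In practice, `observed_barcodes` could be a sample of barcodes taken from the first few thousand reads of the barcode FASTQ. A low best fraction would suggest that none of the files in the folder matches the library, in which case the correct whitelist would need to be added.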

We have tested cellranger-atac with 10k cells and did not encounter any issues. Can you please provide more information about the error message you are seeing with 20k cells? This will help us diagnose the issue and provide a solution.

priscilla-glenn commented 1 year ago

Oh, excellent. Thank you! I'll just run it then using assets/whitelist_barcode.

Also, the error message came from running cellranger-atac count from 10x genomics directly, not from scATACpipe. It counted 20,096 cells and reported that this was too many: "Estimated number of cells is expected to be under 10,000 and more than 20,000 cells cannot be called. A high value might indicate an overloading of cells, a problem during library preparation, or unexpected behavior in the cell calling algorithm." So I was curious whether the 10x genomics preprocess, rather than the default preprocessing, would run into the same issue, since it uses cellranger-atac count. Now that I understand the barcode folder, though, my current plan is to use preprocess_default.