Open lucygarner opened 2 years ago
This is the log for cellSNP-lite in case that helps.
[I::main] start time: 2022-03-07 10:57:55
[W::check_args] Max depth set to maximum value (2147483647)
[I::main] loading the VCF file for given SNPs ...
[I::main] fetching 7352497 candidate variants ...
[I::main] mode 1a: fetch given SNPs in 41622 single cells.
[I::csp_fetch_core][Thread-2] 2.00% SNPs processed.
[I::csp_fetch_core][Thread-3] 2.00% SNPs processed.
[I::csp_fetch_core][Thread-5] 2.00% SNPs processed.
...
[I::csp_fetch_core][Thread-9] 90.00% SNPs processed.
[I::csp_fetch_core][Thread-9] 92.00% SNPs processed.
[I::csp_fetch_core][Thread-9] 94.00% SNPs processed.
[I::csp_fetch_core][Thread-9] 96.00% SNPs processed.
[I::csp_fetch_core][Thread-9] 98.00% SNPs processed.
[I::main] All Done!
[I::main] end time: 2022-03-08 10:09:17
[I::main] time spent: 83482 seconds.
Hi Lucy,
Thanks for raising the issue. Your dataset does look relatively large, so I wonder whether memory is the bottleneck. You can check the memory usage with `free -h`.
If that is the case, you can add `-p 1` to your command line so that only one CPU is used.
Another option is to set a more stringent cutoff on `--minCOUNT`, e.g. 30 or 100, in cellsnp. It looks like you already have far more variants than needed. This is probably not the fastest way to sort it out, though, as you would need to re-run cellsnp.
Yuanhua
Hi @huangyh09,
Thank you for the quick response. I am running the command on a large compute cluster but maybe I didn't specify enough memory. How much memory would you recommend specifying?
Why do you suggest using only one CPU (`-p 1`)? Would using more CPUs not make it quicker?
If this does not work, I will try increasing the `--minCOUNT` threshold for cellSNP.
Best wishes, Lucy
I see. You could probably start by requesting 50 GB of memory; I guess it won't use more than 100 GB. Another major factor in memory usage is the number of CPUs, as n copies of the data are held, one for each sub-process. So you may use `-p 4` as a safer start instead of 30.
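
The scaling described in this reply (one copy of the data per sub-process) can be sketched as a back-of-the-envelope calculation. This is only an illustration, not something taken from Vireo itself: the function names are made up, and the 12 GB per-copy figure is a purely hypothetical placeholder that you would need to replace with a measurement from your own run (e.g. the peak memory of a `-p 1` job).

```python
def estimate_peak_memory_gb(per_copy_gb: float, n_proc: int) -> float:
    """Rough upper bound, assuming every sub-process holds its own data copy."""
    if n_proc < 1:
        raise ValueError("n_proc must be at least 1")
    return per_copy_gb * n_proc


def max_procs_within_budget(per_copy_gb: float, budget_gb: float, requested: int) -> int:
    """Largest process count (capped at `requested`) whose copies fit in the budget.

    Always returns at least 1, since a single process is the minimum possible.
    """
    fit = int(budget_gb // per_copy_gb)
    return max(1, min(requested, fit))


if __name__ == "__main__":
    per_copy_gb = 12.0  # HYPOTHETICAL value; measure it on your own data
    budget_gb = 50.0    # memory requested from the cluster (figure from the reply)
    requested = 30      # process count mentioned in the reply

    print(estimate_peak_memory_gb(per_copy_gb, requested))             # worst case with all 30
    print(max_procs_within_budget(per_copy_gb, budget_gb, requested))  # processes that fit
```

Under these assumptions, fewer processes can finish sooner than many if the larger job would otherwise run out of memory or start swapping.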
Yuanhua
Hi,
I have some single-cell RNA-seq data for which I don't have genotype information.
I ran cellSNP-lite on a merged BAM file containing all of the donors to genotype the single cells as follows:
I am now running Vireo as follows:
However, it has been running for three days and still hasn't finished. I have spoken to others who have used Vireo and they mentioned that it was fast, so I'm not sure if I'm doing something wrong?
This is the log message so far:
Many thanks for the help.
Best wishes, Lucy