Closed mbahin closed 3 years ago
Hi @mbahin ,
May I ask how you installed vpolo? If you are installing it through conda, try installing it in a new environment with Python 3.8.
Re: low confidence barcodes, alevin should generate two files, quants_mat_rows.txt and whitelist.txt; all the barcodes that are in the first file but not in the second are the low confidence barcodes.
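The comparison described above can be sketched in a few lines of Python. This is a minimal sketch, assuming both files contain one barcode per line; the function names are hypothetical and the paths are passed in by the caller rather than assumed.

```python
from pathlib import Path


def read_barcodes(path):
    """Read one barcode per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def low_confidence_barcodes(rows_path, whitelist_path):
    """Return barcodes listed in rows_path but absent from whitelist_path.

    The order of the rows file is preserved.
    """
    whitelist = set(read_barcodes(whitelist_path))
    return [cb for cb in read_barcodes(rows_path) if cb not in whitelist]


# Example call (paths are placeholders, adjust to your own output directory):
# low = low_confidence_barcodes("quants_mat_rows.txt", "whitelist.txt")
# print(len(low))
```

Keeping the rows-file order makes it easy to map the result back onto rows of the count matrix.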
Hope it helps!
Thanks @k3yavi for the quick answer.
I've installed it through conda and in a Python 3.6 environment, so I can try with a 3.8.
Installing vpolo was actually only a way to get this split between low and high confidence barcodes.
What I don't understand then is that, on the standard output, alevin wrote that I had 334 barcodes (which I find in quants_mat_rows.txt) with 201 low confidence ones. But in whitelist.txt, I only find 102 barcodes.
Cheers, Mathieu
Hi @mbahin ,
Can you share the log? Basically, alevin performs whitelisting at multiple levels. The log on standard output is for the first, rough whitelisting, which uses knee-based thresholding on the CDF of the CB frequency. The whitelist file is written after another level of whitelisting, which can further filter or swap the CBs.
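To give a rough idea of what "knee-based thresholding on the CDF" means, here is a generic illustration. It is not alevin's implementation: it is a simple, commonly used heuristic that sorts barcode frequencies in decreasing order, builds the cumulative fraction of reads, and picks the rank where that curve rises furthest above the diagonal. The function name is hypothetical.

```python
def knee_rank(frequencies):
    """Return how many top barcodes a simple knee heuristic would keep.

    Generic heuristic, NOT alevin's actual algorithm: pick the rank at which
    the cumulative read fraction is furthest above the diagonal.
    """
    counts = sorted(frequencies, reverse=True)
    total = sum(counts)
    n = len(counts)
    if n == 0 or total == 0:
        return 0

    best_rank, best_gap, running = 0, float("-inf"), 0
    for rank, count in enumerate(counts, start=1):
        running += count
        # Height of the cumulative curve above the diagonal at this rank.
        gap = running / total - rank / n
        if gap > best_gap:
            best_rank, best_gap = rank, gap
    return best_rank


# Five abundant barcodes followed by a long tail of rare ones.
print(knee_rank([1000] * 5 + [1] * 95))
```

A threshold of this kind is only a first pass, which is consistent with the later whitelisting step being able to change the final list.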
Here is the log:
[2020-09-22 15:24:18.089] [alevinLog] [info] Found 179839 transcripts(+194 decoys, +44 short and +0 duplicate names in the index)
[2020-09-22 15:24:18.259] [alevinLog] [info] Filled with 179883 txp to gene entries
[2020-09-22 15:24:18.286] [alevinLog] [info] Found all transcripts to gene mappings
[2020-09-22 15:24:18.313] [alevinLog] [info] Processing barcodes files (if Present)
[2020-09-22 15:25:35.437] [alevinLog] [info] Done barcode density calculation.
[2020-09-22 15:25:35.437] [alevinLog] [info] # Barcodes Used: 41789122 / 41789122.
[2020-09-22 15:25:40.841] [alevinLog] [info] Knee found left boundary at 1115
[2020-09-22 15:25:40.920] [alevinLog] [info] Gauss Corrected Boundary at 133
[2020-09-22 15:25:40.920] [alevinLog] [info] Learned InvCov: 125.325 normfactor: 70.3073
[2020-09-22 15:25:40.920] [alevinLog] [info] Total 334(has 201 low confidence) barcodes
[2020-09-22 15:25:41.116] [alevinLog] [info] Done True Barcode Sampling
[2020-09-22 15:25:41.182] [alevinLog] [info] Total 12.3872% reads will be thrown away because of noisy Cellular barcodes.
[2020-09-22 15:25:41.202] [alevinLog] [info] Done populating Z matrix
[2020-09-22 15:25:41.205] [alevinLog] [info] Total 6780 CB got sequence corrected
[2020-09-22 15:25:41.206] [alevinLog] [info] Done indexing Barcodes
[2020-09-22 15:25:41.206] [alevinLog] [info] Total Unique barcodes found: 254108
[2020-09-22 15:25:41.206] [alevinLog] [info] Used Barcodes except Whitelist: 6526
[2020-09-22 15:25:41.231] [alevinLog] [info] Done with Barcode Processing; Moving to Quantify
[2020-09-22 15:25:41.231] [alevinLog] [info] parsing read library format
[2020-09-22 15:31:41.621] [alevinLog] [info] Starting optimizer
[2020-09-22 15:31:42.366] [alevinLog] [warning] mrna file not provided; using is 1 less feature for whitelisting
[2020-09-22 15:31:42.366] [alevinLog] [warning] rrna file not provided; using is 1 less feature for whitelisting
[2020-09-22 15:31:44.735] [alevinLog] [info] Total 429829.00 UMI after deduplicating.
[2020-09-22 15:31:44.735] [alevinLog] [info] Total 2255592 BiDirected Edges.
[2020-09-22 15:31:44.735] [alevinLog] [info] Total 509951 UniDirected Edges.
[2020-09-22 15:31:44.749] [alevinLog] [info] Clearing EqMap; Might take some time.
[2020-09-22 15:31:44.870] [alevinLog] [info] Starting white listing of 333 cells
[2020-09-22 15:31:44.870] [alevinLog] [info] Starting to make feature Matrix
[2020-09-22 15:31:44.871] [alevinLog] [info] Done making feature Matrix
[2020-09-22 15:31:44.934] [alevinLog] [info] Finished white listing
[2020-09-22 15:31:44.954] [alevinLog] [info] Starting dumping cell v gene counts in mtx format
[2020-09-22 15:31:45.288] [alevinLog] [info] Finished dumping counts into mtx
[2020-09-22 15:31:45.290] [alevinLog] [info] Finished optimizer
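When comparing a log like the one above with the output files, it can help to pull the barcode counts out programmatically. The following is a small sketch that relies only on the wording of the "Total ... low confidence" line as it appears in this log; the function name and the returned keys are hypothetical.

```python
import re

# Matches lines such as: "Total 334(has 201 low confidence) barcodes"
BARCODE_SUMMARY = re.compile(r"Total (\d+)\(has (\d+) low confidence\) barcodes")


def barcode_summary(log_text):
    """Extract total and low confidence barcode counts from alevin log text.

    Returns None if no matching line is found.
    """
    match = BARCODE_SUMMARY.search(log_text)
    if match is None:
        return None
    total, low = int(match.group(1)), int(match.group(2))
    return {"total": total, "low_confidence": low, "remaining": total - low}
```

For this log the counts are 334 total and 201 low confidence, leaving 133, which is the same number the log reports as the Gauss corrected boundary. The sketch itself does not establish how the two are related.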
Ok it makes sense if there are more steps before final whitelisting.
Can you provide a little explanation of the knee and the Gauss corrected boundary, please (or point to a link where I can find more; I couldn't find one)? We don't have a plot to see that, right? From what I understand, the knee selected 1115 barcodes but the Gauss correction found only 133 (and at some point there were 201 of bad quality). Do the 12% of thrown-away reads correspond to all the filtered-out barcodes (keeping only the 334 barcodes)? There were 254108 barcodes found, 6780 were sequence corrected and, amongst them, 6526 were corrected and matched a whitelist barcode?
Cheers, Mathieu
I'm getting this sce error in a Python 3.8.6 environment:
from vpolo.alevin import parser
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/anaconda3/lib/python3.8/site-packages/vpolo/alevin/parser.py", line 9, in <module>
import sce
File "/usr/local/anaconda3/lib/python3.8/site-packages/sce/__init__.py", line 1, in <module>
from .sce import *
ModuleNotFoundError: No module named 'sce.sce'
vpolo was installed using conda install -c bioconda vpolo
I tried installing Rust (conda install -c conda-forge rust), and I'm still getting the same error when trying to import the parser.
I am guessing you reinstalled vpolo from GitHub as well? Another option is to use fishpond, unless Python is a requirement?
I hadn't actually, since the conda forge version and the GitHub version appeared to be the same version. I did just reinstall from GitHub directly and now it's working, so it looks like some of the dependencies might not be properly specified in the conda version.
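A quick way to check whether a given environment has a working install is to try the imports from the traceback one by one. This is a minimal sketch; the helper name is hypothetical, and the module names are taken from the traceback above.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports cleanly, False otherwise.

    ModuleNotFoundError is a subclass of ImportError, so a missing
    submodule such as 'sce.sce' is reported as False here.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


for name in ("sce", "vpolo.alevin.parser"):
    print(name, "OK" if can_import(name) else "FAILED")
```

If `sce` fails on its own, the problem is in that dependency rather than in vpolo itself, which matches the traceback.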
Also, yes, I use fishpond when working in R, but I'm working on implementing the Alevin Single Cell Velocity tutorial and, since scvelo is Python-based, I figured it would be easier to do the downstream work entirely in Python. Thanks again!
Hi,
I would like to explore the binary tier matrix from alevin results so I wanted to follow the procedure here but when I try to import parser, I get the following message:
ModuleNotFoundError: No module named 'sce.sce'
I can't find any info on this sce lib.
By the way, I have a question that you might be able to answer. When I run alevin, I get a number of cells, some of which have a low confidence. Is it possible to get the list of these low confidence barcodes? (I was actually hoping to find info about that in the tier matrix, but I'm not sure!)
Cheers, Mathieu