Open CowanCS1 opened 4 years ago
We've been running this code in production under Python 3 for months and haven't run into this issue. Could you provide a minimal data example so that we can replicate it?
Ran into the exact same issue today. Loading the matrix via matrix = CountMatrix.from_anndata(adata) stores the cell barcodes as bytes in matrix.bcs (like b'AAACCTGAGAAACCAT-1'). Since I wasn't aware of that, I supplied the orig_cell_bcs argument as a list of strings (e.g. AAACCTGAGAAACCAT-1), which just returns None.
Changing the argument to bytes works, it's just odd/unexpected behavior.
The problem is coming from here due to the bytes-dtype:
bcs = np.array(bcs, dtype='S', copy=False)
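To illustrate why the lookup silently fails, here is a minimal sketch that does not use the package itself. The barcode values are made up, and np.asarray stands in for the copy=False call quoted above, since NumPy 2 raises an error for copy=False when a copy is unavoidable. It only shows that barcodes stored with dtype='S' are bytes and therefore never compare equal to Python 3 str:

```python
import numpy as np

# Illustrative barcodes supplied as Python 3 str (values are made up).
# dtype="S" converts them to fixed-width byte strings, as in the line above.
bcs = np.asarray(["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1"], dtype="S")

stored = set(bcs.tolist())  # elements are Python bytes objects

query_str = {"AAACCTGAGAAACCAT-1"}     # what a Python 3 user naturally passes
query_bytes = {b"AAACCTGAGAAACCAT-1"}  # what the stored barcodes look like

overlap_str = stored & query_str      # str never equals bytes, so this is empty
overlap_bytes = stored & query_bytes  # bytes match as expected

print(bcs.dtype.kind, len(overlap_str), len(overlap_bytes))
```

With an empty overlap there are no initial cell calls to work from, which fits the None return described above.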
Thanks for putting this package together :)
Dear @redst4r,
I am right now in the same situation as you, but probably with less knowledge. Since I am using anndata data, I am directing the question to you but of course @nh3 feel free to help me out here. So far I have:
matrix = CountMatrix.from_anndata(adata)
barcodes = adata.obs_names.values.astype(bytes)
And then:
a = find_nonambient_barcodes(
    matrix,  # Full expression matrix
    orig_cell_bcs=barcodes,  # (iterable of str): Strings of initially-called cell barcodes
    min_umi_frac_of_median=0.01,
    min_umis_nonambient=500,
    max_adj_pvalue=0.01
)
But all I get is:
Median UMIs of initial cell calls: 1.0
Min UMIs: 500
And a is a NoneType object
...any clues would be very helpful.
Thanks
A
Odd, that's pretty much what I did, i.e. converting the orig_cell_bcs to bytes.
Can you check if the barcodes stored in that matrix object actually overlap with what you supply via orig_cell_bcs?
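One way to run that check is sketched below. overlap_report is a hypothetical helper, not part of the package, and it assumes the stored barcodes are available as an iterable such as matrix.bcs. It counts exact matches as well as matches after normalizing both sides to bytes, so a str/bytes mismatch shows up as zero exact matches but a nonzero normalized count:

```python
import numpy as np


def overlap_report(matrix_bcs, orig_cell_bcs):
    """Return (exact_matches, matches_after_normalizing_to_bytes).

    Hypothetical diagnostic helper; both arguments are iterables of
    str or bytes barcodes.
    """
    def as_bytes(bc):
        # Encode str to bytes; pass bytes (including NumPy byte strings) through.
        return bc.encode("ascii") if isinstance(bc, str) else bytes(bc)

    stored = set(matrix_bcs)
    supplied = set(orig_cell_bcs)
    exact = len(stored & supplied)
    normalized = len({as_bytes(b) for b in stored} & {as_bytes(b) for b in supplied})
    return exact, normalized


# Made-up barcodes: stored as bytes, supplied as str.
stored_bcs = np.asarray(["AAAC-1", "AAAG-1", "AAAT-1"], dtype="S")
report = overlap_report(stored_bcs, ["AAAC-1", "AAAT-1"])
print(report)
```

If both counts are zero, the supplied barcodes really are absent from the matrix (for example, a different suffix or a filtered object) and the type is not the problem.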
Hi,
The function find_nonambient_barcodes lists the input requirement: orig_cell_bcs (iterable of str): Strings of initially-called cell barcodes.
However, because the default meaning of "str" changed between Python 2 (bytes) and Python 3 (unicode), this broke in Python 3. Worse, with a unicode input it appears that no good barcodes are identified in the original list and the code hits an uninformative "return None".
Suggestions for fixing it are:
1) Cast each string to bytes before the "set" type casting:
orig_cell_bcs = tuple(i.encode('ascii') for i in orig_cell_bcs)
2) Provide an informative exception.
Thanks for making this code more accessible!
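Both suggestions could be combined roughly as follows. This is only a sketch under assumptions: normalize_barcodes and its known_bcs argument are hypothetical names, not the package's API, and the stored barcodes are assumed to be bytes.

```python
def normalize_barcodes(orig_cell_bcs, known_bcs):
    """Cast barcodes to bytes and fail loudly if none of them are known.

    Hypothetical helper; known_bcs is assumed to hold the matrix barcodes
    as bytes.
    """
    # Suggestion 1: accept str or bytes and always compare as bytes.
    normalized = tuple(
        bc.encode("ascii") if isinstance(bc, str) else bytes(bc)
        for bc in orig_cell_bcs
    )
    known = {bytes(bc) for bc in known_bcs}

    # Suggestion 2: raise an informative exception instead of returning None.
    if not any(bc in known for bc in normalized):
        raise ValueError(
            "None of the supplied orig_cell_bcs match the barcodes in the "
            "matrix; check the barcode values and their type (str vs. bytes)."
        )
    return normalized


# Made-up barcodes for illustration.
result = normalize_barcodes(["AAAC-1"], [b"AAAC-1", b"AAAG-1"])
print(result)
```

Encoding with 'ascii' follows the original suggestion; barcodes containing non-ASCII characters would raise a UnicodeEncodeError, which is arguably the right outcome for this kind of data.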