huangyh09 / brie

BRIE: Bayesian Regression for Isoform Estimate in Single Cells
https://brie.readthedocs.io
Apache License 2.0
41 stars 15 forks source link

`brie-count` fails with `KeyError '1'` #40

Closed cartal closed 2 years ago

cartal commented 2 years ago

Hi,

Thank you for all the work and effort that went into BRIE/BRIE2. Very nice tool.

I am processing a SS2 dataset with brie-count as follows:

brie-count -a SE.gold.gtf -S samples.tsv -o test -p 3

This runs for a while with samtools-related warnings that look like this:

"E::sam_hrecs_error] Malformed key:value pair at line 69: "@RG  ID:CM387
Cannot fetch reads in region: chr16:92613599-92668812

and ultimately fails with:

Cannot fetch reads in region: chrX:167304929-167330099
[BRIE2] [====================] 100.0% cells done in 6.6 sec.
[BRIE2] save matrix into h5ad ...
Traceback (most recent call last):
  File "/Users/ctl/anaconda3/envs/scbrie2/bin/brie-count", line 8, in <module>
    sys.exit(main())
  File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/bin/count.py", line 206, in main
    count(options.gff_file, options.samList_file, options.out_dir,
  File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/bin/count.py", line 156, in count
    adata = convert_to_annData(Rmat_dict=Rmat_dict,
  File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/utils/io_utils.py", line 20, in convert_to_annData
    X = Rmat['1'] + Rmat['2'] + Rmat['3']
KeyError: '1'

It looks like the anndata can't be made because there are problems with the BAM file, but the BAM files seem fine.

Any help with this would be much appreciated.

huangyh09 commented 2 years ago

Hi, thanks for the issue.

The issue happens when there expected layers (i.e., count matrices, for isoform1, ...) don't exist. This is likely that the processing of the bam file is unsuccessful, particularly when receiving warnings on "Cannot fetch reads in region:". Maybe you can first check your bam file. The different formats of chromosomes (e.g., chr16 and 16) are normally handled automatically, but may still be worth double-checking.

Also, did you see any output file, e.g., "read_count.mtx"? If so and it's empty, then it means the processing of the bam file fails. BTW, how many cells are there in your "samples.tsv" file?

Yuanhua