timoast / sinto

Tools for single-cell data processing
https://timoast.github.io/sinto/
MIT License

Key Error 'RG' with filterbarcodes #52

Closed: alwhiteh closed this issue 1 year ago

alwhiteh commented 1 year ago

Hello,

I have some scATAC data from which I am trying to generate pseudobulk files using text files of cell barcodes. The FASTQ files were aligned with bowtie2, and the output was then converted to BAM and sorted with samtools.

I am encountering the following error:

sinto filterbarcodes -b f1.sorted.bam -c fibroblast_cells.txt --outdir f1_CFs.bam -p 1
Function run_filterbarcodes called with the following arguments:

bam f1.sorted.bam
cells   fibroblast_cells.txt
trim_suffix False
nproc   1
barcode_regex   None
barcodetag  CB
outdir  f1_CFs.bam
sam False
func    <function run_filterbarcodes at 0x110ca9550>
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 25, in _iterate_reads
    newhead = dict((k, header[k]) for k in ("HD", "SQ", "RG"))
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 25, in <genexpr>
    newhead = dict((k, header[k]) for k in ("HD", "SQ", "RG"))
KeyError: 'RG'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/alexwhitehead/miniconda3/bin/sinto", line 8, in <module>
    sys.exit(main())
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/arguments.py", line 472, in main
    options.func(options)
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/utils.py", line 23, in wrapper
    func(args)
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/cli.py", line 17, in run_filterbarcodes
    filterbarcodes.filterbarcodes(
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 111, in filterbarcodes
    idents = p.map_async(
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'RG'

I am unsure whether this is caused by the header missing its read group (@RG) section. When I ran

samtools view -H f1.bam

I got the following output:

@HD VN:1.0  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr15    LN:101991189
@SQ SN:chr16    LN:90338345
@SQ SN:chr17    LN:83257441
@SQ SN:chr18    LN:80373285
@SQ SN:chr19    LN:58617616
@SQ SN:chr20    LN:64444167
@SQ SN:chr21    LN:46709983
@SQ SN:chr22    LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrM LN:16569
@SQ SN:chr1_KI270706v1_random   LN:175055
@SQ SN:chr1_KI270707v1_random   LN:32032
@SQ SN:chr1_KI270708v1_random   LN:127682
@SQ SN:chr1_KI270709v1_random   LN:66860
@SQ SN:chr1_KI270710v1_random   LN:40176
@SQ SN:chr1_KI270711v1_random   LN:42210
@SQ SN:chr1_KI270712v1_random   LN:176043
@SQ SN:chr1_KI270713v1_random   LN:40745
@SQ SN:chr1_KI270714v1_random   LN:41717
@SQ SN:chr2_KI270715v1_random   LN:161471
@SQ SN:chr2_KI270716v1_random   LN:153799
@SQ SN:chr3_GL000221v1_random   LN:155397
@SQ SN:chr4_GL000008v2_random   LN:209709
@SQ SN:chr5_GL000208v1_random   LN:92689
@SQ SN:chr9_KI270717v1_random   LN:40062
@SQ SN:chr9_KI270718v1_random   LN:38054
@SQ SN:chr9_KI270719v1_random   LN:176845
@SQ SN:chr9_KI270720v1_random   LN:39050
@SQ SN:chr11_KI270721v1_random  LN:100316
@SQ SN:chr14_GL000009v2_random  LN:201709
@SQ SN:chr14_GL000225v1_random  LN:211173
@SQ SN:chr14_KI270722v1_random  LN:194050
@SQ SN:chr14_GL000194v1_random  LN:191469
@SQ SN:chr14_KI270723v1_random  LN:38115
@SQ SN:chr14_KI270724v1_random  LN:39555
@SQ SN:chr14_KI270725v1_random  LN:172810
@SQ SN:chr14_KI270726v1_random  LN:43739
@SQ SN:chr15_KI270727v1_random  LN:448248
@SQ SN:chr16_KI270728v1_random  LN:1872759
@SQ SN:chr17_GL000205v2_random  LN:185591
@SQ SN:chr17_KI270729v1_random  LN:280839
@SQ SN:chr17_KI270730v1_random  LN:112551
@SQ SN:chr22_KI270731v1_random  LN:150754
@SQ SN:chr22_KI270732v1_random  LN:41543
@SQ SN:chr22_KI270733v1_random  LN:179772
@SQ SN:chr22_KI270734v1_random  LN:165050
@SQ SN:chr22_KI270735v1_random  LN:42811
@SQ SN:chr22_KI270736v1_random  LN:181920
@SQ SN:chr22_KI270737v1_random  LN:103838
@SQ SN:chr22_KI270738v1_random  LN:99375
@SQ SN:chr22_KI270739v1_random  LN:73985
@SQ SN:chrY_KI270740v1_random   LN:37240
@SQ SN:chrUn_KI270302v1 LN:2274
@SQ SN:chrUn_KI270304v1 LN:2165
@SQ SN:chrUn_KI270303v1 LN:1942
@SQ SN:chrUn_KI270305v1 LN:1472
@SQ SN:chrUn_KI270322v1 LN:21476
@SQ SN:chrUn_KI270320v1 LN:4416
@SQ SN:chrUn_KI270310v1 LN:1201
@SQ SN:chrUn_KI270316v1 LN:1444
@SQ SN:chrUn_KI270315v1 LN:2276
@SQ SN:chrUn_KI270312v1 LN:998
@SQ SN:chrUn_KI270311v1 LN:12399
@SQ SN:chrUn_KI270317v1 LN:37690
@SQ SN:chrUn_KI270412v1 LN:1179
@SQ SN:chrUn_KI270411v1 LN:2646
@SQ SN:chrUn_KI270414v1 LN:2489
@SQ SN:chrUn_KI270419v1 LN:1029
@SQ SN:chrUn_KI270418v1 LN:2145
@SQ SN:chrUn_KI270420v1 LN:2321
@SQ SN:chrUn_KI270424v1 LN:2140
@SQ SN:chrUn_KI270417v1 LN:2043
@SQ SN:chrUn_KI270422v1 LN:1445
@SQ SN:chrUn_KI270423v1 LN:981
@SQ SN:chrUn_KI270425v1 LN:1884
@SQ SN:chrUn_KI270429v1 LN:1361
@SQ SN:chrUn_KI270442v1 LN:392061
@SQ SN:chrUn_KI270466v1 LN:1233
@SQ SN:chrUn_KI270465v1 LN:1774
@SQ SN:chrUn_KI270467v1 LN:3920
@SQ SN:chrUn_KI270435v1 LN:92983
@SQ SN:chrUn_KI270438v1 LN:112505
@SQ SN:chrUn_KI270468v1 LN:4055
@SQ SN:chrUn_KI270510v1 LN:2415
@SQ SN:chrUn_KI270509v1 LN:2318
@SQ SN:chrUn_KI270518v1 LN:2186
@SQ SN:chrUn_KI270508v1 LN:1951
@SQ SN:chrUn_KI270516v1 LN:1300
@SQ SN:chrUn_KI270512v1 LN:22689
@SQ SN:chrUn_KI270519v1 LN:138126
@SQ SN:chrUn_KI270522v1 LN:5674
@SQ SN:chrUn_KI270511v1 LN:8127
@SQ SN:chrUn_KI270515v1 LN:6361
@SQ SN:chrUn_KI270507v1 LN:5353
@SQ SN:chrUn_KI270517v1 LN:3253
@SQ SN:chrUn_KI270529v1 LN:1899
@SQ SN:chrUn_KI270528v1 LN:2983
@SQ SN:chrUn_KI270530v1 LN:2168
@SQ SN:chrUn_KI270539v1 LN:993
@SQ SN:chrUn_KI270538v1 LN:91309
@SQ SN:chrUn_KI270544v1 LN:1202
@SQ SN:chrUn_KI270548v1 LN:1599
@SQ SN:chrUn_KI270583v1 LN:1400
@SQ SN:chrUn_KI270587v1 LN:2969
@SQ SN:chrUn_KI270580v1 LN:1553
@SQ SN:chrUn_KI270581v1 LN:7046
@SQ SN:chrUn_KI270579v1 LN:31033
@SQ SN:chrUn_KI270589v1 LN:44474
@SQ SN:chrUn_KI270590v1 LN:4685
@SQ SN:chrUn_KI270584v1 LN:4513
@SQ SN:chrUn_KI270582v1 LN:6504
@SQ SN:chrUn_KI270588v1 LN:6158
@SQ SN:chrUn_KI270593v1 LN:3041
@SQ SN:chrUn_KI270591v1 LN:5796
@SQ SN:chrUn_KI270330v1 LN:1652
@SQ SN:chrUn_KI270329v1 LN:1040
@SQ SN:chrUn_KI270334v1 LN:1368
@SQ SN:chrUn_KI270333v1 LN:2699
@SQ SN:chrUn_KI270335v1 LN:1048
@SQ SN:chrUn_KI270338v1 LN:1428
@SQ SN:chrUn_KI270340v1 LN:1428
@SQ SN:chrUn_KI270336v1 LN:1026
@SQ SN:chrUn_KI270337v1 LN:1121
@SQ SN:chrUn_KI270363v1 LN:1803
@SQ SN:chrUn_KI270364v1 LN:2855
@SQ SN:chrUn_KI270362v1 LN:3530
@SQ SN:chrUn_KI270366v1 LN:8320
@SQ SN:chrUn_KI270378v1 LN:1048
@SQ SN:chrUn_KI270379v1 LN:1045
@SQ SN:chrUn_KI270389v1 LN:1298
@SQ SN:chrUn_KI270390v1 LN:2387
@SQ SN:chrUn_KI270387v1 LN:1537
@SQ SN:chrUn_KI270395v1 LN:1143
@SQ SN:chrUn_KI270396v1 LN:1880
@SQ SN:chrUn_KI270388v1 LN:1216
@SQ SN:chrUn_KI270394v1 LN:970
@SQ SN:chrUn_KI270386v1 LN:1788
@SQ SN:chrUn_KI270391v1 LN:1484
@SQ SN:chrUn_KI270383v1 LN:1750
@SQ SN:chrUn_KI270393v1 LN:1308
@SQ SN:chrUn_KI270384v1 LN:1658
@SQ SN:chrUn_KI270392v1 LN:971
@SQ SN:chrUn_KI270381v1 LN:1930
@SQ SN:chrUn_KI270385v1 LN:990
@SQ SN:chrUn_KI270382v1 LN:4215
@SQ SN:chrUn_KI270376v1 LN:1136
@SQ SN:chrUn_KI270374v1 LN:2656
@SQ SN:chrUn_KI270372v1 LN:1650
@SQ SN:chrUn_KI270373v1 LN:1451
@SQ SN:chrUn_KI270375v1 LN:2378
@SQ SN:chrUn_KI270371v1 LN:2805
@SQ SN:chrUn_KI270448v1 LN:7992
@SQ SN:chrUn_KI270521v1 LN:7642
@SQ SN:chrUn_GL000195v1 LN:182896
@SQ SN:chrUn_GL000219v1 LN:179198
@SQ SN:chrUn_GL000220v1 LN:161802
@SQ SN:chrUn_GL000224v1 LN:179693
@SQ SN:chrUn_KI270741v1 LN:157432
@SQ SN:chrUn_GL000226v1 LN:15008
@SQ SN:chrUn_GL000213v1 LN:164239
@SQ SN:chrUn_KI270743v1 LN:210658
@SQ SN:chrUn_KI270744v1 LN:168472
@SQ SN:chrUn_KI270745v1 LN:41891
@SQ SN:chrUn_KI270746v1 LN:66486
@SQ SN:chrUn_KI270747v1 LN:198735
@SQ SN:chrUn_KI270748v1 LN:93321
@SQ SN:chrUn_KI270749v1 LN:158759
@SQ SN:chrUn_KI270750v1 LN:148850
@SQ SN:chrUn_KI270751v1 LN:150742
@SQ SN:chrUn_KI270752v1 LN:27745
@SQ SN:chrUn_KI270753v1 LN:62944
@SQ SN:chrUn_KI270754v1 LN:40191
@SQ SN:chrUn_KI270755v1 LN:36723
@SQ SN:chrUn_KI270756v1 LN:79590
@SQ SN:chrUn_KI270757v1 LN:71251
@SQ SN:chrUn_GL000214v1 LN:137718
@SQ SN:chrUn_KI270742v1 LN:186739
@SQ SN:chrUn_GL000216v2 LN:176608
@SQ SN:chrUn_GL000218v1 LN:161147
@SQ SN:chrEBV   LN:171823
@PG ID:bowtie2  PN:bowtie2  VN:2.4.4    CL:"/Users/alexwhitehead/miniconda3/bin/bowtie2-align-s --wrapper basic-0 -p 8 -X2000 --local -x /Users/alexwhitehead/Applications/bowtie2_index/GRCh38_noalt_as -1 /Users/alexwhitehead/AW_ATAC/fetal_heart_sc_fastq/fastq/SRR11692126_S1_L001_R1_001.fastq.gz -2 /Users/alexwhitehead/AW_ATAC/fetal_heart_sc_fastq/fastq/SRR11692126_S1_L001_R2_001.fastq.gz"
@PG ID:samtools PN:samtools PP:bowtie2  VN:1.15.1   CL:samtools view -H f1.sorted.bam

Do I need to edit the header somehow or is this due to another issue?

Thanks,

Alex

timoast commented 1 year ago

Thanks for reporting. This should now be fixed on the develop branch. Can you try installing from develop and let me know if you still see the error?
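For readers hitting the same KeyError: the traceback shows the header being rebuilt with header[k] for each of "HD", "SQ" and "RG", which fails when the BAM has no @RG lines (as in the samtools view -H output above). The snippet below is only a sketch of one way to tolerate missing sections. It is not the actual change made on the develop branch, and the function name copy_header_sections is hypothetical. It uses a plain dict in place of a real pysam header so that it runs without any BAM file.

```python
def copy_header_sections(header, keys=("HD", "SQ", "RG")):
    """Copy only the requested header sections that are actually present.

    `header` is assumed to be a dict-like mapping of SAM header section
    names to their contents. Sections that are absent are skipped
    instead of raising KeyError.
    """
    return {k: header[k] for k in keys if k in header}


# A header shaped like the one in this issue: @HD, @SQ and @PG, but no @RG.
header = {
    "HD": {"VN": "1.0", "SO": "coordinate"},
    "SQ": [{"SN": "chr1", "LN": 248956422}],
    "PG": [{"ID": "bowtie2", "PN": "bowtie2", "VN": "2.4.4"}],
}

newhead = copy_header_sections(header)
print(sorted(newhead))  # ['HD', 'SQ']
```

With this approach the output header simply omits @RG when the input has none, so the input BAM does not need to be edited first.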

alwhiteh commented 1 year ago

Hi Tim,

I ran the code again and got the following output:

Function run_filterbarcodes called with the following arguments:

bam f1.sorted.bam
cells   fibroblast_cells.txt
trim_suffix False
nproc   1
barcode_regex   None
barcodetag  CB
outdir  f1_CFs.bam
sam False
func    <function run_filterbarcodes at 0x1104c1ca0>
[E::hts_open_format] Failed to open file "f1_CFs.bam/CF_HKOMN4" : No such file or directory
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 32, in _iterate_reads
    bamlist = [pysam.AlignmentFile(x, filemode, header=newhead) for x in filelist]
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 32, in <listcomp>
    bamlist = [pysam.AlignmentFile(x, filemode, header=newhead) for x in filelist]
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 924, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file `f1_CFs.bam/CF_HKOMN4`: No such file or directory
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/alexwhitehead/miniconda3/bin/sinto", line 8, in <module>
    sys.exit(main())
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/arguments.py", line 472, in main
    options.func(options)
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/utils.py", line 24, in wrapper
    func(args)
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/cli.py", line 17, in run_filterbarcodes
    filterbarcodes.filterbarcodes(
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 112, in filterbarcodes
    idents = p.map_async(
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
FileNotFoundError: [Errno 2] could not open alignment file `f1_CFs.bam/CF_HKOMN4`: No such file or directory

-Alex

timoast commented 1 year ago

Should be fixed now, could you check again?
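The path in the error above (f1_CFs.bam/CF_HKOMN4) suggests that the value passed to --outdir is treated as a directory and that files are opened inside it, so the open fails if that directory does not exist yet. The sketch below shows the general pattern of creating the directory before opening files in it. It is an illustration under that assumption, not the code that was committed to sinto, and ensure_outdir and temp_output_path are hypothetical helper names.

```python
import os
import tempfile


def ensure_outdir(outdir):
    """Create the output directory (and any parents) if it is missing."""
    os.makedirs(outdir, exist_ok=True)
    return outdir


def temp_output_path(outdir, prefix):
    """Return the path of a new, empty temporary file inside outdir."""
    ensure_outdir(outdir)
    fd, path = tempfile.mkstemp(prefix=prefix + "_", dir=outdir)
    os.close(fd)  # only the path is needed; the caller reopens the file
    return path


# Demonstration in a throwaway location so nothing is left behind.
with tempfile.TemporaryDirectory() as base:
    outdir = os.path.join(base, "f1_CFs")  # does not exist yet
    path = temp_output_path(outdir, "CF")
    print(os.path.isdir(outdir), os.path.isfile(path))  # True True
```

Until a fixed version is installed, creating the output directory by hand before running the command may serve as a workaround.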

alwhiteh commented 1 year ago

I'm still getting a similar error:

Function run_filterbarcodes called with the following arguments:

bam f1.sorted.bam
cells   fibroblast_cells.txt
trim_suffix False
nproc   1
barcode_regex   None
barcodetag  CB
outdir  f1_CFs.bam
sam False
func    <function run_filterbarcodes at 0x1144dcca0>
[E::hts_open_format] Failed to open file "f1_CFs.bam/CF_MGJNVK" : No such file or directory
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 32, in _iterate_reads
    bamlist = [pysam.AlignmentFile(x, filemode, header=newhead) for x in filelist]
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 32, in <listcomp>
    bamlist = [pysam.AlignmentFile(x, filemode, header=newhead) for x in filelist]
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 924, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file `f1_CFs.bam/CF_MGJNVK`: No such file or directory
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/alexwhitehead/miniconda3/bin/sinto", line 8, in <module>
    sys.exit(main())
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/arguments.py", line 472, in main
    options.func(options)
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/utils.py", line 24, in wrapper
    func(args)
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/cli.py", line 17, in run_filterbarcodes
    filterbarcodes.filterbarcodes(
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/site-packages/sinto/filterbarcodes.py", line 112, in filterbarcodes
    idents = p.map_async(
  File "/Users/alexwhitehead/miniconda3/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
FileNotFoundError: [Errno 2] could not open alignment file `f1_CFs.bam/CF_MGJNVK`: No such file or directory

timoast commented 1 year ago

You pulled from the develop branch and installed again?

alwhiteh commented 1 year ago

I used

pip install git+https://github.com/timoast/sinto.git@develop

Is this not correct?

timoast commented 1 year ago

Yep, that should work.

I just increased the dev version. Can you install again, run sinto -v to check that the version is 0.8.3_1, and then check whether you're still getting that error?

alwhiteh commented 1 year ago

Hi Tim, this seemed to fix the error! My output file was empty, but I'll troubleshoot a bit more and reopen if I'm confident it's not an issue with my files/formatting. Thanks again!
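One possible cause of an empty output file, offered as an assumption and not as a diagnosis: the argument listing shows barcodetag CB and barcode_regex None, so reads are presumably matched on a CB tag, and a BAM produced directly by bowtie2 does not normally carry CB tags unless they were added in a separate step. The sketch below counts how many alignment records in SAM text carry the tag. The function name count_barcode_tags is hypothetical, and the input is assumed to be lines such as those printed by samtools view.

```python
def count_barcode_tags(sam_lines, tag="CB"):
    """Return (total_records, records_with_tag) for SAM-formatted lines.

    Header lines (starting with '@') and blank lines are skipped.
    Optional fields start at the 12th tab-separated column.
    """
    prefix = tag + ":Z:"
    total = tagged = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        fields = line.rstrip("\n").split("\t")
        if any(field.startswith(prefix) for field in fields[11:]):
            tagged += 1
    return total, tagged


# Two made-up records: one with a CB tag, one without.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAACGAAAGT-1",
    "r2\t0\tchr1\t200\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:8",
]
print(count_barcode_tags(example))  # (2, 1)
```

If the second number is zero for a sample of reads from the input BAM, no read can match any barcode in the cells file through the CB tag, which would be consistent with an empty result.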