Open leeyoyohku opened 3 years ago
The scTE have moved to: https://github.com/JiekaiLab/scTE
you can set the: -CB CB -UMI UB, which will works for your bam file
Thank @jphe for your reply.
I thought -CB and -UMI only take TRUE/FALSE as input. As I tried the command as you suggested
scTE -i D0_1Aligned.sortedByCoord.out.bam -o D0_1 -g hg38 -x ~/yoyo/0_reference/scTEindex.idx.exclusive.idx -p 10 -CB CB -UMI UB --hdf5 True
The error occurs:
scTE: error: argument -CB: invalid choice: 'CB' (choose from 'True', 'False')
The scTE have moved to: https://github.com/JiekaiLab/scTE, you need download the latest version there
Thank you! As I installed the new version with the following instructions under a python=3.7.10 environment,
$ git clone https://github.com/jphe/scTE.git $ cd scTE $ python setup.py install
I still encountered the similar error but with 4 lines of DEBUG this time, seen as below:
scTE -i new/D0_2Aligned.sortedByCoord.out.bam -o D0_1 -g hg38 -x ~/yoyo/0_reference/scTE_human.exclusive.idx -p 10 -CB CB -UMI UB --hdf5 True
DEBUG : Creating converter from 7 to 5
DEBUG : Creating converter from 5 to 7
DEBUG : Creating converter from 7 to 5
DEBUG : Creating converter from 5 to 7
usage: scTE [-h] [--min_genes INT] [--min_counts INT] [--expect-cells INT]
[-f [input file format]] [-CB [{True,False}]]
[-UMI [{True,False}]] [--keeptmp [{True,False}]]
[--hdf5 [{True,False}]] [-p INT] [-v] -i INPUT [INPUT ...] -x
ANNOGLB [ANNOGLB ...] -g [genome] -o [OUT]
scTE: error: argument -CB: invalid choice: 'CB' (choose from 'True', 'False')
Much appreciated for your timely help.
The newest scTE have moved to: https://github.com/JiekaiLab/scTE, you need download the latest version under that repository
git clone https://github.com/JiekaiLab/scTE.git, then re-install it
Ohhhh sorry that I didn't revise the git link cuz I only followed the installation instruction on the new GitHub page. Now it has taken my -CB input, however, it seems not able to recognise my CB even though they present:
$ scTE -i new/D0_2Aligned.sortedByCoord.out.bam -o D0_2 -x ~/yoyo/0_reference/scTE_human.exclusive.idx -p 10 -CB CB -UMI UB --hdf5 True
DEBUG : Creating converter from 7 to 5
DEBUG : Creating converter from 5 to 7
DEBUG : Creating converter from 7 to 5
DEBUG : Creating converter from 5 to 7
INFO : Parameter list:
Sample = D0_1
Reference annotation index = /home/yoyo/yoyo/0_reference/scTE_human.exclusive.idx
Minimum number of genes required = 200
Minimum number of counts required = None
Number of threads = 10
INFO : Loading the genome annotation index... 2021-03-22 03:38:51
INFO : Loaded '/home/yoyo/yoyo/0_reference/scTE_human.exclusive.idx' binary file with 5589235 items
['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21', '22', '3', '4', '5', '6', '7', '8', '9', 'GL000009.2', 'GL000194.1', 'GL000195.1', 'GL000205.2', 'GL000209.2', 'GL000213.1', 'GL000218.1', 'GL000219.1', 'GL000220.1', 'GL000250.2', 'GL000251.2', 'GL000252.2', 'GL000253.2', 'GL000254.2', 'GL000255.2', 'GL000256.2', 'GL000257.2', 'GL000258.2', 'GL339449.2', 'GL383518.1', 'GL383519.1', 'GL383520.2', 'GL383521.1', 'GL383522.1', 'GL383526.1', 'GL383527.1', 'GL383528.1', 'GL383531.1', 'GL383532.1', 'GL383533.1', 'GL383534.2', 'GL383539.1', 'GL383540.1', 'GL383541.1', 'GL383542.1', 'GL383545.1', 'GL383546.1', 'GL383550.2', 'GL383551.1', 'GL383552.1', 'GL383553.2', 'GL383554.1', 'GL383555.2', 'GL383556.1', 'GL383557.1', 'GL383563.3', 'GL383564.2', 'GL383565.1', 'GL383566.1', 'GL383567.1', 'GL383569.1', 'GL383570.1', 'GL383571.1', 'GL383573.1', 'GL383574.1', 'GL383575.2', 'GL383576.1', 'GL383577.2', 'GL383579.2', 'GL383580.2', 'GL383581.2', 'GL383582.2', 'GL383583.2', 'GL582966.2', 'GL877875.1', 'GL877876.1', 'GL949742.1', 'GL949746.1', 'GL949747.2', 'GL949748.2', 'GL949749.2', 'GL949750.2', 'GL949751.2', 'GL949752.1', 'GL949753.2', 'JH159136.1', 'JH159137.1', 'JH159146.1', 'JH159147.1', 'JH159148.1', 'JH636055.2', 'KB021644.2', 'KB663609.1', 'KI270442.1', 'KI270711.1', 'KI270713.1', 'KI270721.1', 'KI270726.1', 'KI270727.1', 'KI270728.1', 'KI270731.1', 'KI270733.1', 'KI270734.1', 'KI270744.1', 'KI270750.1', 'KI270758.1', 'KI270759.1', 'KI270760.1', 'KI270762.1', 'KI270763.1', 'KI270765.1', 'KI270766.1', 'KI270767.1', 'KI270768.1', 'KI270769.1', 'KI270770.1', 'KI270772.1', 'KI270774.1', 'KI270776.1', 'KI270777.1', 'KI270779.1', 'KI270780.1', 'KI270781.1', 'KI270782.1', 'KI270783.1', 'KI270784.1', 'KI270785.1', 'KI270786.1', 'KI270788.1', 'KI270789.1', 'KI270790.1', 'KI270791.1', 'KI270792.1', 'KI270793.1', 'KI270794.1', 'KI270795.1', 'KI270796.1', 'KI270797.1', 'KI270798.1', 'KI270799.1', 'KI270800.1', 'KI270801.1', 'KI270802.1', 'KI270803.1', 'KI270804.1', 'KI270805.1', 'KI270806.1', 'KI270807.1', 'KI270808.1', 'KI270809.1', 'KI270810.1', 'KI270811.1', 'KI270812.1', 'KI270813.1', 'KI270814.1', 'KI270815.1', 'KI270816.1', 'KI270817.1', 'KI270818.1', 'KI270819.1', 'KI270821.1', 'KI270822.1', 'KI270823.1', 'KI270824.1', 'KI270825.1', 'KI270829.1', 'KI270830.1', 'KI270831.1', 'KI270832.1', 'KI270833.1', 'KI270834.1', 'KI270835.1', 'KI270837.1', 'KI270838.1', 'KI270840.1', 'KI270842.1', 'KI270844.1', 'KI270845.1', 'KI270846.1', 'KI270847.1', 'KI270848.1', 'KI270849.1', 'KI270850.1', 'KI270851.1', 'KI270852.1', 'KI270853.1', 'KI270854.1', 'KI270855.1', 'KI270856.1', 'KI270857.1', 'KI270858.1', 'KI270859.1', 'KI270860.1', 'KI270861.1', 'KI270862.1', 'KI270863.1', 'KI270865.1', 'KI270866.1', 'KI270867.1', 'KI270868.1', 'KI270869.1', 'KI270870.1', 'KI270871.1', 'KI270872.1', 'KI270873.1', 'KI270874.1', 'KI270875.1', 'KI270876.1', 'KI270877.1', 'KI270878.1', 'KI270879.1', 'KI270880.1', 'KI270881.1', 'KI270882.1', 'KI270883.1', 'KI270884.1', 'KI270885.1', 'KI270886.1', 'KI270887.1', 'KI270888.1', 'KI270889.1', 'KI270890.1', 'KI270891.1', 'KI270892.1', 'KI270893.1', 'KI270894.1', 'KI270895.1', 'KI270896.1', 'KI270897.1', 'KI270898.1', 'KI270899.1', 'KI270901.1', 'KI270903.1', 'KI270904.1', 'KI270905.1', 'KI270906.1', 'KI270907.1', 'KI270908.1', 'KI270909.1', 'KI270910.1', 'KI270911.1', 'KI270912.1', 'KI270913.1', 'KI270914.1', 'KI270915.1', 'KI270916.1', 'KI270917.1', 'KI270918.1', 'KI270919.1', 'KI270920.1', 'KI270921.1', 'KI270924.1', 'KI270925.1', 'KI270926.1', 'KI270927.1', 'KI270928.1', 'KI270930.1', 'KI270931.1', 'KI270932.1', 'KI270933.1', 'KI270934.1', 'KI270935.1', 'KI270936.1', 'KI270937.1', 'KI270938.1', 'KN196472.1', 'KN196473.1', 'KN196474.1', 'KN196475.1', 'KN196476.1', 'KN196477.1', 'KN196478.1', 'KN196479.1', 'KN196480.1', 'KN196481.1', 'KN196484.1', 'KN196485.1', 'KN196486.1', 'KN196487.1', 'KN538360.1', 'KN538361.1', 'KN538362.1', 'KN538363.1', 'KN538364.1', 'KN538368.1', 'KN538369.1', 'KN538370.1', 'KN538371.1', 'KN538372.1', 'KN538373.1', 'KQ031383.1', 'KQ031384.1', 'KQ031385.1', 'KQ031387.1', 'KQ031388.1', 'KQ031389.1', 'KQ090014.1', 'KQ090015.1', 'KQ090016.1', 'KQ090018.1', 'KQ090021.1', 'KQ090022.1', 'KQ090026.1', 'KQ090027.1', 'KQ090028.1', 'KQ458382.1', 'KQ458383.1', 'KQ458384.1', 'KQ458385.1', 'KQ458386.1', 'KQ458387.1', 'KQ458388.1', 'KQ759759.1', 'KQ759760.1', 'KQ759761.1', 'KQ759762.1', 'KQ983255.1', 'KQ983256.1', 'KQ983257.1', 'KQ983258.1', 'KV575243.1', 'KV575244.1', 'KV575245.1', 'KV575246.1', 'KV575247.1', 'KV575248.1', 'KV575249.1', 'KV575250.1', 'KV575251.1', 'KV575252.1', 'KV575253.1', 'KV575254.1', 'KV575255.1', 'KV575256.1', 'KV575257.1', 'KV575258.1', 'KV575259.1', 'KV575260.1', 'KV766192.1', 'KV766193.1', 'KV766194.1', 'KV766195.1', 'KV766196.1', 'KV766198.1', 'KV880763.1', 'KV880764.1', 'KV880765.1', 'KV880766.1', 'KV880768.1', 'KZ208904.1', 'KZ208905.1', 'KZ208906.1', 'KZ208907.1', 'KZ208908.1', 'KZ208909.1', 'KZ208911.1', 'KZ208912.1', 'KZ208913.1', 'KZ208914.1', 'KZ208915.1', 'KZ208916.1', 'KZ208917.1', 'KZ208919.1', 'KZ208920.1', 'KZ208921.1', 'KZ208922.1', 'KZ208924.1', 'KZ559102.1', 'KZ559103.1', 'KZ559104.1', 'KZ559105.1', 'KZ559106.1', 'KZ559107.1', 'KZ559108.1', 'KZ559109.1', 'KZ559110.1', 'KZ559111.1', 'KZ559112.1', 'KZ559113.1', 'KZ559114.1', 'KZ559115.1', 'KZ559116.1', 'M', 'ML143341.1', 'ML143342.1', 'ML143343.1', 'ML143344.1', 'ML143345.1', 'ML143347.1', 'ML143349.1', 'ML143350.1', 'ML143352.1', 'ML143354.1', 'ML143355.1', 'ML143357.1', 'ML143358.1', 'ML143359.1', 'ML143360.1', 'ML143361.1', 'ML143362.1', 'ML143366.1', 'ML143367.1', 'ML143369.1', 'ML143370.1', 'ML143371.1', 'ML143372.1', 'ML143373.1', 'ML143374.1', 'ML143375.1', 'ML143376.1', 'ML143377.1', 'ML143378.1', 'ML143380.1', 'ML143381.1', 'X', 'Y']
INFO : Finished loading the genome annotation index... 2021-03-22 03:39:23
INFO : Processing BAM/SAM files ...2021-03-22 03:39:23
ERROR : The input file new/D0_2Aligned.sortedByCoord.out.bam has no cell barcodes information, plese make sure the aligner have add the cell barcode key, or set CB to False
Below is how my bam file looks like:
samtools view new/D0_2Aligned.sortedByCoord.out.bam |head -5
CL100132075L2C003R031_474121 272 chr1 11926 0 100M * 0 0 GTCTCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGATGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGC FFFCEEFDFFFFFEFF;DGFFF8GCFFABBFFFF@FFFFA:DGFGFFFFEFFEFGFFDG@FGFFFCFFFGFEFFDFFFGFFGFFFCFD=DFFFFEFFFFF NH:i:7 HI:i:2 nM:i:0 AS:i:98 CR:Z:CATACACTTG_ATTATTGTGC UR:Z:GCCGTATGTT CB:Z:CATACACTTG_ATTATTGTGC UB:Z:GCCGTATGTT
CL100132075L2C007R010_384571 272 chr1 12022 0 100M * 0 0 CAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCGTTGTTCATC FFFFDBFFFFFFF?FCF;FEFFFFF5FFFFFFEFFFFEF>FFFF?FFFFF8FEDEFFFFFFFFFFFFBFFFFFFFFFFFFFFF;FEFFFFFFFFF>FFFF NH:i:10 HI:i:3 nM:i:1 AS:i:96 CR:Z:CATACACTTG_ATTATTGTGC UR:Z:GCCGNATGTT
CL100132075L2C007R055_495884 272 chr1 12025 0 100M * 0 0 CAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTC BEEAEEFGEEFCFEFEFCEEEEEEEEFEGEEEEEF<GEEBEEEEEEFAEFGGEEEEEBEEEGEFGCEEEGFEBEEEEFFEEEEFFGDGFEEEGEEBEFF< NH:i:10 HI:i:7 nM:i:0 AS:i:98 CR:Z:TGGTTAGGCA_TGGATTAGAC UR:Z:GCATCAAGAC CB:Z:TGGTTAGGCA_TGGATTAGAC UB:Z:GCATCAAGAC
CL100132075L2C001R049_91491 272 chr1 12026 0 100M * 0 0 AACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCT 359GA@F8;DDF?FEF<ED7ECEEEEAECB@E=6;?EE@@>BEEEEEEE.@EEEBEEEADBEE?CGEEGEBEDEEEEGGEEEDFDDEEGEEGDBFFFFFF NH:i:10 HI:i:7 nM:i:0 AS:i:98 CR:Z:TGGTTAGGCA_TGGATTAGAC UR:Z:GCATCAAGAC CB:Z:TGGTTAGGCA_TGGATTAGAC UB:Z:GCATCAAGAC
CL100132075L2C002R063_343056 272 chr1 12039 0 100M * 0 0 TGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTC GFFDFGFFFGFGF:F2FF<EFDFFCEG6FGAFF>EAFDEGFFGF>FFEFGFFGGFGAEFGFFFEFFFGEGGEGFFFE?BFGEFFFGGEGFFGGFGBFGGF NH:i:10 HI:i:7 nM:i:0 AS:i:98 CR:Z:TGGTTAGGCA_TGGATTAGAC UR:Z:GCATCAAGTC CB:Z:TGGTTAGGCA_TGGATTAGAC UB:Z:GCATCAAGTC
I wonder if it's because of the in my CB field and tried 900 lines with removed, but the error persists. Also in the previous version it could recognise my _ linked uncorrected barcode in CR field. I'm not sure why it happens.
Much appreciated for your timely help.
How do you generate the bam file? the file seems corrupted, the 2nd line has no cell barcode (CB:Z) information, but only CR:Z
I generated the bam using STARsolo with 1MM for error correction, seen the exact line of code below. Regarding some empty CB fields, I think they are reads that cannot be matched to any barcode even allowing 1 mismatch. The bam file went smoothly with velocyto.
STAR --genomeDir /home/yoyo/yoyo/0_reference/4_gencode.v36.starindex --readFilesIn 0_raw/CL100132075_L02_read_2.fq.gz 0_raw/CL100132075_L02_read_1.fq.gz --soloCBwhitelist barcode_list.txt barcode_list.txt --runThreadN 8 --soloType CB_UMI_Complex --soloCBposition 0_0_0_9 0_16_0_25 --soloUMIposition 0_30_1_-1 --soloCBmatchWLtype 1MM --outSAMattributes NH HI nM AS CR UR CB UB --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c --outFileNamePrefix new/D0_2
Maybe you can do pre-filtering to remove the empty CB reads, and then run scTE. We'll check this in detail later and to see if can ignore those reads in scTE.
Thank @jphe for your generous help. I removed lines without CB from bam using samtools view new/D0_2Aligned.sortedByCoord.out.bam -h | awk '/^@/ || /CB:/' |samtools view -h -b >new/D0_2_CB_clean.bam
ran the scTE command again scTE -i new/D0_2_CB_clean.bam -o new/D0_2 -x ~/yoyo/0_reference/scTE_human.exclusive.idx -p 10 -CB CB -UMI UB --hdf5 True
and ended up with the following index out of range error. Even when I tried the command without -CB and -UMI flags, there's no h5 output file.
INFO : Loading the genome annotation index... 2021-03-22 07:18:05
DEBUG : Creating converter from 7 to 5
DEBUG : Creating converter from 5 to 7
DEBUG : Creating converter from 7 to 5
DEBUG : Creating converter from 5 to 7
INFO : Parameter list:
Sample = new/D0_2
Reference annotation index = /home/yoyo/yoyo/0_reference/scTE_human.exclusive.idx
Minimum number of genes required = 200
Minimum number of counts required = None
Number of threads = 10
INFO : Loading the genome annotation index... 2021-03-22 07:18:14
INFO : Loaded '/home/yoyo/yoyo/0_reference/scTE_human.exclusive.idx' binary file with 5589235 items
INFO : Finished loading the genome annotation index... 2021-03-22 07:18:57
INFO : Processing BAM/SAM files ...2021-03-22 07:18:57
INFO : Input SAM/BAM file appears to be valid
INFO : Done BAM/SAM files processing ...2021-03-22 07:49:29
INFO : Splitting ...2021-03-22 08:31:40
INFO : Executing multiple thread path with 10 threads
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/site-packages/scTE-1.0-py3.7.egg/scTE/base.py", line 366, in splitChr
CRs[t[3]] += 1
IndexError: list index out of range
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/yoyo/miniconda2/envs/scTE/bin/scTE", line 4, in <module>
__import__('pkg_resources').run_script('scTE==1.0', 'scTE')
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1448, in run_script
exec(code, namespace, namespace)
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/site-packages/scTE-1.0-py3.7.egg/EGG-INFO/scripts/scTE", line 169, in <module>
main()
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/site-packages/scTE-1.0-py3.7.egg/EGG-INFO/scripts/scTE", line 134, in main
pool.map(partial_work, chr_list)
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/yoyo/miniconda2/envs/scTE/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
IndexError: list index out of range
Update: I tried the same command without -CB and -UMI flags using the previous version (which worked before even though the barcode didn't relate to corrected ones), but it showed similar result to the updated version, I wonder if it's the index file problem (I reindexed during reinstalling the new version). Let me reindex and try if it works. Thanks
Sadly, reindexing doesn't work.
Dear scTE team,
I'm encountering a problem regarding the CR field in input bam files. In my bam files, corrected barcodes are stored in CB, while the uncorrected barcodes are in CR. A line of bam file example is pasted here:
CL100132075L2C003R068_87600 256 chr1 14041 0 100M * 0 0 TTGGTGCCAGTTCCTCCAAGTCGATGGCACCTCCCTCCCTCTCAACCACTTGAGCAAACTCCAAGACATCTTCTACCCCAACACCAGCAATTGTGCCAAG D.FBC>C@DAFFADFEBFFFBEE;;FD0F@DFFFCFFFDFFFF9=DFFF8EE,FFD>CDFFF<FBDFD:EFEEFE<=ED/7FDFFE;F)/BEFDC1EA-C NH:i:9 HI:i:4 nM:i:0 AS:i:98 CR:Z:GTGACTGTAC_TACTAATGGC UR:Z:GGGCCATCAG CB:Z:GTGACTGTAC_TACTAATGGC UB:Z:GGGCCATCAG
As a result, in the output h5ad, there are way more observations than expected, but the upper limit is 10k. So I don't think it captures all lines from bam.
I tried to change my bam with the following code but it reported truncated file error.
samtools view -h new/D0_2Aligned.sortedByCoord.out.bam | awk '{$16=$17=""; print $0}'|sed 's/UB:/UR:/g' |sed 's/CB:/CR:/g' |samtools view -h -b > D0_2_CB.bam
Does scTE come with the function to check the CB field instead of CR field? Thanks.