Open nouroddinc opened 1 year ago
@mourisl Is this way of using read-format supported?
@nouroddinc Is your barcode starting from 60-67 followed by 22-29? Chromap will internally sort it to 22-29 followed by 60-67 to avoid typos in ranges. Maybe this is an undesirable feature now.
@haowenz @mourisl thanks for helping me on this. yes, my barcode starts from 60-67 and ends with 22-29. I have tested with --read-format bc:22:29,bc:60:67,r1:0:-1,r2:117:-1
and the output log file and bed file is exactly the same as when I tested --read-format bc:60:67,bc:22:29,r1:0:-1,r2:117:-1
. I should also mention that my barcode file bc50.txt includes both barcodes bc:22:29+60:67 and bc:60:67+22:29 in some cases. For example
# cat bc50.txt | head
AACGTGATAACGTGAT
AACGTGATGCGAGTAA
AACGTGATATGCCTAA
AACGTGATGCTAACGA
AACGTGATACCACTGT
AACGTGATACATTGGC
AACGTGATCAGATCTG
AACGTGATCATCAAGT
AACGTGATCGCTGATC
AACGTGATACAAGCTA
I have both combinations of barcodes AACGTGAT and AACGTGAT which is AACGTGATATGCCTAA
and ATGCCTAAAACGTGAT
but I only have AACGTGATGCGAGTAA
and Not GCGAGTAAAACGTGAT
@nouroddinc I have removed the sort in the branch "li_dev4". Could you please checkout this branch, "make clean; make" to recompile chromap, and give the new version a try?
@mourisl I have tested the new branch and the output still shows the low number of mapped reads! chromap_with_read_format_new_branch.log
I rechecked your running log. For the "_without_read_format" run, you used the files
1th read 1 file: cellranger_inputs/6bp_D01291_NG02620_S1_L001_R1_001.fastq.gz
1th read 2 file: cellranger_inputs/6bp_D01291_NG02620_S1_L001_R3_001.fastq
1th cell barcode file: cellranger_inputs/6bp_D01291_NG02620_S1_L001_R2_001.fastq
For the files with "_with_read_format", the files are:
1th read 1 file: cellranger_inputs/6bp_D01291_NG02620_S1_L001_R1_001.fastq.gz
1th read 2 file: 6bp_D01291_NG02620_linker2_R2.fastq.gz
1th cell barcode file: 6bp_D01291_NG02620_linker2_R2.fastq.gz
Since some of the files are in cellranger_inputs, and some are not, I just want to make sure you used the right input files.
@mourisl I think you might be right. I tested with a new data set with smaller size to get faster output and sounds like they are very similar but still not identical. Is that normal?
The "Number of barcodes in whitelist" is still different "580224" vs "539675". So I guess the manually extracted barcode and the automatic extracted barcode are still different. Could you please try:
Thank you for helping us debug the issue.
@mourisl is this what you asked for?
1- I have tried reverse barcode like bc:60:67:-,bc:22:29:-,r1:0:-1,r2:117:-1
and it gives Less than 5% barcodes can be found or corrected based on the barcode whitelist
error which makes sense to me as the log file shows chromap_with_read_format_rev.log
2- I have extracted CB:Z tag barcodes from both sam files and they match the whitelist. I mean I can't see any wrong barcode in neither of them.
Thank you for the testing.
alright, I have two outputs.
1) with the normal barcode (bc:60:67,bc:22:29) chromap_with_read_format.log
sam files output in normal barcode is like this:
# samtools view -h aln.sam | sed -n '67,74 p'
lh00134:68:225MHKLT3:7:1281:23172:18977 99 chr1 3009659 60 115M = 3009741 115 TAGAGTGACTGTTAAGAAGTTATGTGAAATTCTAGGAAGTCTGCCTTCATATGTTACTTTGCCTTTTCCCATTTCAGCTTTTAATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:115 CB:Z:ACTATGCACACTTCGA
lh00134:68:225MHKLT3:7:1281:23172:18977 147 chr1 3009741 60 33M = 3009659 -115 AATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:ACTATGCACACTTCGA
lh00134:68:225MHKLT3:7:2242:32307:22630 163 chr1 3045894 60 33M = 3046071 327 TAGTATAACTGTGTAGTGAAAGCTATTTGCTCA FFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:GAATCTGAAAGGTACA
lh00134:68:225MHKLT3:7:2242:32307:22630 83 chr1 3046071 60 150M = 3045894 -327 AAGTGCGCACATAGGATTTGTAGTATCTGCAATTTTGAGCACCTCATGGTATTCATGGAACATGTTCCAATGAACCATATGGATTTGTCATGTCTTAACTGATCAAAAGATTGGCCTTCCATTGTAGGGCCAAGTAACCTTATGCTAAGG FFFFF5FFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:3 MD:Z:5G20A86A36 CB:Z:GAATCTGAAAGGTACA
lh00134:68:225MHKLT3:7:1162:45492:5711 163 chr1 3062726 46 33M = 3062726 106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAG FFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:ACACAGAAAGTACAAG
lh00134:68:225MHKLT3:7:1162:45492:5711 83 chr1 3062726 46 106M = 3062726 -106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAGGCCAAGACATGTTTCTGTTCAAGGCCCCCAAGTTTACTAAGAGTCTGTGCTTATAAAGTGGGGAGGCCCATCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:103C2 CB:Z:ACACAGAAAGTACAAG
lh00134:68:225MHKLT3:7:2149:19696:3243 99 chr1 3094994 60 73M = 3095034 73 GCTCAATGAACTTGGCTTTGGGAACTGCTCCTGTGGTAACTAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:73 CB:Z:AAGGTACAGAGCTGAA
lh00134:68:225MHKLT3:7:2149:19696:3243 147 chr1 3095034 60 33M = 3094994 -73 TAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:AAGGTACAGAGCTGAA
2) the reverse barcode (bc:22:29,bc:60:67) log file:
chromap_with_read_format_rev.log
and reverse barcode sam file is like :
# samtools view -h aln_rev.sam | sed -n '67,74 p'
lh00134:68:225MHKLT3:7:1281:23172:18977 99 chr1 3009659 60 115M = 3009741 115 TAGAGTGACTGTTAAGAAGTTATGTGAAATTCTAGGAAGTCTGCCTTCATATGTTACTTTGCCTTTTCCCATTTCAGCTTTTAATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:115 CB:Z:CACTTCGAACTATGCA
lh00134:68:225MHKLT3:7:1281:23172:18977 147 chr1 3009741 60 33M = 3009659 -115 AATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:CACTTCGAACTATGCA
lh00134:68:225MHKLT3:7:2242:32307:22630 163 chr1 3045894 60 33M = 3046071 327 TAGTATAACTGTGTAGTGAAAGCTATTTGCTCA FFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:AAGGTACAGAATCTGA
lh00134:68:225MHKLT3:7:2242:32307:22630 83 chr1 3046071 60 150M = 3045894 -327 AAGTGCGCACATAGGATTTGTAGTATCTGCAATTTTGAGCACCTCATGGTATTCATGGAACATGTTCCAATGAACCATATGGATTTGTCATGTCTTAACTGATCAAAAGATTGGCCTTCCATTGTAGGGCCAAGTAACCTTATGCTAAGG FFFFF5FFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:3 MD:Z:5G20A86A36 CB:Z:AAGGTACAGAATCTGA
lh00134:68:225MHKLT3:7:1162:45492:5711 163 chr1 3062726 46 33M = 3062726 106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAG FFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:AGTACAAGACACAGAA
lh00134:68:225MHKLT3:7:1162:45492:5711 83 chr1 3062726 46 106M = 3062726 -106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAGGCCAAGACATGTTTCTGTTCAAGGCCCCCAAGTTTACTAAGAGTCTGTGCTTATAAAGTGGGGAGGCCCATCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:103C2 CB:Z:AGTACAAGACACAGAA
lh00134:68:225MHKLT3:7:2149:19696:3243 99 chr1 3094994 60 73M = 3095034 73 GCTCAATGAACTTGGCTTTGGGAACTGCTCCTGTGGTAACTAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:73 CB:Z:GAGCTGAAAAGGTACA
lh00134:68:225MHKLT3:7:2149:19696:3243 147 chr1 3095034 60 33M = 3094994 -73 TAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:GAGCTGAAAAGGTACA
and the read2 file is this:
# zcat read_R2.fastq.gz | grep -A 3 -F -f list_of_read_ids.txt | awk 'BEGIN{RS="--"; ORS=""} {print $0}'
@lh00134:68:225MHKLT3:7:2149:19696:3243 2:N:0:TNAGGC
CACGCATTTGCTTCTCGCATCTGAGCTGAAATCCACGTGCTTGAGAGGCCAGAGCATTCGAAGGTACAGTGGCCGATGTTTCGCATCGGCGTACGACTAGATGTGTATAAGAGACAGTATGCAGATTGGTACACACAGATGTGACTGCTA
+
F5F-FF-FF-FFFF5FFFFFFFFFFFFFFFFFFFFFFFFF-FFFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@lh00134:68:225MHKLT3:7:1281:23172:18977 2:N:0:TNAGGC
CAAGCTTTGGCTTCTCGCATCTCACTTCGAATCCACGTGCTCGATAGGCCAGAGCATTCGACTATGCAGTGGCCGATGTTTCGCATCGGCGTACGACTAGATGTGTATAAGAGACAGACTTATAGAACAACAACATCAAAAAAGAATATT
+
F55FFF5FFFFFFFF55555555555-555555555555555FFFFFFFFFFFFFFF-F-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@lh00134:68:225MHKLT3:7:1162:45492:5711 2:N:0:TNAGGC
CACTCGTTGCCCTCTCGCATCTAGTACAAGATCCACGTGCTTGATAGGCCAGAGCATTCGACACAGAAGTGGCCGATGTTTCGCATCGGCGTACGACTAGATGTGTATAAGAGACAGGTATGGAAGGAAACGGCGAGAGAAATAATGGAG
+
F5-FF-5FFFF-F-FF-5FFF5555555555-555-55-555555555FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF55FFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFF
@lh00134:68:225MHKLT3:7:2242:32307:22630 2:N:0:TNAGGC
CAAGCGTTGGCTTCTCGCATCTAAGGTACAATCCACGTGCTTGAGAGGCCAGAGCATTCGGAATCTGAGTGGCCGATGTTTCGCATCGGCGTACGACTAGATGTGTATAAGAGACAGTAGTATAACTGTGTAGTGAAAGCTATTTGCTCA
+
-FFF-FFFF555-FFFFF5FFFF5555555FFF-FFFFFFFFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFF
hopefully, this is what you were looking for. Thanks
Seems the barcode order issue is indeed fixed. Are the barcodes from readformat run the same as your manually-created barcode file (without readformat) based on the SAM file?
it sounds like NOT; they are in reverse of each other!
chromap --preset atac -x GRCm38_genome.index -r GRCm38_genome.fa -1 read_R1.fastq.gz -2 read_R2.fastq.gz --SAM -o aln.sam -b read_R2.fastq.gz --read-format bc:60:67,bc:22:29,r1:0:-1,r2:117:-1 -t 10 --barcode-whitelist bc50.txt 2> chromap_with_read_format.log
the out put is
# samtools view -h aln.sam | sed -n '67,74 p'
lh00134:68:225MHKLT3:7:1281:23172:18977 99 chr1 3009659 60 115M = 3009741 115 TAGAGTGACTGTTAAGAAGTTATGTGAAATTCTAGGAAGTCTGCCTTCATATGTTACTTTGCCTTTTCCCATTTCAGCTTTTAATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:115 CB:Z:ACTATGCACACTTCGA
lh00134:68:225MHKLT3:7:1281:23172:18977 147 chr1 3009741 60 33M = 3009659 -115 AATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:ACTATGCACACTTCGA
lh00134:68:225MHKLT3:7:2242:32307:22630 163 chr1 3045894 60 33M = 3046071 327 TAGTATAACTGTGTAGTGAAAGCTATTTGCTCA FFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:GAATCTGAAAGGTACA
lh00134:68:225MHKLT3:7:2242:32307:22630 83 chr1 3046071 60 150M = 3045894 -327 AAGTGCGCACATAGGATTTGTAGTATCTGCAATTTTGAGCACCTCATGGTATTCATGGAACATGTTCCAATGAACCATATGGATTTGTCATGTCTTAACTGATCAAAAGATTGGCCTTCCATTGTAGGGCCAAGTAACCTTATGCTAAGG FFFFF5FFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:3 MD:Z:5G20A86A36 CB:Z:GAATCTGAAAGGTACA
lh00134:68:225MHKLT3:7:1162:45492:5711 163 chr1 3062726 46 33M = 3062726 106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAG FFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:ACACAGAAAGTACAAG
lh00134:68:225MHKLT3:7:1162:45492:5711 83 chr1 3062726 46 106M = 3062726 -106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAGGCCAAGACATGTTTCTGTTCAAGGCCCCCAAGTTTACTAAGAGTCTGTGCTTATAAAGTGGGGAGGCCCATCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:103C2 CB:Z:ACACAGAAAGTACAAG
lh00134:68:225MHKLT3:7:2149:19696:3243 99 chr1 3094994 60 73M = 3095034 73 GCTCAATGAACTTGGCTTTGGGAACTGCTCCTGTGGTAACTAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:73 CB:Z:AAGGTACAGAGCTGAA
lh00134:68:225MHKLT3:7:2149:19696:3243 147 chr1 3095034 60 33M = 3094994 -73 TAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:AAGGTACAGAGCTGAA
and without --read-format
chromap --preset atac -x GRCm38_genome.index -r GRCm38_genome.fa -1 read_R1.fastq.gz -2 read_R2.fq --SAM -o aln_wo.sam -b read_R3.fq -t 10 --barcode-whitelist bc50.txt 2> chromap_without_read_format.log
# samtools view -h aln_wo.sam | sed -n '67,74 p'
lh00134:68:225MHKLT3:7:1281:23172:18977 99 chr1 3009659 60 115M = 3009741 115 TAGAGTGACTGTTAAGAAGTTATGTGAAATTCTAGGAAGTCTGCCTTCATATGTTACTTTGCCTTTTCCCATTTCAGCTTTTAATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:115 CB:Z:CACTTCGAACTATGCA
lh00134:68:225MHKLT3:7:1281:23172:18977 147 chr1 3009741 60 33M = 3009659 -115 AATATTCTTTTTTGATGTTGTTGTTCTATAAGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:CACTTCGAACTATGCA
lh00134:68:225MHKLT3:7:2242:32307:22630 163 chr1 3045894 60 33M = 3046071 327 TAGTATAACTGTGTAGTGAAAGCTATTTGCTCA FFFFFFFFFFFFFFFFFF5FFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:AAGGTACAGAATCTGA
lh00134:68:225MHKLT3:7:2242:32307:22630 83 chr1 3046071 60 150M = 3045894 -327 AAGTGCGCACATAGGATTTGTAGTATCTGCAATTTTGAGCACCTCATGGTATTCATGGAACATGTTCCAATGAACCATATGGATTTGTCATGTCTTAACTGATCAAAAGATTGGCCTTCCATTGTAGGGCCAAGTAACCTTATGCTAAGG FFFFF5FFFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:3 MD:Z:5G20A86A36 CB:Z:AAGGTACAGAATCTGA
lh00134:68:225MHKLT3:7:1162:45492:5711 163 chr1 3062726 46 33M = 3062726 106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAG FFFFFFFFFFFFFFFFFFFF5FFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:AGTACAAGACACAGAA
lh00134:68:225MHKLT3:7:1162:45492:5711 83 chr1 3062726 46 106M = 3062726 -106 GTATGGAAGGAAACGGCGAGAGAAATAATGGAGGCCAAGACATGTTTCTGTTCAAGGCCCCCAAGTTTACTAAGAGTCTGTGCTTATAAAGTGGGGAGGCCCATCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:103C2 CB:Z:AGTACAAGACACAGAA
lh00134:68:225MHKLT3:7:2149:19696:3243 99 chr1 3094994 60 73M = 3095034 73 GCTCAATGAACTTGGCTTTGGGAACTGCTCCTGTGGTAACTAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:73 CB:Z:GAGCTGAAAAGGTACA
lh00134:68:225MHKLT3:7:2149:19696:3243 147 chr1 3095034 60 33M = 3094994 -73 TAGCAGTCACATCTGTGTGTACCAATCTGCATA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:33 CB:Z:GAGCTGAAAAGGTACA
this means that reverse barcode that i already tested is aligned with the one with out --read-format
Could you please be more specific about whether the "without --read-format" run had mistakes due to the input order being wrong, or the "with --read-format" run generated wrong barcodes? Thank you.
@mourisl I think it works now pretty well with my manual barcode extraction pipeline. If I use --read-format bc:22:29,bc:60:67,r1:0:-1,r2:117:-1
which we called reverse barcode in the above that matches exactly with the manual barcode extraction one. You are good to close this. Thank you so much for your help!
Hi
chromap
team, Thank you so much for preparing this helpful tools and I'm very impressed by its performances. I am testing it on my scATAC-seq data and it works pretty good when I don't use--read-format
.I have a scATAC-seq with R1 and R2 to be mapped and cell barcodes in R2 ( at positions bc1:22:29 and bc2:60-67)
Looking at my R2 file:
using the followoing command:
gives me very low number of the mapped reads as the chromap_with_read_format.log shows.
but if I extract barcodes from read_R2.fq.gz into read_R3.fq and leave the genome in read_R2.fq as a new file like the below:
and
and then run like this:
it works very good as log file chromap_without_read_format.log shows. my
chromap
version isand this is my barcode whitelist file:
I would appreciate if you could help me to find what i am doing wrong because using --read-format argument can make it much faster and easier to run my pipeline.
many Thanks, Noori