Open dawe opened 3 years ago
If this option will be ever be implemented, it may be useful to allow output in .mtx format when analysing single cell data
We actually have this implemented but not made available to users for now. We may include this feature in the future release. For now you might be able to get this by running bedtools on Chromap bed output and interval bed files.
Just checking… can you disclose which version of Chromap will enable this feature? Alternatively, can you hint how to expose it myself and give it a try?
Hi,
I checked the code and found out that currently Chromap can only generate cell-by-bin matrix with fixed bin sizes instead of taking a BED file that contains the intervals to generate the matrix. It would require some time and efforts to implement and test that feature. As I have limited bandwidth at the moment, perhaps we will not include this feature in a recent release. So far you may try to see if there are other scripts or command lines that can help you generate the matrix using the output fragment file of Chromap.
My BED files are mostly fixed size bins, so I guess it will do the job! Thanks, I’ll spend some time in the following days to test it properly
Hello, I tried to expose this feature by uncommenting lines https://github.com/haowenz/chromap/blob/567fc3ed407dc577518b6ec779222d778f9eaf35/src/chromap.cc#L5941 to https://github.com/haowenz/chromap/blob/567fc3ed407dc577518b6ec779222d778f9eaf35/src/chromap.cc#L5950 but my output doesn’t seem to be binned, I have fragments resembling a ATAC output. I will still try to play with options, just wanted to be sure the expected output will be in the form of
chrom 0 5000 cell count
chrom 5000 10000 cell count
…
Followup, I have a scATAC-seq with R1 and R3 to be mapped and cell barcodes in R2 (starting at position 9). The output of this
chromap -x GRCm38.index -r GRCm38.fa.gz --low-mem -t 4 -1 READ1.fq.gz -2 READ3.fq.gz --barcode READ2.fq.gz --barcode-whitelist whitelist.tsv --read-format bc:9:-1,r1:0:-1,r2:0:-1 -o output_peaks
and
chromap -x GRCm38.index -r GRCm38.fa.gz --low-mem -t 4 -1 READ1.fq.gz -2 READ3.fq.gz --barcode READ2.fq.gz --barcode-whitelist whitelist.tsv --read-format bc:9:-1,r1:0:-1,r2:0:-1 -o output_matrix --cell-by-bin --bin-size 5000
is the same:
$ md5sum output_atac output_matrix
a4265e26970a8d3b249ac82ccf7e5ce9 output_atac
a4265e26970a8d3b249ac82ccf7e5ce9 output_matrix
as if the --cell-by-bin
--bin-size 5000
flags are not taken into account
Thanks for trying. You also have to uncomment these two lines: https://github.com/haowenz/chromap/blob/567fc3ed407dc577518b6ec779222d778f9eaf35/src/chromap.cc#L5968-L5969
And set this parameter to a name prefix you desire, there will be several files generated with that name prefix. But overall this feature is not tested after we made changes to lots of code. So I am no sure if it will still work.
Thanks, I missed those lines. Anyhow, you were right, it doesn't work in the sense that
chromap -x GRCm38.index -r GRCm38.fa.gz \\
--low-mem -t 4 -1 READ1.fq.gz -2 READ3.fq.gz --barcode READ2.fq.gz \\
--barcode-whitelist whitelist.tsv --read-format bc:9:-1,r1:0:-1,r2:0:-1 \\
-o FOUT --cell-by-bin --bin-size 5000 --matrix-output-prefix FMAT
only returns FOUT
but no FMAT*
Hi all, I'm testing
chromap
and I'd say I'm impressed by the performances. In addition to the current options, would you consider the possibility to output counts over given intervals? Catalogs of epigenetic features are available and it would be interesting to quantify known annotations.