Closed y9c closed 1 year ago
Thank you!!
Hi @sstadick,
I have a quick question about this command.
// Create a par_granges runner
let par_granges_runner = par_granges::ParGranges::new(
PathBuf::from("test/sample1.bam"), // pass in bam
Some(PathBuf::from("test/reference.fa")), // optional ref fasta
Some(PathBuf::from("test/test.bed")), // bedfile to narrow regions
None, // optional bcf/vcf file specifying positions of interest
true, // merge any overlapping intervals in the BED file
None, // optional allowed number of threads, defaults to max
None, // optional chunksize modification
None, // optional channel size modification
basic_processor,
);
I add reference.fa
into the ParGranges object, but in the output file ref_base
is still None
.
PileupPosition { ref_seq: "XII", pos: 729187, ref_base: None, depth: 276, a: 268, c: 0, g: 0, t: 0, n: 0, ins: 0, del: 8, ref_skip: 0, fail: 0, near_max_depth: false }
How can I fix this?
Thank you.
Is your reference file indexed? And do the chromosome names between the fasta file, bed file, and BAM header match? They may all have to be sorted the same as well - it’s been a while since I dug into that code.
Sent from my iPhone
On Sep 29, 2022, at 5:54 PM, Chang Y @.***> wrote:
Hi @sstadick,
I have a quick question about this command.
// Create a par_granges runner let par_granges_runner = par_granges::ParGranges::new( PathBuf::from("test/sample1.bam"), // pass in bam Some(PathBuf::from("test/reference.fa")), // optional ref fasta Some(PathBuf::from("test/test.bed")), // bedfile to narrow regions None, // optional bcf/vcf file specifying positions of interest true, // merge any overlapping intervals in the BED file None, // optional allowed number of threads, defaults to max None, // optional chunksize modification None, // optional channel size modification basic_processor, );
I add reference.fa into the ParGranges object, but in the output file ref_base is still None.
PileupPosition { ref_seq: "XII", pos: 729187, ref_base: None, depth: 276, a: 268, c: 0, g: 0, t: 0, n: 0, ins: 0, del: 8, ref_skip: 0, fail: 0, near_max_depth: false }
How can I fix this?
Thank you.
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.
Actually- I think the example is just a bit incomplete in covering how to set up a processor to do what you want. Take a look at the BaseProcessor in sec/commands/base_depth.rs (apologies for no link, I'm on my phone).
Like I said though it's been a bit since I've really gotten into this code so I might be off base. I'll take a look tomorrow when I have a computer.
@sstadick , Thank you for you help.
Yes. Both bam and fasta are indexed and sorted in the same order. I tried to change the Some(PathBuf::from("test/reference.fa"))
into Some(PathBuf::from("something_not_exist"))
, and the code and execute without error. I think par_granges object might not pares fasta input when the data is in bam format.
Hi, @sstadick . I make reference base output work by passing fasta input in the reads buffer.
However the speed of this pileup function is quite slow. I saw you pushed an pileup function into noodles library. https://github.com/zaeleus/noodles/pull/14 . Will it be better in speed? Or have you found better solution to improve this? Thanks.
@y9c, sorry for the slow reply, I'm glad you got something working! If you are able to share your implementation I would be happy to take a look for low-hanging efficiency fixes.
The short answer is that the PR I have open isn't finished, and may not be faster because it would necessitate a complete migration to Noodles instead of htslib. I haven't revisited the idea in a while but in theory having a custom pileup engine should remove a few copies / redundancies compared to using htslib itself. I don't think it will be a massive speedup though.
All the other methods for depth / coverage calculation in perbase
are fast specifically because they use tricks to avoid looking at the actual bases. As soon as you have to look at each base it's going to be orders of magnitude slower. If perbase
, or the lib, is slower than samtools
or any other library I would consider that a bug.
bed and vcf order is not correct