jackh726 / bigtools

A high-performance BigWig and BigBed library in Rust
MIT License

Select position range in BigWig #61

Open multimeric opened 2 days ago

multimeric commented 2 days ago

I'm hoping to "cut down" a BigWig by only selecting a number of chromosomes. It would be nice if there were a bigtools bigSelect --chroms chr1,chr2 that did so, and/or bigtools bigSelect --bed regions.bed for more advanced use cases.

Of course this can be done using the Rust/Python API, but I'm after a more user-friendly solution I can suggest to others who want to do this.

jackh726 commented 1 day ago

Yeah, this certainly would be helpful! There are the chrom/start/end options available in bigwigtobedgraph, but that isn't quite as powerful as what you propose. There are also bigtools intersect and bigtools chromintersect, which are mostly undocumented and not in good shape, from when I previously did a little work in this area. (But they are far from what you want!)

It's pretty trivial to have a tool that does a read -> filter -> write, but it would be better to efficiently copy over entire blocks of the file and just reindex as needed. For chromosomes this is really easy, but for specific regions it's a bit more difficult, since you have to decide whether to just copy over unfiltered blocks, at the expense of some blocks being smaller, or to maintain the property that almost all blocks are full.
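
As a rough illustration of the simple read -> filter -> write approach (not the block-copying one), here is a minimal sketch that uses only the Rust standard library. It assumes the data has already been dumped to bedGraph text (chrom, start, end, value per line). The function name filter_bedgraph_by_chrom is hypothetical and is not part of bigtools.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead, Write};

/// Hypothetical helper (not a bigtools API): copy bedGraph lines from
/// `input` to `output`, keeping only lines whose first field (the
/// chromosome name) is in `keep`. Returns the number of lines kept.
pub fn filter_bedgraph_by_chrom<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    keep: &HashSet<&str>,
) -> io::Result<usize> {
    let mut kept = 0;
    for line in input.lines() {
        let line = line?;
        // The chromosome name is the first whitespace-separated field.
        let chrom = match line.split_whitespace().next() {
            Some(c) => c,
            None => continue, // skip blank lines
        };
        // Lines whose first field is not a selected chromosome are dropped;
        // this includes any "track" or comment lines.
        if keep.contains(chrom) {
            writeln!(output, "{}", line)?;
            kept += 1;
        }
    }
    Ok(kept)
}
```

This streams line by line, so memory use stays flat, but every value is decoded to text and would have to be re-encoded into a new BigWig afterwards, which is exactly the cost that copying whole blocks would avoid.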

I'm not sure that I'll get to this very quickly, but when I find some time, I'd be happy to take a stab at this.

multimeric commented 1 day ago

If you can specifically optimise selecting chromosomes, then I think it's worth making a separate subcommand for that. The BED file selection is more complex and not actually as important for my use case.