walaj / SeqLib

C++ htslib/bwa-mem/fermi interface for interrogating sequence data
http://bioinformatics.oxfordjournals.org/content/early/2016/12/21/bioinformatics.btw741.full.pdf+html

Try to infer b37 style contig names when constructing GenomicRegion from strings #50

Closed julianhess closed 5 years ago

julianhess commented 5 years ago

When creating a GenomicRegion from contig/start/end strings, check whether the primary contig names can be easily mapped from b37 style to hgXX style, i.e. [0-9XY]+ -> chr[0-9XY]+

Note that this only works with C++11 or later, since it depends on the STL regex functionality introduced in C++11. I've therefore wrapped the new code in the appropriate language-version guards.
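
For illustration, here is a minimal sketch of the mapping described above. It is not the code from this pull request: `InferChrName` is a hypothetical helper name, and the `__cplusplus` check is only one plausible form the guard could take. A contig that consists entirely of characters from `[0-9XY]` gets a `chr` prefix; anything else is returned unchanged.

```cpp
#include <string>

// Only pull in <regex> when the compiler reports C++11 or later.
// (Assumption: the guard is based on __cplusplus; some compilers need an
// extra flag before they report an accurate value for this macro.)
#if __cplusplus >= 201103L
#include <regex>
#endif

// Hypothetical helper, not part of SeqLib's API.
// Maps b37-style primary contig names ("1", "X") to hgXX style ("chr1", "chrX").
inline std::string InferChrName(const std::string& contig) {
#if __cplusplus >= 201103L
  // regex_match requires the whole string to match, so "chr1" or
  // "GL000191.1" are left alone. Note the regex is rebuilt on every call.
  const std::regex b37_primary("[0-9XY]+");
  if (std::regex_match(contig, b37_primary)) {
    return "chr" + contig;
  }
#endif
  // Pre-C++11 builds, or names that do not match, pass through unchanged.
  return contig;
}
```

Because the pattern is taken literally from the description, it also accepts strings such as `XY` or `99`; a stricter pattern would be needed to reject those.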

julianhess commented 5 years ago

Oops, this was way too slow to process headers of any substantial length, since the regex was recompiled every time a GenomicRegion was instantiated.

Please see https://github.com/walaj/SeqLib/pull/51 instead
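
The slowdown described above comes from constructing the `std::regex` object on every call. One common way to avoid that, sketched here under the assumption of a C++11 compiler, is to build the regex once as a function-local static and reuse it. This is not necessarily the approach taken in the follow-up pull request; `B37PrimaryRegex` and `InferChrNameCached` are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical accessor: the regex is compiled the first time this function
// is called and reused afterwards. C++11 guarantees that initialization of a
// function-local static is thread-safe.
inline const std::regex& B37PrimaryRegex() {
  static const std::regex re("[0-9XY]+");
  return re;
}

// Hypothetical helper, not part of SeqLib's API.
// Same mapping as before, but without recompiling the pattern per call.
inline std::string InferChrNameCached(const std::string& contig) {
  if (std::regex_match(contig, B37PrimaryRegex())) {
    return "chr" + contig;
  }
  return contig;
}
```

With this layout the cost of compiling the pattern is paid once per process rather than once per region, which matters when a header lists many contigs.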