Closed ffranr closed 6 years ago
To clarify, for point 1 above, gramtools already allows reading BAM/CRAM as htslib transparently allows it
Question: Do you know who knows how to use the htslib API to get the reads from a bam which are in a certain range?
Answer (James Bonfield):
The best starting point is probably htslib's test/test_view.c, which has this code in it (paraphrased):
bam1_t *b = bam_init1(); hts_idx_t *idx; if ((idx = sam_index_load(in, argv[optind])) == 0) { fprintf(stderr, "[E::%s] fail to load the BAM index\n", __func__); return 1; } hts_itr_t *iter; if ((iter = sam_itr_querys(idx, h, argv[i])) == 0) { fprintf(stderr, "[E::%s] fail to parse region '%s'\n", __func__, argv[i]); continue; } while ((r = sam_itr_next(in, iter, b)) >= 0) { if (!benchmark && sam_write1(out, h, b) < 0) { fprintf(stderr, "Error writing output.\n"); exit_code = 1; break; } if (nreads && --nreads == 0) break; } hts_itr_destroy(iter); if (r < -1) { fprintf(stderr, "Error parsing input.\n"); exit_code = 1; } bam_destroy1(b); bam_hdr_destroy(h);
So basically create an iterator, call it repeatedly until it returns -1 (EOF) or lower (error), and then destroy the iterator at the end.
The code in it's current state calls this function to input a particular read from a SAM/BAM file: https://github.com/iqbal-lab-org/gramtools/blob/9dd497a01e8efd6422dfc851670f63502817358a/libgramtools/include/sequence_read/seq_file.h#L199
The above function uses functions within HTSlib to extract sequence, quality values, etc. However, the higher level function sam_itr_next
is not used. I think the code was written in this way so that the read can be parsed in reverse.
Basically, there's a lot of custom code and the higher level HTSlib functions are not used. This issue doesn't seem to be as trivial as passing in additional variables.
Don't worry, I can work around it. Just would have been handy if it was an easy fix.
@iqbal-lab @martinghunt I don't think that addressing this issue is part of the current plan. Closing the issue.
Ideally these options to quasimap:
Note for 2: include reads that intersect the interval [start,end]. eg if read is mapped from 50-101 and we want to pull out reads from 100-200, then include that read.