migoox / genome-downsampler


Add flag to filter out non single amplicon pairs + bam api refactor #71

Closed mytkom closed 5 months ago

mytkom commented 6 months ago

Research phase:

  1. htslib already has a function that parses a single line of a .bed file: https://github.com/samtools/htslib/blob/30c9c50a874059e3dae7ff8c0ad9e8a9258031c8/htslib/regidx.h#L121
  2. I also need to handle the pairs.tsv file, which specifies which .bed boundaries should be taken into account.
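
For reference, a minimal sketch of what parsing one .bed line involves, written in Python only for brevity. It is not the htslib function linked above; `BedRecord` and `parse_bed_line` are hypothetical names, and it assumes tab-separated fields where only the first three columns (chrom, start, end) are required and the fourth, if present, is the primer name.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int           # 0-based, inclusive
    end: int             # 0-based, exclusive
    name: Optional[str]  # 4th column if present (assumed to be the primer name)


def parse_bed_line(line: str) -> Optional[BedRecord]:
    """Parse one BED line; return None for blank, comment, or header lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "track ", "browser ")):
        return None
    fields = stripped.split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
    name = fields[3] if len(fields) > 3 else None
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name)
```

Columns after the fourth (e.g. pool or strand) are ignored here; whether they are needed depends on the primer scheme.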

CLI interface:

  1. --bed <filepath>: if specified, pairs of sequences are filtered out or scored according to whether they fall within a single amplicon defined in the file.
  2. --tsv <filepath>: if specified, the primer pairs listed in this file are used. If not specified, primers are paired in the order they appear in the .bed file (e.g. the first primer with the second, the third with the fourth, etc.).
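
A rough sketch of the pairing rule above, in Python for brevity. The names `build_amplicons`, `Primer` and `Amplicon` are hypothetical. It assumes that the .tsv pairs are given as two primer names per row and that an amplicon spans from the leftmost primer start to the rightmost primer end (primers included); both are assumptions, not something decided in this issue.

```python
from typing import Iterable, List, Optional, Tuple

Primer = Tuple[str, int, int]  # (name, start, end), 0-based half-open
Amplicon = Tuple[int, int]     # (start, end), 0-based half-open


def build_amplicons(
    primers: List[Primer],
    pairs: Optional[Iterable[Tuple[str, str]]] = None,
) -> List[Amplicon]:
    """Turn primers into amplicon intervals.

    With `pairs` (as read from --tsv), primers are matched by name.
    Without it, primers are paired by order: 1st with 2nd, 3rd with 4th, ...
    """
    if pairs is None:
        if len(primers) % 2 != 0:
            raise ValueError("odd number of primers; cannot pair by order")
        paired = [(primers[i], primers[i + 1]) for i in range(0, len(primers), 2)]
    else:
        by_name = {p[0]: p for p in primers}
        paired = [(by_name[a], by_name[b]) for a, b in pairs]
    # Amplicon = span covering both primers of the pair (assumption).
    return [(min(l[1], r[1]), max(l[2], r[2])) for l, r in paired]
```

For example, with four primers and no .tsv this yields two amplicons; passing `pairs=[("2_LEFT", "2_RIGHT")]` restricts the result to that one pair.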

Changes in bam-api:

  1. I need to add a flag to read_bam that each algorithm will have to specify. The flag determines whether the algorithm should grade pairs of sequences by single-amplicon inclusion or instead filter out pairs that do not fall within a single amplicon.
  2. I need to implement the filtering or grading, depending on the flag's value.
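
To illustrate the two behaviours, here is a small Python sketch. It is not the bam-api code: `AmpliconBehaviour`, `in_single_amplicon` and `apply_amplicon_behaviour` are hypothetical names, each pair is reduced to its overall (start, end) span, and the grade is assumed to be a plain 1.0/0.0 score.

```python
from enum import Enum
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open


class AmpliconBehaviour(Enum):
    FILTER = "filter"  # drop pairs not contained in a single amplicon
    GRADE = "grade"    # keep every pair, but score it by inclusion


def in_single_amplicon(pair_span: Interval, amplicons: List[Interval]) -> bool:
    """True if the whole pair span lies inside at least one amplicon."""
    start, end = pair_span
    return any(a_start <= start and end <= a_end for a_start, a_end in amplicons)


def apply_amplicon_behaviour(
    pair_spans: List[Interval],
    amplicons: List[Interval],
    behaviour: AmpliconBehaviour,
) -> List[Tuple[Interval, float]]:
    """Return (span, score) tuples according to the chosen behaviour."""
    result = []
    for span in pair_spans:
        inside = in_single_amplicon(span, amplicons)
        if behaviour is AmpliconBehaviour.FILTER:
            if inside:
                result.append((span, 1.0))
        else:
            result.append((span, 1.0 if inside else 0.0))
    return result
```

With a flag like this, an algorithm that cannot use scores would ask for FILTER, while one that weighs reads could ask for GRADE and decide itself what to do with the zero-scored pairs.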