open2c / pairtools

Extract 3D contacts (.pairs) from sequencing alignments
MIT License

Wording in pairtools parse2 for --max-insert-size #232

Open bskubi opened 6 months ago

For the --max-insert-size parameter of pairtools parse2, the documentation says:

When searching for overlapping ends of left and right read (R1 and R2), this sets the minimal distance when two alignments on the same strand and chromosome are considered part of the same fragment (and thus reported as the same alignment and not a pair).

I'm not sure I understand this correctly. It sounds like this deals with the situation where a read pair has two alignments, one on each read of the pair, that map close to each other. The filter decides whether to assume these nearby alignments originate from a single molecule of DNA (which may have been ligated into a chimeric molecule) or whether to assume the two alignments represent a chimeric junction.
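To make that reading concrete, here is a minimal sketch of the decision as I understand it. It is not taken from the pairtools source: the function name, its arguments, and the use of alignment start positions are all hypothetical, and it assumes the threshold acts as an upper bound on the distance.

```python
def same_fragment(chrom1, pos1, strand1, chrom2, pos2, strand2, max_insert_size):
    """Hypothetical illustration, NOT pairtools code.

    Returns True if two alignments (one from R1, one from R2) would be
    treated as parts of the same fragment, assuming max_insert_size is an
    UPPER bound on the distance between them.
    """
    # Per the documentation wording, both alignments must be on the same
    # chromosome and the same strand to be candidates for merging.
    if chrom1 != chrom2 or strand1 != strand2:
        return False
    # Assumption: distance is measured between alignment positions.
    return abs(pos2 - pos1) <= max_insert_size
```

Under this reading, two alignments 300 bp apart with a threshold of 500 would be reported as one alignment, while two alignments 10,000 bp apart would be reported as a pair. The numbers are made up for illustration.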

The documentation says this sets the 'minimal' distance, but the option is named --max-insert-size. I would expect an upper bound, not a lower bound, on the distance between two alignments for them to be considered part of the same fragment. So I'm wondering if 'minimal' here is a typo? Sorry if this is a misunderstanding on my part! Thanks for any insight you can provide.