zjshi / gt-pro

MIT License

support paired-end reads? #55

Closed: nick-youngblut closed this issue 1 year ago

nick-youngblut commented 1 year ago

I do not see anything in the docs (the GT_Pro optimize help text, the repo README, or Shi et al., 2023) about handling paired-end reads.

How should one handle paired-end reads with GT_Pro optimize?

@bsmith89 did you use paired-end reads for genotyping with GT-Pro in order to generate the input for StrainFacts? I don't see anything in the StrainFacts README about handling paired-end reads. I also can't find any information in Smith et al., 2022 about paired-end reads (e.g., whether paired-end reads were used in the study).

nick-youngblut commented 1 year ago

I now see where reverse reads are mentioned in the Maast + GT-Pro protocol:

In practice, users would genotype forward and reverse reads, if both are available. This can be done by supplying both forward and reverse reads as either two individual input files or a single concatenated file. The command does not need to be modified otherwise.
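The protocol excerpt above allows either two input files or one concatenated file. For the concatenated option, here is a minimal Python sketch that merges gzipped R1 and R2 FASTQ files without recompressing them. The helper names (concat_fastq_gz, count_fastq_records) and file names (sample_R1.fastq.gz, etc.) are hypothetical, and the sketch assumes every input is gzip-compressed; appending gzip files byte-for-byte yields a valid multi-member gzip stream.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concat_fastq_gz(inputs, output):
    """Concatenate gzipped FASTQ files (e.g. R1 then R2) into one file.

    Appending gzip files byte-for-byte produces a valid multi-member gzip
    stream, so nothing is decompressed or recompressed. Assumes all inputs
    are gzip-compressed; do not mix plain and gzipped FASTQ.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file (4 lines per record)."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for _ in fh) // 4


if __name__ == "__main__":
    # Toy paired files standing in for real (hypothetical) sample data.
    tmp = Path(tempfile.mkdtemp())
    r1, r2 = tmp / "sample_R1.fastq.gz", tmp / "sample_R2.fastq.gz"
    with gzip.open(r1, "wt") as fh:
        fh.write("@read1/1\nACGTACGT\n+\nIIIIIIII\n")
    with gzip.open(r2, "wt") as fh:
        fh.write("@read1/2\nTTGCAACG\n+\nIIIIIIII\n")
    combined = tmp / "sample_combined.fastq.gz"
    concat_fastq_gz([r1, r2], combined)
    print(count_fastq_records(combined))  # 2
```

On the command line, `cat sample_R1.fastq.gz sample_R2.fastq.gz > sample_combined.fastq.gz` does the same thing. Passing the two files separately, which the protocol also allows, avoids making the extra copy.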

bsmith89 commented 1 year ago

Yeah, just to confirm the details: since GT-Pro uses direct k-mer matching, the pairing of reads doesn't carry any information beyond the deeper sequencing it provides. There is some subtlety around the statistical non-independence of the two reads in a pair, and GT-Pro ignores this. But that effect is probably relatively small, and I think it's pretty reasonable to treat paired reads as "just more sequence" to count k-mers in.
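To make the "just more sequence" point concrete, here is a toy Python sketch (not GT-Pro's code) that counts canonical k-mers while ignoring read pairing. The names canonical and count_kmers, the choice of k=5, the use of canonical k-mers, and the toy reads are all illustrative assumptions; GT-Pro itself matches reads against a prebuilt database of SNP-covering k-mers rather than counting every k-mer. The sketch shows that pooling R1 and R2 gives exactly the sum of the per-file counts, so the pairing contributes nothing to the tally.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, ignoring any pairing between them."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    r1 = ["ACGTACGTTG", "GGCATTACGA"]  # toy forward mates
    r2 = ["CAACGTACGT", "TCGTAATGCC"]  # toy reverse mates of the same fragments
    pooled = count_kmers(r1 + r2, k=5)
    separate = count_kmers(r1, k=5) + count_kmers(r2, k=5)
    print(pooled == separate)  # True
```

Because these toy mates are exact reverse complements of each other (a fully overlapping fragment), every canonical k-mer from one molecule is counted twice. That double-counting is one way the non-independence of a pair can show up, and simply pooling the reads does not correct for it.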