Hi,
Your tool looks interesting and I look forward to trying it out. I'm just wondering whether intuitively it makes sense to use any normalisation prior to running ConQuR.
Our data are counts for different viral taxa obtained via capture-based sequencing. The (many) samples have been sequenced in several batches over several years. Given the nature of these counts, I also expect viral genome size to affect them: a larger virus might show a higher total count simply because its larger genome yields a greater number of Illumina-sequenced fragments.
Accordingly, I was considering applying transcripts-per-million (TPM) normalisation to the counts, to account for differences in sequencing depth across samples while also adjusting for virus genome size. I was then thinking of feeding these TPM-normalised counts into ConQuR to remove any confounding batch effects. Finally, I would use the ConQuR-corrected counts in our downstream conditional logistic regression, where we test for associations between viruses and our outcome of interest.
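For concreteness, here is roughly the TPM calculation I have in mind (a minimal Python sketch; the taxon counts and genome lengths below are made-up illustrative numbers, not our data):

```python
import numpy as np

def tpm(counts, genome_lengths_kb):
    """Length-normalise raw counts, then scale each sample to sum to 1e6.

    counts: (n_taxa, n_samples) array of raw fragment counts
    genome_lengths_kb: (n_taxa,) array of viral genome lengths in kilobases
    """
    # reads per kilobase: divide each taxon's counts by its genome length
    rate = counts / genome_lengths_kb[:, None]
    # rescale so every sample (column) sums to one million
    return rate / rate.sum(axis=0, keepdims=True) * 1e6

# toy example: 3 viral taxa x 2 samples
counts = np.array([[100.0, 200.0],
                   [ 50.0,  50.0],
                   [ 10.0,  25.0]])
lengths_kb = np.array([150.0, 30.0, 5.0])  # hypothetical genome sizes

normalised = tpm(counts, lengths_kb)
print(normalised)  # each column sums to 1e6
```

So within each sample, a taxon's value reflects its count relative to its genome size, and totals are comparable across samples regardless of sequencing depth.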
Does this make sense, and does the approach seem sound to you? Or does it make more sense to skip the TPM normalisation and just feed the raw counts into ConQuR? If so, should virus genome size and sequencing depth somehow be included in the arguments provided to ConQuR?
Thanks!
Charles