quadbio / Pando

Multiome GRN inference.
https://quadbio.github.io/Pando/
MIT License

Upstream/Downstream respecting gene strand? #4

Closed (dpcook closed 2 years ago)

dpcook commented 2 years ago

Hi Jonas,

Thanks again for all your work on this. I could be wrong about this--if so, just let me know.

I was dealing with some issues with the notion of "upstream" and "downstream" in Signac reflecting chromosome coordinate rather than relative position to the gene. I looked a bit into Pando's code handling the upstream and downstream parameters for infer_grn() and if I'm looking at the appropriate code, it uses Signac::Extend() on the TSS range. If the target gene is on the negative strand, this would extend into the gene body rather than into putative regulatory regions.

I could be missing something, but figured it would be good to at least bring it up. Thanks!

joschif commented 2 years ago

Hi @dpcook, I think Signac::Extend() should take the strand of the gene into account. So if the gene is on the negative strand, 'upstream' should still be with respect to the TSS and away from the gene body. However, there are often cases where the strandedness is not annotated, in which case Extend() will default to the + strand. Therefore, I would actually recommend just extending in both directions to keep it consistent (or making sure all gene ranges are fully annotated).
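To make the strand logic concrete, here is a minimal Python sketch of the behavior described above. It is not Signac's code: the `Interval` class and `extend` function are hypothetical names made up for illustration, and the sketch assumes only what this thread states, namely that "upstream" is interpreted relative to the gene's strand and that unannotated ("*") ranges are treated like "+".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a single genomic range."""
    start: int
    end: int
    strand: str  # "+", "-", or "*" (strand not annotated)


def extend(iv, upstream=0, downstream=0):
    """Strand-aware extension (illustrative only, not Signac's implementation)."""
    # Assumption taken from the thread: "*" ranges fall back to "+".
    on_plus = iv.strand in ("+", "*")
    # On "+", upstream lies at lower coordinates; on "-", at higher coordinates.
    left = upstream if on_plus else downstream
    right = downstream if on_plus else upstream
    return Interval(iv.start - left, iv.end + right, iv.strand)


tss_plus = Interval(1000, 1000, "+")
tss_minus = Interval(1000, 1000, "-")
tss_unknown = Interval(1000, 1000, "*")

# Asymmetric extension: the result depends on the strand annotation.
print(extend(tss_plus, upstream=500, downstream=100))     # start=500, end=1100
print(extend(tss_minus, upstream=500, downstream=100))    # start=900, end=1500
print(extend(tss_unknown, upstream=500, downstream=100))  # start=500, end=1100

# Symmetric extension: the same window regardless of strand annotation.
print(extend(tss_minus, upstream=500, downstream=500))    # start=500, end=1500
print(extend(tss_unknown, upstream=500, downstream=500))  # start=500, end=1500
```

With equal upstream and downstream values the window no longer depends on the strand, which is why extending in both directions stays consistent even when some genes lack strand annotation.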

dpcook commented 2 years ago

Oops, you're right. I thought I had tested it and concluded that it ignored strand info, but I clearly didn't look closely enough.

Confirmation: image

Sorry about that!