lh3 / hickit

TAD calling, phase imputation, 3D modeling and more for diploid single-cell Hi-C (Dip-C) and general Hi-C

Forgot to mention the sex of the samples #14

Open tarak77 opened 5 years ago

tarak77 commented 5 years ago

Hi @lh3 @tanlongzhi ,

When working with my single-cell data, I preprocess the input FASTQ files using the hickit repo and then model them using the dip-c repo to perform 3D imputation. While doing so, I forgot to specify the sex of the samples when generating the .pairs file from the phased SNP data. When I visualize these models, I see chrX(mat) and chrY(pat) for male cells, but only a single chrX(mat) for female cells. I went back to the .pairs file and saw that chrX always has a value of 1 in the phase0 and phase1 columns.

I am confused: shouldn't the female cells also have both chrX(mat) and chrX(pat)? Or is it because I didn't use chronly -y - that I didn't get two X chromosomes in the female cells?

Any help will be great!

tanlongzhi commented 5 years ago

You have to use chronly -y - to set the sex to female, because this repo determines sex only by the presence or absence of the Y chromosome.

Btw for males, we haven't updated the README yet, but you will also need to remove the PARs.
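
To make the sex rule above concrete, here is a minimal Python sketch of "male if any Y chromosome is present, otherwise female". This is only an illustration of the rule as described in this comment, not hickit's actual code; the function name `infer_sex` and the accepted chromosome spellings are assumptions made for the example.

```python
def infer_sex(chromosomes):
    """Return 'male' if a Y chromosome name is present, else 'female'.

    Hypothetical helper: it mirrors the rule described in the thread
    (sex is decided only by presence or absence of chrY).
    """
    names = {c.lower() for c in chromosomes}
    has_y = bool(names & {"chry", "y"})
    return "male" if has_y else "female"


# Toy inputs: chromosome names seen in a contact file.
print(infer_sex(["chr1", "chrX", "chrY"]))  # male
print(infer_sex(["chr1", "chrX"]))          # female
```

Under this rule, a female sample that still carries any chrY records would be treated as male, which is consistent with seeing only a single chrX when chronly -y - was not used.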

tarak77 commented 5 years ago

Ah, I see. Thanks! Also, how do I remove the pseudoautosomal regions (PARs)?

tanlongzhi commented 5 years ago

You can remove PARs by something like this:

hickit.js sam2seg -v snp.txt.gz aln.sam.gz 2> contacts.seg.log | hickit.js chronly - | hickit.js bedflt mm10.par.bed - | gzip > contacts.seg.gz

tarak77 commented 5 years ago

Okay. Just to be clear, do PARs need to be removed only for human cells, or for mouse cells too?

tanlongzhi commented 5 years ago

PARs should be removed for any organism that has mappable PARs.

Anyway, PARs shouldn't affect the results much because they're rather short.
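
For anyone wondering what filtering against a PAR BED file amounts to conceptually, below is a rough Python sketch that drops every aligned segment overlapping a listed region. It is not the bedflt implementation: the function names, the simplified (chrom, start, end) segment tuples, and the coordinates are all made up for illustration, and whether bedflt uses exactly this overlap rule is an assumption. Intervals are half-open [start, end), as in BED.

```python
def overlaps_any(chrom, start, end, regions):
    """True if [start, end) overlaps any region listed for this chromosome."""
    return any(start < r_end and r_start < end
               for r_start, r_end in regions.get(chrom, []))


def drop_overlapping(segments, regions):
    """Keep only segments that do not overlap any listed region."""
    return [(chrom, start, end) for chrom, start, end in segments
            if not overlaps_any(chrom, start, end, regions)]


# Made-up coordinates, NOT real PAR boundaries for any genome build.
par = {"chrX": [(1000, 2000)]}
segments = [
    ("chrX", 900, 1100),   # overlaps the region: dropped
    ("chrX", 2000, 2100),  # starts exactly at the region end: kept
    ("chr1", 1000, 1500),  # different chromosome: kept
    ("chrX", 1990, 2000),  # last 10 bp of the region: dropped
]
print(drop_overlapping(segments, par))
# [('chrX', 2000, 2100), ('chr1', 1000, 1500)]
```

Because the regions are short, only a small fraction of segments should be removed this way, in line with the point above that PARs shouldn't change the results much.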

tarak77 commented 5 years ago

Got it. Thanks!