nanoporetech / Pore-C-Snakemake


Minimum number of contacts for pore-C analysis #17

Open antarikshtyagi opened 3 years ago

antarikshtyagi commented 3 years ago

Hello Eoghan and team,

After multiple rounds of nanopore sequencing we have reached 50-60 Gbases of data for each sample. The Pore-C runs yield ~250 million contacts each. However, downstream processing with Juicer produces the warning "Warning: Hi-C map is too sparse to find many domains via Arrowhead". slurm-28491.txt

HiCCUPS gives similar warnings. Could you please suggest the minimum number of contacts required for a reliable Pore-C analysis, and how much data that would correspond to?

Thanks Ant

eharr commented 3 years ago

Hi Ant,

Unfortunately there's no simple answer to this, as it will depend on a lot of factors (e.g. genome size, TAD size, how frequent the structure is in the cells you're analysing). Assuming it's a human genome, the number of contacts you describe should be enough to see larger TADs. If the tool is complaining that the data is too sparse, you might consider increasing the bin size (i.e. lowering the resolution). Have you browsed the contact map to make sure you can see the TADs by eye?
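To illustrate why increasing the bin size helps with sparsity, here is a minimal Python sketch that sums blocks of a toy contact matrix into larger bins and compares the fraction of non-empty cells before and after. The function names (`coarsen`, `fraction_nonzero`) and the random toy matrix are assumptions made for illustration; this is not how Juicer or Pore-C-Snakemake bin contacts internally.

```python
import numpy as np


def coarsen(matrix, factor):
    """Sum factor x factor blocks of a square contact matrix (hypothetical helper)."""
    n = matrix.shape[0]
    pad = (-n) % factor  # zero-pad so the size is a multiple of factor
    padded = np.pad(matrix, ((0, pad), (0, pad)))
    m = padded.shape[0] // factor
    # Reshape into (m, factor, m, factor) blocks and sum within each block.
    return padded.reshape(m, factor, m, factor).sum(axis=(1, 3))


def fraction_nonzero(matrix):
    """Fraction of matrix cells that contain at least one contact."""
    return np.count_nonzero(matrix) / matrix.size


# Toy data (an assumption): a sparse symmetric matrix of 200 fine bins.
rng = np.random.default_rng(0)
fine = rng.poisson(0.05, size=(200, 200))
fine = np.triu(fine) + np.triu(fine, 1).T

# Merge every 10 fine bins into 1 coarse bin (e.g. 5 kb -> 50 kb).
coarse = coarsen(fine, 10)

print("fine bins filled:  ", fraction_nonzero(fine))
print("coarse bins filled:", fraction_nonzero(coarse))
```

The total number of contacts is unchanged; they are just spread over fewer cells, so each cell has more counts for a domain caller to work with, at the cost of losing structures smaller than the new bin size.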

Eoghan

antarikshtyagi commented 3 years ago

Eoghan,

Yes, it is a human genome and I can see the contact map in Juicebox (attached). I have tried running at all the available resolutions up to 2500000, and none of them produces any output. Here are my align stats:

final_stats:
    read_length: 52422997084.0
    num_contacts: 270185229.0
    num_cis_contacts: 144818127.0
    num_aligns: 120292878.0
    num_pass_aligns: 89829667.0
    reads: 24672987.0
    Gb: 52.422997084
    contacts_per_Gb: 5153944.719472422
    percent_cis: 53.5995722401242
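For readers checking their own runs, the derived fields in these stats appear to follow directly from the raw counts. The Python sketch below recomputes them from the numbers quoted above; the formulas are inferred from the reported values (an assumption), not taken from the pipeline source, and `derive_stats` is a hypothetical helper name.

```python
import math

# Raw counts as reported in the align stats above.
stats = {
    "read_length": 52422997084.0,
    "num_contacts": 270185229.0,
    "num_cis_contacts": 144818127.0,
    "reads": 24672987.0,
}


def derive_stats(raw):
    """Recompute the derived fields (formulas inferred, not from the pipeline code)."""
    gb = raw["read_length"] / 1e9
    return {
        "Gb": gb,
        "contacts_per_Gb": raw["num_contacts"] / gb,
        "percent_cis": 100.0 * raw["num_cis_contacts"] / raw["num_contacts"],
        # Not in the reported stats; added here for illustration only.
        "contacts_per_read": raw["num_contacts"] / raw["reads"],
    }


derived = derive_stats(stats)
for name, value in derived.items():
    print(f"{name}: {value}")

# Sanity check against the values reported in the thread.
assert math.isclose(derived["contacts_per_Gb"], 5153944.719472422, rel_tol=1e-9)
assert math.isclose(derived["percent_cis"], 53.5995722401242, rel_tol=1e-9)
```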

Attached are images of Juicebox visualization of all chromosomes as well as Chromosome 1 random region at 5k resolution.

[Attached images: image_all_chr, image_chr1_5k]

Thanks Ant

eharr commented 3 years ago

Hi Ant,

Thanks for sharing the contact maps. To me it seems like you have pretty decent structure in there - it may be a little bit sparse at the 5 kb resolution, but I think you should be able to detect domains at coarser resolutions (larger bin sizes). I don't know the TAD-detection tools well enough to say why these specific tools are not working on these contact maps, but it might be worth looking at this paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1596-9. They compare a number of TAD callers under a range of coverage and resolution settings - there might be some information in there that could help.

Sorry I can't be of more help.

Eoghan