vaquerizaslab / fanc

FAN-C: Framework for the ANalysis of C-like data
GNU General Public License v3.0

Missing chromosomes after downsampling #91

Open KSchtz opened 2 years ago

KSchtz commented 2 years ago

Hi!

I'm glad I've found fanc to downsample my data but unfortunately I have some issues. I've used fanc (v0.9.21) like this for four samples:

fanc hic -t 48 --deepcopy --downsample $cisreads $input_hic_file@10000 $out_hic_file

One of the samples worked just fine, but the others are missing chromosomes. One is missing chr12, another is missing chr18, chr20 and chrX, and the third is missing chr3, chr12 and chr20. The input files contain all chromosomes at 10 kb resolution. I tried using fewer threads and more RAM (currently 180 GB), but the results remain the same. The log does not show any errors (see attachment); the only indication of the problem appears when I convert to mcool (here for the sample missing chr12):

fanc to-cooler -t 48 $out_hic_file $mcool

[...]
2021-12-17 17:06:50,842 INFO 1 - 11
2021-12-17 17:06:54,266 INFO Writing chunk 9: /tmp/tmpr9el3tid.multi.cool::9
2021-12-17 17:06:54,384 INFO Creating cooler at "/tmp/tmpr9el3tid.multi.cool::/9"
2021-12-17 17:06:54,385 INFO Writing chroms
2021-12-17 17:06:54,386 INFO Writing bins
2021-12-17 17:06:54,483 INFO Writing pixels
2021-12-17 17:06:54,560 INFO Writing indexes
2021-12-17 17:06:54,618 INFO Writing info
2021-12-17 17:06:54,623 INFO 1 - 12
2021-12-17 17:06:55,888 INFO 1 - 13
2021-12-17 17:06:58,559 INFO Writing chunk 10: /tmp/tmpr9el3tid.multi.cool::10
2021-12-17 17:06:58,677 INFO Creating cooler at "/tmp/tmpr9el3tid.multi.cool::/10"
2021-12-17 17:06:58,678 INFO Writing chroms
2021-12-17 17:06:58,680 INFO Writing bins
2021-12-17 17:06:58,779 INFO Writing pixels
2021-12-17 17:06:58,836 INFO Writing indexes
2021-12-17 17:06:58,892 INFO Writing info
[...]

Is there anything I can try?

env.txt out.log

kaukrise commented 2 years ago

Hi, I will still be on holiday for another week, but maybe we can start getting this figured out.

Also, can you please run the following in a Python console and post the output here?

import fanc
import numpy as np

hic = fanc.load("$input_hic_file@10000")  # replace with actual file name
print(hic)

ix_to_chromosome = {r.ix: r.chromosome for r in hic.regions}

n = 0
chromosomes = set()
for edge in hic.edges(lazy=True):
    chromosomes.add(ix_to_chromosome[edge.source])
    chromosomes.add(ix_to_chromosome[edge.sink])
    n += 1
print(sorted(chromosomes))
print(np.unique(list(ix_to_chromosome.values())))
print(n, len(hic.edges))

kaukrise commented 2 years ago

Oh, and please run the code also on the downsampled file! Thanks!
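
Since the same check has to run on two files, it may be convenient to wrap it in a function. This is a minimal sketch that relies only on the attributes used in the snippet above (regions with `.ix` and `.chromosome`, edges with `.source` and `.sink`). The function name `chromosomes_without_edges` is made up for illustration, and the example at the bottom uses stand-in namedtuples rather than a real FAN-C object:

```python
from collections import namedtuple


def chromosomes_without_edges(regions, edges):
    """Return the sorted chromosomes that have regions but no edge touching them."""
    # Map each region index to its chromosome name.
    ix_to_chromosome = {r.ix: r.chromosome for r in regions}

    # Collect every chromosome that occurs at either end of an edge.
    seen = set()
    for edge in edges:
        seen.add(ix_to_chromosome[edge.source])
        seen.add(ix_to_chromosome[edge.sink])

    # Chromosomes defined in the regions but never seen in an edge.
    return sorted(set(ix_to_chromosome.values()) - seen)


# Stand-in objects (hypothetical, only for demonstrating the function).
Region = namedtuple("Region", ["ix", "chromosome"])
Edge = namedtuple("Edge", ["source", "sink"])

regions = [Region(0, "chr1"), Region(1, "chr1"), Region(2, "chr12"), Region(3, "chr2")]
edges = [Edge(0, 1), Edge(0, 3)]

print(chromosomes_without_edges(regions, edges))  # prints ['chr12']
```

With a loaded file this would presumably be called as `chromosomes_without_edges(hic.regions, hic.edges(lazy=True))`, i.e. with the same calls as in the snippet above, once for the input and once for the downsampled file.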

KSchtz commented 2 years ago

Thank you for your fast response even though you're on holiday.

ix_to_chromosome = {r.ix: r.chromosome for r in hic.regions} throws a warning

UserWarning: Cannot find normalisation vector for chromosome: chr12, normalisation: KR, resolution: 10000, unit: BP. This could indicate that KR normalisation did not work for this chromosome. Will return NaN instead.
  warnings.warn("Cannot find normalisation vector for "
UserWarning: Cannot find normalisation vector for chromosome: chrM, normalisation: KR, resolution: 10000, unit: BP. This could indicate that KR normalisation did not work for this chromosome. Will return NaN instead.
  warnings.warn("Cannot find normalisation vector for "

It appears that chrM is also missing; I hadn't noticed that.

>>> print(sorted(chromosomes))
['chr1', 'chr10', 'chr11', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrX', 'chrY']

>>> print(np.unique(list(ix_to_chromosome.values())))
['chr1' 'chr10' 'chr11' 'chr12' 'chr13' 'chr14' 'chr15' 'chr16' 'chr17' 'chr18' 'chr19' 'chr2' 'chr20' 'chr21' 'chr22' 'chr3' 'chr4' 'chr5' 'chr6' 'chr7' 'chr8' 'chr9' 'chrM' 'chrX' 'chrY']

>>> print(n, len(hic.edges))
112914790 112914790

I'll give an update once the script has finished.

kaukrise commented 2 years ago

Hey, the normalisation vector warning is definitely the issue. I don't know why FAN-C is unable to find them - we can try to figure that out once I am back.

But since you need to renormalise after downsampling anyway, I'd recommend:

KSchtz commented 2 years ago

Thank you, I'll try that and give an update.

KSchtz commented 2 years ago

Hi! Downsampling worked without any issues. Upgrading to v0.9.22 and adding @NONE to the input Juicer file did the trick. Thank you!