KSchtz opened this issue 2 years ago
Hi, I will still be on holiday for another week, but maybe we can start getting this figured out.
A new version (0.9.22) is now on PyPI; can you try with that first, please?
Can you also run it without --deepcopy, please? The resulting file is a copy anyway, and I'd like to rule out possible issues. You can also remove the -t 48, since downsampling is currently single-threaded anyway.
Also, can you please run the following in a Python console and post the output here?
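import fanc
import numpy as np

# Replace with the actual file name (Juicer file loaded at 10 kb resolution)
hic = fanc.load("$input_hic_file@10000")
print(hic)

# Map each region index to its chromosome
ix_to_chromosome = {r.ix: r.chromosome for r in hic.regions}

# Collect every chromosome that occurs in at least one contact (edge)
n = 0
chromosomes = set()
for edge in hic.edges(lazy=True):
    chromosomes.add(ix_to_chromosome[edge.source])
    chromosomes.add(ix_to_chromosome[edge.sink])
    n += 1

print(sorted(chromosomes))                         # chromosomes that have contacts
print(np.unique(list(ix_to_chromosome.values())))  # chromosomes that have regions
print(n, len(hic.edges))                           # iterated vs. reported edge count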
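The same check can be wrapped in a small function that reports which chromosomes have regions but no contacts. This is only a sketch. `chromosomes_without_edges` is a hypothetical helper, and it relies only on what the snippet above already uses: regions exposing `.ix` and `.chromosome`, and edges exposing `.source` and `.sink`. Because of that, it can be tried on stand-in objects without a Hi-C file.

```python
from collections import namedtuple
from typing import Iterable, Set


def chromosomes_without_edges(regions: Iterable, edges: Iterable) -> Set[str]:
    """Return chromosomes that appear in `regions` but in no edge."""
    # Map region index -> chromosome, as in the diagnostic snippet
    ix_to_chromosome = {r.ix: r.chromosome for r in regions}
    seen = set()
    for edge in edges:
        seen.add(ix_to_chromosome[edge.source])
        seen.add(ix_to_chromosome[edge.sink])
    return set(ix_to_chromosome.values()) - seen


# Hypothetical stand-in objects mimicking the attributes used above
Region = namedtuple("Region", ["ix", "chromosome"])
Edge = namedtuple("Edge", ["source", "sink"])

regions = [Region(0, "chr1"), Region(1, "chr1"), Region(2, "chr12"), Region(3, "chrM")]
edges = [Edge(0, 1), Edge(1, 1)]
print(sorted(chromosomes_without_edges(regions, edges)))  # ['chr12', 'chrM']
```

With a real file, the analogous call would be `chromosomes_without_edges(hic.regions, hic.edges(lazy=True))`.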
Oh, and please also run the code on the downsampled file! Thanks!
Thank you for your fast response even though you're on holiday.
The line
ix_to_chromosome = {r.ix: r.chromosome for r in hic.regions}
throws two warnings:
UserWarning: Cannot find normalisation vector for chromosome: chr12, normalisation: KR, resolution: 10000, unit: BP. This could indicate that KR normalisation did not work for this chromosome. Will return NaN instead. warnings.warn("Cannot find normalisation vector for "
UserWarning: Cannot find normalisation vector for chromosome: chrM, normalisation: KR, resolution: 10000, unit: BP. This could indicate that KR normalisation did not work for this chromosome. Will return NaN instead. warnings.warn("Cannot find normalisation vector for "
It appears that chrM is also missing; I hadn't noticed that.
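You can also record these warnings and parse them, which collects the affected chromosomes without reading them off the console. Below is a minimal sketch that uses only the standard library. The regular expression is an assumption based on the warning text quoted above, and `chromosomes_missing_norm` is a hypothetical helper. With a real file, the `fanc.load(...)` call and the region loop would go inside the `catch_warnings` block. Here, the two messages are re-emitted by hand so the sketch runs without FAN-C.

```python
import re
import warnings

# Pattern assumed from the warning text FAN-C printed above
NORM_WARNING = re.compile(
    r"Cannot find normalisation vector for chromosome: (\S+?), normalisation: (\S+?),"
)


def chromosomes_missing_norm(records):
    """Return {chromosome: normalisation} for every matching warning record."""
    missing = {}
    for record in records:
        match = NORM_WARNING.search(str(record.message))
        if match:
            missing[match.group(1)] = match.group(2)
    return missing


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # do not suppress repeated warnings
    # Stand-in for: fanc.load(...) followed by iterating over hic.regions
    for chrom in ("chr12", "chrM"):
        warnings.warn(
            f"Cannot find normalisation vector for chromosome: {chrom}, "
            "normalisation: KR, resolution: 10000, unit: BP.",
            UserWarning,
        )

print(chromosomes_missing_norm(caught))  # {'chr12': 'KR', 'chrM': 'KR'}
```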
>>> print(sorted(chromosomes))
['chr1', 'chr10', 'chr11', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrX', 'chrY']
>>> print(np.unique(list(ix_to_chromosome.values())))
['chr1' 'chr10' 'chr11' 'chr12' 'chr13' 'chr14' 'chr15' 'chr16' 'chr17' 'chr18' 'chr19' 'chr2' 'chr20' 'chr21' 'chr22' 'chr3' 'chr4' 'chr5' 'chr6' 'chr7' 'chr8' 'chr9' 'chrM' 'chrX' 'chrY']
>>> print(n, len(hic.edges))
112914790 112914790
I'll give an update once the script has finished.
Hey, the normalisation vector warning is definitely the issue. I don't know why FAN-C is unable to find the vectors; we can try to figure that out once I am back.
But since you need to renormalise after downsampling anyway, I'd recommend:
fanc hic -n
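The toy example below shows why a missing normalisation vector matters. Each normalised value is derived from the raw count and the bias factors of its two bins. If a chromosome's vector is replaced by NaN, as the warning says, every contact touching that chromosome becomes NaN. The sketch uses NumPy with made-up numbers and an assumed Juicer-style division. It is also an assumption, not something confirmed in this thread, that such NaN pixels are then dropped downstream.

```python
import numpy as np

# Toy raw contact matrix (made-up numbers): bins 0-1 on chr1, bins 2-3 on chr12
raw = np.array(
    [
        [10.0, 4.0, 1.0, 0.0],
        [4.0, 12.0, 2.0, 1.0],
        [1.0, 2.0, 8.0, 3.0],
        [0.0, 1.0, 3.0, 9.0],
    ]
)
bin_chromosome = np.array(["chr1", "chr1", "chr12", "chr12"])

# Per-bin normalisation vector; chr12 has none, so NaN is substituted (as the warning says)
norm = np.array([1.1, 0.9, np.nan, np.nan])

# Juicer-style normalised value (assumed): raw / (norm_i * norm_j)
normalised = raw / np.outer(norm, norm)

# Assumed filtering step: keep only finite, non-zero pixels
rows, cols = np.nonzero(np.isfinite(normalised) & (normalised != 0))
kept = {str(bin_chromosome[i]) for i in np.concatenate([rows, cols])}
print(sorted(kept))  # ['chr1']
```

In this toy case, chr12 vanishes from the output even though it has plenty of raw contacts. This matches the pattern reported in the issue.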
Thank you, I'll try that and give an update.
Hi! Downsampling worked without any issues. Upgrading to v0.9.22 and adding @NONE to the input Juicer file did the trick. Thank you!
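The two changes that resolved the issue can be checked in a short sketch. It confirms the installed version and builds the loading string with the `@NONE` suffix. It assumes the distribution name is `fanc` and that the `file@resolution@normalisation` order follows from the `$input_hic_file@10000` example plus the `@NONE` fix. `is_at_least` and `juicer_spec` are hypothetical helpers, and the file name is a placeholder.

```python
from importlib.metadata import PackageNotFoundError, version


def _as_tuple(v: str) -> tuple:
    # Only handles plain numeric versions such as "0.9.22"
    return tuple(int(part) for part in v.split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Return True if `installed` is the same as or newer than `minimum`."""
    return _as_tuple(installed) >= _as_tuple(minimum)


def juicer_spec(path: str, resolution: int, normalisation: str = "NONE") -> str:
    """Build a `file@resolution@normalisation` loading string (order assumed)."""
    return f"{path}@{resolution}@{normalisation}"


try:
    installed = version("fanc")  # distribution name assumed to be "fanc"
except PackageNotFoundError:
    print("fanc is not installed in this environment")
else:
    print("fanc", installed, "OK" if is_at_least(installed, "0.9.22") else "-> please upgrade")

spec = juicer_spec("sample.hic", 10000)  # placeholder file name
print(spec)  # sample.hic@10000@NONE
# hic = fanc.load(spec)  # raw counts; renormalise afterwards, e.g. with `fanc hic -n`
```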
Hi!
I'm glad I found fanc for downsampling my data, but unfortunately I have run into some issues. I ran fanc (v0.9.21) like this on four samples:
fanc hic -t 48 --deepcopy --downsample $cisreads $input_hic_file@10000 $out_hic_file
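The `--downsample` option is asked to reduce the matrix to a target number of contacts (here `$cisreads`) while keeping their relative distribution. One common way to do this is to sample contacts without replacement using NumPy's multivariate hypergeometric sampler. The minimal sketch below takes that approach. It is not necessarily how FAN-C implements downsampling, `downsample_counts` is a hypothetical helper, and the pixel counts are made up. With millions of contacts, random subsampling of this kind is very unlikely to empty a whole chromosome.

```python
import numpy as np


def downsample_counts(counts: np.ndarray, target: int, seed: int = 0) -> np.ndarray:
    """Draw `target` contacts without replacement from per-pixel counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if target > counts.sum():
        raise ValueError("target exceeds the total number of contacts")
    rng = np.random.default_rng(seed)
    # Each pixel keeps a hypergeometric share of its reads; the shares sum to target
    return rng.multivariate_hypergeometric(counts, target)


# Made-up pixel counts: first three pixels on chr1, last three on chr12
pixels = np.array([500, 300, 200, 400, 350, 250])
sampled = downsample_counts(pixels, 1000)
print(sampled.sum())  # 1000
```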
One of the samples worked just fine, but the others are missing chromosomes: one is missing chr12, another chr18, chr20 and chrX, and the third chr3, chr12 and chr20. The input files contain all chromosomes at 10 kb resolution. I tried using fewer threads and more RAM (currently 180 GB), but the results remain the same. The log does not show any errors (see attachment); the only indication of the problem appears when I convert to mcool (missing chr12):
fanc to-cooler -t 48 $out_hic_file $mcool
[...]
2021-12-17 17:06:50,842 INFO 1 - 11
2021-12-17 17:06:54,266 INFO Writing chunk 9: /tmp/tmpr9el3tid.multi.cool::9
2021-12-17 17:06:54,384 INFO Creating cooler at "/tmp/tmpr9el3tid.multi.cool::/9"
2021-12-17 17:06:54,385 INFO Writing chroms
2021-12-17 17:06:54,386 INFO Writing bins
2021-12-17 17:06:54,483 INFO Writing pixels
2021-12-17 17:06:54,560 INFO Writing indexes
2021-12-17 17:06:54,618 INFO Writing info
2021-12-17 17:06:54,623 INFO 1 - 12
2021-12-17 17:06:55,888 INFO 1 - 13
2021-12-17 17:06:58,559 INFO Writing chunk 10: /tmp/tmpr9el3tid.multi.cool::10
2021-12-17 17:06:58,677 INFO Creating cooler at "/tmp/tmpr9el3tid.multi.cool::/10"
2021-12-17 17:06:58,678 INFO Writing chroms
2021-12-17 17:06:58,680 INFO Writing bins
2021-12-17 17:06:58,779 INFO Writing pixels
2021-12-17 17:06:58,836 INFO Writing indexes
2021-12-17 17:06:58,892 INFO Writing info
[...]
Is there anything I can try?
Attachments: env.txt, out.log