vaquerizaslab / fanc

FAN-C: Framework for the ANalysis of C-like data
GNU General Public License v3.0

--Region option in fanc insulation is not working #66

Open nadabrovitxka opened 3 years ago

nadabrovitxka commented 3 years ago

Love this toolkit so much. Makes everything so convenient. Great job!

I am using fanc v0.9.20. It works well for full matrices, but I am having some trouble calculating insulation scores when the matrix covers only a small subset of the genome (Capture HiC). This command:

fanc insulation stem_wt_ontarget.hic@5kb stem_wt_ontarget.insulation -w 500000 -r chr3:34mb-36mb -o bigwig

on this .hic file does not really restrict the analysis to chr3. It produces the following message for each chromosome (except chr3, of course):

2021-07-30 09:15:24,926 INFO FAN-C version: 0.9.20
2021-07-30 09:15:36,093 INFO Chosen window sizes: 500000
/data/pedrorocha/conda/envs/fanc-env-2/lib/python3.7/site-packages/fanc/compatibility/juicer.py:718: UserWarning: Cannot find normalisation vector for chromosome: 1, normalisation: KR, resolution: 5000, unit: BP. This could indicate that KR normalisation did not work for this chromosome. Will return NaN instead.
  res=resolution, unit=unit

Although these warnings do not stop the script, the insulation score calculation eventually fails with the following message:

Traceback (most recent call last):
  File "/data/pedrorocha/conda/envs/fanc-env-2/bin/fanc", line 127, in <module>
    Fanc()
  File "/data/pedrorocha/conda/envs/fanc-env-2/bin/fanc", line 93, in __init__
    command([sys.argv[0]] + sys.argv[option_ix:], log_level=log_level, verbosity=verbosity)
  File "/data/pedrorocha/conda/envs/fanc-env-2/lib/python3.7/site-packages/fanc/commands/fanc_commands.py", line 3676, in insulation
    normalisation_window=normalisation_window)
  File "/data/pedrorocha/conda/envs/fanc-env-2/lib/python3.7/site-packages/fanc/commands/fanc_commands.py", line 3738, in _domain_scores
    scores.to_bigwig(output_file, window_size, subset=sub_region)
  File "/data/pedrorocha/conda/envs/fanc-env-2/lib/python3.7/site-packages/fanc/architecture/domains.py", line 163, in to_bigwig
    return self._to_file(file_name, parameter, subset=subset, _write_function=write_bigwig)
  File "/data/pedrorocha/conda/envs/fanc-env-2/lib/python3.7/site-packages/fanc/architecture/domains.py", line 129, in _to_file
    _write_function(file_name, regions)
  File "/data/pedrorocha/conda/envs/fanc-env-2/lib/python3.7/site-packages/fanc/tools/files.py", line 544, in write_bigwig
    bw.addHeader(header)
RuntimeError: You input an empty list!
[bwClose] There was an error while finishing writing a bigWig file! The output is likely truncated.

This may be because it is reading the whole of chromosome 3, which is mostly empty. Running it with chr3:34000000-35500000 gives me the same result. Any clue?
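
The traceback suggests a failure chain that can be sketched in isolation: if no regions survive the subset filter, the chromosome header built from them is empty, and the bigWig writer rejects it. The following is a minimal sketch of that chain with a guard added. It is not FAN-C's actual code; `Region`, `filter_regions`, and `build_header` are hypothetical names introduced for illustration, and the mismatched chromosome name is only one way an empty subset could arise.

```python
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical stand-in for a scored genomic bin."""
    chromosome: str
    start: int
    end: int
    score: float


def filter_regions(regions: List[Region], chromosome: str,
                   start: int, end: int) -> List[Region]:
    """Keep only regions that overlap chromosome:start-end."""
    return [r for r in regions
            if r.chromosome == chromosome and r.end > start and r.start < end]


def build_header(regions: List[Region]) -> List[Tuple[str, int]]:
    """Build a (chromosome, length) header; refuse to build an empty one."""
    lengths: Dict[str, int] = {}
    for r in regions:
        lengths[r.chromosome] = max(lengths.get(r.chromosome, 0), r.end)
    if not lengths:
        # Fail early with a readable message instead of passing [] to the writer.
        raise ValueError("No regions left after subsetting; nothing to write")
    return list(lengths.items())


regions = [Region("chr3", 34_000_000, 34_005_000, 0.12),
           Region("chr3", 34_005_000, 34_010_000, -0.30)]

header = build_header(filter_regions(regions, "chr3", 34_000_000, 36_000_000))

# Illustrative only: a subset that matches nothing (here a chromosome name
# without the "chr" prefix) yields an empty list, which the guard catches.
empty = filter_regions(regions, "3", 34_000_000, 36_000_000)
```

Calling `build_header(empty)` raises a `ValueError` here, which is easier to diagnose than a truncated output file.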

kaukrise commented 3 years ago

Hi, thank you for the nice words, they mean a lot!

Since I am currently on holiday, I don't have access to a desktop computer. Therefore "proper" debugging will have to wait until I am back in about two weeks.

However, maybe we can find a workaround in the meantime. fanc insulation is set up so that it always calculates the insulation scores for the entire genome and only subsets to the specified region at a later stage. For your use case this is obviously less than ideal, and I will have to change it so that the calculation is limited to the chromosome in question.
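
The difference between the behaviour described above and the planned fix can be sketched as follows. This is a toy illustration, not FAN-C code: `parse_region`, `toy_score`, `score_then_subset`, and `subset_then_score` are hypothetical names, and the "score" is a placeholder rather than a real insulation score.

```python
import re
from typing import Dict, List, Tuple

Bin = Tuple[str, int, int]  # (chromosome, start, end)

_UNITS = {"": 1, "kb": 1_000, "mb": 1_000_000}


def parse_region(text: str) -> Bin:
    """Parse 'chr3:34mb-36mb' or 'chr3:34000000-36000000' (hypothetical helper)."""
    m = re.fullmatch(r"([^:]+):(\d+)(kb|mb)?-(\d+)(kb|mb)?", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"Cannot parse region: {text!r}")
    chrom, start, s_unit, end, e_unit = m.groups()
    return (chrom,
            int(start) * _UNITS[(s_unit or "").lower()],
            int(end) * _UNITS[(e_unit or "").lower()])


def make_bins(chrom: str, length: int, size: int) -> List[Bin]:
    """Split a chromosome into fixed-size bins."""
    return [(chrom, s, min(s + size, length)) for s in range(0, length, size)]


def overlaps(b: Bin, region: Bin) -> bool:
    return b[0] == region[0] and b[1] < region[2] and b[2] > region[1]


def toy_score(b: Bin) -> float:
    """Placeholder score; stands in for the expensive per-bin calculation."""
    return float((b[1] // 5_000) % 7)


def score_then_subset(genome: Dict[str, List[Bin]], region: Bin):
    """Score every bin in the genome, then keep only those in the region."""
    scored = [(b, toy_score(b)) for bins in genome.values() for b in bins]
    return [(b, s) for b, s in scored if overlaps(b, region)], len(scored)


def subset_then_score(genome: Dict[str, List[Bin]], region: Bin):
    """Select the bins in the region first, then score only those."""
    bins = [b for b in genome.get(region[0], []) if overlaps(b, region)]
    return [(b, toy_score(b)) for b in bins], len(bins)


genome = {"chr1": make_bins("chr1", 50_000, 5_000),
          "chr3": make_bins("chr3", 40_000, 5_000)}
region = parse_region("chr3:10kb-20kb")

slow, n_all = score_then_subset(genome, region)
fast, n_sub = subset_then_score(genome, region)
```

Both strategies return the same bins in this toy case, but the first one scores every bin in the genome, while the second only touches the requested chromosome. A real insulation calculation would additionally need bins flanking the region to fill the sliding window, which this sketch ignores.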

For now, my advice would be to run the command on the whole genome with

fanc insulation stem_wt_ontarget.hic@5kb stem_wt_ontarget.insulation -w 500000

to generate the .insulation file. I would also add a couple more window sizes, down to 50kb or so - they will all be stored in the same object and the added calculation time is not so bad compared to calculating them separately later.

Then extract the score subset with

fanc insulation stem_wt_ontarget.insulation sub.insulation -r chr3:34mb-36mb

Then do the conversion to bigwig with

fanc insulation sub.insulation sub_500kb.bw -o bw -w 500000

I hope this works - as I mentioned, I can't test the commands right now.

If the subsetting is still causing issues, maybe you can instead convert the whole genome file to BigWig and subset that with a different tool?

fanc insulation stem_wt_ontarget.insulation stem_wt_ontarget_500kb.bw -o bw -w 500000
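
The whole-genome route can also be scripted. The following is a minimal sketch that only assembles the two `fanc insulation` invocations quoted in this thread and leaves execution optional. The helper names `insulation_commands` and `run_all` and the `run` flag are assumptions made for illustration; the file names and flags are the ones used above.

```python
import subprocess
from typing import List


def insulation_commands(hic: str, prefix: str, window: int) -> List[List[str]]:
    """Return two command lines: compute scores, then export them to bigWig."""
    scores = f"{prefix}.insulation"
    bigwig = f"{prefix}_{window // 1000}kb.bw"
    return [
        ["fanc", "insulation", hic, scores, "-w", str(window)],
        ["fanc", "insulation", scores, bigwig, "-o", "bw", "-w", str(window)],
    ]


def run_all(commands: List[List[str]], run: bool = False) -> None:
    """Print each command; execute it only when run=True (needs fanc on PATH)."""
    for cmd in commands:
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)


commands = insulation_commands("stem_wt_ontarget.hic@5kb",
                               "stem_wt_ontarget", 500_000)
run_all(commands)  # dry run: prints the commands without executing them
```

Passing `run=True` executes the commands in order and stops at the first failure because of `check=True`.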

nadabrovitxka commented 3 years ago

I had tried the first option myself with an earlier fanc version, and it failed as soon as it couldn't find the normalization vector for the other chromosomes. The current version gets past that and produces the whole-genome insulation file. It then fails at

fanc insulation stem_wt_ontarget.insulation sub.insulation -r chr3:34mb-36mb

with:

fanc insulation: error: Output file cannot be empty when choosing default output format!

BUT, creating a bigwig of the whole genome works well and I got my insulation scores and they do make perfect sense. Thanks a lot, enjoy your vacation.

kaukrise commented 3 years ago

Thanks for reporting back so quickly. I'm glad you got something usable in the end - I'll look into fixing the issue properly once I am back, so I'll keep this open until then!