Hey @liz-is,
thank you for the detailed bug report. Can you please try to plot the O/E matrix of a chromosome (or part thereof) that fails? I have a suspicion that the expected values might be the issue here, in which case this is probably related to the FAN-C dev version.
Thanks!
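To show how the expected values feed into the O/E matrix, here is a minimal pure-Python sketch. It assumes the expected value for each diagonal is simply the mean of that diagonal. FAN-C's actual expected calculation may differ (for example in how it treats unmappable bins), and the function names are hypothetical.

```python
def expected_per_diagonal(m):
    """Mean value of each diagonal (offset d = j - i): a simple distance-decay model."""
    n = len(m)
    return [sum(m[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]


def observed_over_expected(m):
    """Divide each pixel by the expected value for its diagonal; NaN where expected is 0."""
    n = len(m)
    expected = expected_per_diagonal(m)
    return [
        [
            m[i][j] / expected[abs(i - j)] if expected[abs(i - j)] != 0 else float("nan")
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Under this model, a symmetric matrix that is constant along each diagonal gives O/E = 1 everywhere, so plotting O/E is a direct way to see whether the expected values look sensible.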
Thanks for looking into this Kai! Here's the O/E matrix for the same dataset and chromosome.
Hey Liz!
There is a lot of white in this matrix, which according to the colorbar is oe=1. Are all these values actually 1 or very very close to 1?
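One quick way to answer this numerically is to count how many finite O/E values are at or very near 1. This is a minimal sketch, assuming the O/E matrix has been loaded as a nested list of floats; the function name is hypothetical and the tolerance of 1e-6 is an arbitrary choice.

```python
import math


def fraction_near_one(oe, rel_tol=1e-6):
    """Fraction of finite O/E values that are equal or very close to 1."""
    values = [v for row in oe for v in row if math.isfinite(v)]
    if not values:
        return 0.0
    near = sum(1 for v in values if math.isclose(v, 1.0, rel_tol=rel_tol))
    return near / len(values)
```

NaN and infinite values are skipped, so the fraction refers only to pixels that actually have an O/E value.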
1 is the default masking value for unmappable pixels in chess. A row is marked as unmappable if it consists entirely of 1s, which is checked by testing whether the row sum equals the row length (looking at the code now, this already doesn't seem ideal to me). This check is not done on the whole-chromosome matrix, but only on the submatrices being compared. So a row doesn't have to be all 1s across the whole chromosome to be marked as unmappable; it only has to be all 1s within a particular compared region.
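The row check described above can be sketched in plain Python. This is a hypothetical reconstruction from the description, not chess's actual code, and all function names are made up for illustration. It contrasts the sum-based test with a stricter "every value is exactly 1" test, and shows that the test is applied to a square region on the diagonal rather than to the full chromosome.

```python
def region(matrix, start, end):
    """Square submatrix for bins [start, end) on the diagonal."""
    return [row[start:end] for row in matrix[start:end]]


def unmappable_rows_sum_check(sub):
    """Flag rows whose sum equals their length (the heuristic described above)."""
    return [i for i, row in enumerate(sub) if sum(row) == len(row)]


def unmappable_rows_strict(sub):
    """Flag only rows in which every value is exactly 1 (the masking value)."""
    return [i for i, row in enumerate(sub) if all(v == 1 for v in row)]


# Hypothetical 3x3 O/E matrix: no full row is all 1s, but bins 0-1 are inside region(0, 2).
chrom = [
    [1, 1, 4],
    [1, 1, 2],
    [4, 2, 3],
]
```

Here `unmappable_rows_sum_check(chrom)` flags nothing, while `unmappable_rows_sum_check(region(chrom, 0, 2))` flags both rows. A row such as `[0.5, 1.5]` also sums to its length, so the sum check flags it where the strict check does not; this is one possible way the heuristic could misfire, depending on the data.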
You could try increasing the fraction of unmappable bins that chess permits with --mappability-cutoff (maybe 0.5 or even higher?). This is not a fix, but it might show whether this bug has something to do with false masking or with the computation of the O/E values.
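A minimal sketch of how such a cutoff could gate a region, assuming the rule is "keep the region if the fraction of flagged rows is at most the cutoff". The exact comparison chess uses for --mappability-cutoff may differ, and the function names here are hypothetical.

```python
def unmappable_fraction(sub):
    """Fraction of rows in a square region flagged by the row-sum heuristic."""
    if not sub:
        return 1.0  # treat an empty region as entirely unmappable
    flagged = sum(1 for row in sub if sum(row) == len(row))
    return flagged / len(sub)


def passes_mappability(sub, cutoff):
    """Assumed rule: the region is usable if its unmappable fraction is <= cutoff."""
    return unmappable_fraction(sub) <= cutoff
```

For a 4-bin region with one flagged row (fraction 0.25), a cutoff of 0.1 would reject the region while 0.5 would keep it, which is why raising the cutoff can help separate masking problems from O/E problems.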
Hi @nickmachnik,
this was an issue with the FAN-C development version, which we could figure out independently, so I am closing this!
Hi folks,
Some of my region pairs are being deemed invalid, but I don't think they fall into any of the possible reasons given. Do you have any other ideas what the issue might be? Is there a way I can get more diagnostic info to try to debug this myself (without having to dig deep into the code and run each step manually, which I can do if necessary)?
Here's the error message:
This is Drosophila Hi-C data. I've tried different resolutions and two different window sizes (100x and 150x the bin size). The pairs file for each parameter combination was generated with chess pairs from the same text file of chromosome sizes (and these files look okay to me at a quick glance). In each example, all bins from certain chromosomes are missing, in particular chr 2R and 3R. However, I get results for these chromosomes at 25kb resolution, so I don't think there is a chromosome naming mismatch between the files or anything like that.
(N.B., it makes sense that there are no valid pairs on chr 4 at 25kb resolution, since I'm using a window size of at least 2.5 Mb, which is larger than the chromosome size. Same for 10 kb resolution with 150x window size)
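The window-size reasoning is simple arithmetic: the window spans bin size times the window factor, and it has to fit on the chromosome. A small sketch, using a hypothetical 1.3 Mb chromosome length for illustration (substitute the real value from the chromosome sizes file):

```python
def window_fits(chrom_length_bp, bin_size_bp, window_bins):
    """True if a window of window_bins bins fits on the chromosome."""
    return window_bins * bin_size_bp <= chrom_length_bp


# Illustrative length only, not the exact size of any particular chromosome.
CHROM_LENGTH = 1_300_000

for bin_size, factor in [(25_000, 100), (10_000, 100), (10_000, 150)]:
    print(bin_size, factor, window_fits(CHROM_LENGTH, bin_size, factor))
```

With this length, the 25 kb / 100x (2.5 Mb) and 10 kb / 150x (1.5 Mb) windows do not fit, while 10 kb / 100x (1 Mb) does.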
I would have thought it was a resolution issue (i.e. too many unmappable bins), but I've plotted each chromosome at 10kb resolution in both my query and my reference, and they look fine. There are some unmappable bins, but I'd still expect to get some results; they don't look any worse than the other chromosomes.
I'm happy to look into this further myself since I have some familiarity with the code by now, but I'm not really sure where to start. Do you have any ideas?
I am using a development version of FAN-C, but @kaukrise said that it should work fine.
Also, as a more general comment, would it be possible to implement a more informative version of this message?
2021-01-15 14:45:01,759 INFO Could not compute similarity for 6316 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
I've seen other questions relating to this, so it seems like a common issue/point of confusion. Although most of the time this is easy to solve, it would be helpful to know which of those three possibilities accounts for the invalid pairs as a starting point for debugging.
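As a rough illustration of this request, failures could be tallied per reason before logging. The sketch below is hypothetical: the checks, their order, the `min_size` parameter and the cutoff comparison are assumptions for illustration, not how chess currently decides whether a region pair is valid.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_region(start: int, end: int, chrom_length: int, min_size: int,
                    unmappable_fraction: float, cutoff: float) -> Optional[str]:
    """Return the first reason a region is invalid, or None if it is usable."""
    if start < 0 or end > chrom_length or start >= end:
        return "faulty coordinates"
    if end - start < min_size:
        return "region too small"
    if unmappable_fraction > cutoff:
        return "too many unmappable bins"
    return None


def summarize(reasons: Iterable[Optional[str]]) -> str:
    """Build a log message that breaks the failure count down by reason."""
    counts = Counter(r for r in reasons if r is not None)
    total = sum(counts.values())
    details = ", ".join(f"{reason}: {n}" for reason, n in counts.most_common())
    return f"Could not compute similarity for {total} region pairs ({details})"
```

A message in this form would show at a glance whether, say, most failures come from unmappable bins rather than from coordinates.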