Closed mthimma closed 8 years ago
To figure out where exactly the issue with your own data is, I would need to see some sample input files - can you create smaller sub-matrices that show the same problem and share them in this thread?
You mention using a "sparse matrix" format - I am wondering if that is what is causing the issue. So far, TADtool only supports square matrices (as a plain text file) or numpy matrices - please see the README for details.
Also, the error message is nan-related. Do you have any nan values in your matrix? nan values are not supported, but if you have "invalid" entries in the matrix, you can use a masked numpy array to make TADtool ignore them in the calculations.
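To illustrate the masked-array suggestion, here is a minimal sketch using plain numpy. The 3x3 values are made up for illustration, and how TADtool consumes the masked array is not shown here; see the README for that.

```python
import numpy as np

# Toy 3x3 contact matrix with one invalid (NaN) pair; values are made up.
matrix = np.array([
    [5.0, 2.0, np.nan],
    [2.0, 4.0, 1.0],
    [np.nan, 1.0, 3.0],
])

# Count the invalid entries before deciding how to handle them.
n_invalid = int(np.isnan(matrix).sum())

# masked_invalid masks NaN and inf entries, so mask-aware reductions
# skip them instead of propagating NaN.
masked = np.ma.masked_invalid(matrix)
row_means = masked.mean(axis=1)

print(n_invalid)
print(row_means)
```

Counting the NaN entries first is a quick way to confirm whether the input file is the source of the nan-related message at all.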
Sorry for the confusion about the sparse matrix. I have used a square matrix only.
Here are a few lines of the regions and matrix files.
head Ago1_chr21_19000000_20000000_regions.bed
chr21 19000001 19040000
chr21 19040001 19080000
chr21 19080001 19120000
chr21 19120001 19160000
chr21 19160001 19200000
chr21 19200001 19240000
chr21 19240001 19280000
chr21 19280001 19320000
chr21 19320001 19360000
chr21 19360001 19400000
Matrix file content. head -1 Ago1_chr21_19000000_20000000_matrix.txt 227.479752 39.232669 11.168846 12.327291 10.474954 8.099735 0.0 15.166322 6.104866 8.249842 6.503839 4.832022 5.855322 2.166505 5.075873 3.009444 1.529645 1.262416 2.195865 2.039474 1.189086 0.0 1.574083 0.0 0.0 0.0 0.619495 1.575717 0.964216 0.0 0.0 0.0 0.0 0.0 2.041819 0.776171 0.0 0.0 0.0 0.0 0.0 1.240425 0.0 0.659083 0.498728 0.0 0.0 1.024804 0.0 0.573833 0.0 0.521218 0.615119 0.0 0.914550 0.0 0.759432 0.731435 2.395341 0.868357 1.351828 0.0 0.0 2.357089 0.700911 1.233547 0.883523 0.0 2.700138 0.0 1.495020 1.363025 1.001810 0.0 2.492956 0.0 1.855037 2.743252 0.0 0.730774 1.046627 0.0 0.0 0.0 2.002836 0.0 0.0 1.066103 0.0 0.0 0.797081 0.0 0.693685 0.553466 0.840476 0.0 0.653772 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.799307 0.0 0.0 0.0 0.723579 0.0 0.0 0.0 0.931139 0.0 0.0 0.0 0.0 0.0 0.0 0.934063 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.959208 0.0 0.0 0.747932 0.829960 0.904816 0.0 0.921456 0.0 0.0 0.0 0.0 0.0 0.812723 0.749612 0.0 0.0 0.401332 1.457126 0.0 0.0 0.0 0.0 1.266839 0.0 0.811671 0.0 0.778087 0.0 0.0 0.0 0.0 0.0 0.0 0.699820 0.0 0.0 0.0 0.0 0.938864 0.0 0.0 0.548134 0.0 0.0 0.0 0.780729 1.170916 0.0 0.0 0.0 0.0 1.503517 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.565728 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.558759 0.510347 0.0 0.0 0.0 0.0 1.075141 0.0 0.0 0.0 0.0 0.510138 0.0 0.0 2.316669 0.0 0.0 0.824153 0.685088 0.0 0.0 0.958883 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.869335 0.0 0.0 0.0 0.0 1.011194 0.0 0.0 0.0 0.914599 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.146620 0.0 0.0 0.0 0.0 0.0 0.00.814921 0.0 1.101599 0.0 0.776905 0.0 0.0 0.701413 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.593688 0.0 0.422295 0.523363 0.0 0.0 0.0 0.0 0.0 0.719439 0.0 0.0 0.0 0.0 0.807190 0.727033 0.0 0.668808 0.0 0.0 0.0 0.937627 0.895826 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.006748 0.950754 0.0 0.0 0.0 0.0 0.689344 0.0 0.0 0.0 0.0 0.0 0.0 0.888382 0.00.0 0.0 0.0 0.0 0.658171 1.177735 
0.0 0.0 0.0 0.0 0.0 0.0 1.077108 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.770569 0.0 0.0 0.931165 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.944484 0.0 0.0 0.0 1.040034 0.0 0.0 0.0 0.0 1.547128 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.630072 0.851198 0.0 0.0 0.0 0.801254 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.863164 0.0 0.0 0.0 0.0 0.722197 0.0 0.0 1.538760 0.00.0 0.0 0.0 0.0 1.332552 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.547882 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.913286 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.015649 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.924472 0.0 1.204229 0.0 0.0 0.0 1.547932 0.0 0.0 0.0 0.0 0.0 0.742428 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.256927 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.763437 0.00.0 0.842428 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.606727 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.765786 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.055287 0.0 0.0 0.766397 0.0 0.0 0.0 0.0 0.955582 1.436155 0.0 1.429250 1.791767 0.0 0.0 0.791706 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.950301 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
All the entries in the matrix are numbers, and I have no clue where the nan is coming from.
Thank you for this. I think I can solve part of the issue, but not the whole thing just yet.
So, one problem is the size of the matrix you are supplying. The insulation index is calculating contact averages in a square window along the diagonal - if there is no data in that region (as it is on the edges of your matrix) it can't calculate the insulation. This is why you are seeing the white "arc" in the "flame plot" area of TADtool - there is just not enough data in that region of the matrix to calculate an insulation value.
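The edge effect described above can be sketched with a toy calculation. This is a simplified illustration of "average contacts in a square window along the diagonal", not TADtool's actual implementation; the function name `toy_insulation` and the parameter `window_bins` are hypothetical.

```python
import numpy as np

def toy_insulation(matrix, window_bins):
    """Mean contact strength in a square window next to the diagonal.

    Simplified illustration only; not TADtool's actual implementation.
    """
    n = matrix.shape[0]
    values = np.full(n, np.nan)
    for i in range(n):
        # The window spans bins upstream (rows) and downstream (columns) of i.
        if i - window_bins < 0 or i + window_bins >= n:
            continue  # window runs off the matrix: no value can be computed
        window = matrix[i - window_bins:i, i + 1:i + window_bins + 1]
        values[i] = window.mean()
    return values

toy = np.ones((10, 10))
ins = toy_insulation(toy, window_bins=3)
print(ins)
```

In this sketch the first and last `window_bins` positions stay undefined, and the undefined margin grows with the window size, which is consistent with the arc-shaped empty area at the edges of a small matrix.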
Now, why the rest of the flame plot is all in one color, I don't know. You should be seeing the "flames" there. Number-wise the small sample you sent looks ok - would you have any way of giving me access to the whole matrix and regions files, so I can try to debug on my machine?
Also, one thing you can try first is to use the tadtool command directly from the command line and not the script you are using. This would exclude any issues with the script you are using. So please, first try
tadtool plot Ago1_chr21_19000000_20000000_matrix.txt Ago1_chr21_19000000_20000000_regions.bed chr21:19000000-20000000
and check if you get the same issue.
edit: corrected tadtool command
Thanks Vaqueriz. I am more than happy to send my data for debugging! How do I send them? Via email? The matrix file is 2.3Mb in size and the regions file is 18kb. I will update you after trying the command line.
Those files are pretty small. If you don't mind, it would be easiest if you just attach them to a comment in this thread.
Here are the files. Ago1_chr21_19000000_20000000_matrix.txt
Ago1_chr21_19000000_20000000_regions.txt
PS: since bed file format is not allowed to be attached, I have renamed it with .txt extension.
Alright, one thing I can see immediately is that your matrix is not symmetrical, as a Hi-C matrix should be:
import tadtool.tad as tad
import numpy as np
data = tad.HicMatrixFileReader().matrix("Ago1_chr21_19000000_20000000_matrix.txt")
rowsums = np.sum(data, 0)
colsums = np.sum(data, 1)
np.isclose(rowsums, colsums).all() # returns False
This is definitely troubling and indicates some larger problem with your matrix creation. I am not certain that this is what is causing the insulation issues above, but this must be fixed nonetheless!
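The row-sum comparison is a quick indicator, but equal row and column sums do not by themselves guarantee symmetry. A more direct check compares the matrix with its transpose. The sketch below uses plain numpy on made-up toy matrices; the helper name `is_symmetric` is hypothetical and not part of TADtool.

```python
import numpy as np

def is_symmetric(matrix):
    """True if the matrix is square and equal to its transpose."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return False
    return bool(np.allclose(matrix, matrix.T))

# Only the upper half is filled in.
upper_only = np.array([[1.0, 2.0, 3.0],
                       [0.0, 4.0, 5.0],
                       [0.0, 0.0, 6.0]])

# Same contacts, mirrored across the diagonal.
symmetric = np.array([[1.0, 2.0, 3.0],
                      [2.0, 4.0, 5.0],
                      [3.0, 5.0, 6.0]])

# A permutation matrix has equal row and column sums but is not symmetric,
# so the row-sum test alone would not flag it.
permutation = np.array([[0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [1.0, 0.0, 0.0]])

print(is_symmetric(upper_only), is_symmetric(symmetric), is_symmetric(permutation))
```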
Yep. That was the issue. Only the upper half of your matrix had non-zero values, but the insulation index in TADtool is calculated on the lower half (which is irrelevant, since you should be supplying a symmetrical matrix). This works:
import tadtool.tad as tad
data = tad.HicMatrixFileReader().matrix("Ago1_chr21_19000000_20000000_matrix.txt")
regions = tad.HicRegionFileReader().regions("Ago1_chr21_19000000_20000000_regions.txt")
# copy values from top to bottom half
# (range works in both Python 2 and Python 3)
for i in range(data.shape[0]):
    for j in range(i + 1, data.shape[0]):
        data[j, i] = data[i, j]
ii = tad.insulation_index(data, regions=regions, window_size=500000) # now has non-zero values in it
For your convenience, here is the fixed file, but please double- and triple-check it before using it in your analyses. Ago1_chr21_19000000_20000000_matrix_fixed.txt
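For larger matrices, the nested loop that copies the top half to the bottom half can also be written without explicit loops. This is a sketch in plain numpy, assuming the input holds all valid values on or above the diagonal; the helper name `mirror_upper_triangle` is hypothetical.

```python
import numpy as np

def mirror_upper_triangle(matrix):
    """Return a symmetric copy built from the upper triangle of the input.

    Anything below the diagonal in the input is discarded.
    """
    matrix = np.asarray(matrix, dtype=float)
    upper = np.triu(matrix)                 # diagonal and everything above it
    strictly_upper = np.triu(matrix, k=1)   # everything above the diagonal
    return upper + strictly_upper.T         # diagonal is counted only once

upper_only = np.array([[1.0, 2.0, 3.0],
                       [0.0, 4.0, 5.0],
                       [0.0, 0.0, 6.0]])
fixed = mirror_upper_triangle(upper_only)
print(fixed)
```

Using the strictly upper triangle for the transposed part avoids doubling the diagonal, which a plain `matrix + matrix.T` would do.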
Hi, I tried with my matrix file once again. The test you proposed returns False.
import tadtool.tad as tad, numpy as np
data=tad.HicMatrixFileReader().matrix("Ago1_chr21_20000000_20480000_matrix.txt")
rowsums=np.sum(data,0)
colsums=np.sum(data,1)
np.isclose(rowsums,colsums).all()
False
I would like to check with you the way I generate the regions and matrix files. Assuming we have 10 regions in the regions file, I am trying to create a 10x10 matrix where each entry shows the number of normalized interactions between those regions (for example row1_col1, row1_col2 etc). I create an entry like this for each of the regions mentioned, and that way I end up with a square matrix. My apologies if I haven't understood it properly. What do you mean by the matrix not being symmetric? How do I create a symmetric matrix given that I have region data?
Assuming you have contact data for a 3x3 Hi-C matrix like this:
row1 col1 1
row1 col2 2
row1 col3 3
row2 col2 4
row2 col3 5
row3 col3 6
You have currently created your matrix like this:
1 2 3
0 4 5
0 0 6
Now, symmetrical means that the matrix is (a) square and (b) rowX_colY == rowY_colX. So, to make your matrix symmetrical, it would have to look like this:
1 2 3
2 4 5
3 5 6
The code I posted above does precisely that. I strongly recommend reading some high-impact Hi-C publications from the Lieberman lab - if you want to work with Hi-C data, it is fundamental to understand these concepts!
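If the matrix is built from a contact list like the 3x3 example, it can also be filled symmetrically from the start. The following is a minimal sketch, assuming contacts are available as (row, column, value) triples with zero-based bin indices; the function name `contacts_to_matrix` and the input layout are assumptions, not something TADtool provides.

```python
import numpy as np

def contacts_to_matrix(contacts, n_bins):
    """Build a symmetric contact matrix from (row, col, value) triples."""
    matrix = np.zeros((n_bins, n_bins))
    for row, col, value in contacts:
        matrix[row, col] = value
        matrix[col, row] = value  # mirror the contact across the diagonal
    return matrix

# The 3x3 example, with zero-based indices (row1 -> 0, col1 -> 0, ...).
contacts = [
    (0, 0, 1.0), (0, 1, 2.0), (0, 2, 3.0),
    (1, 1, 4.0), (1, 2, 5.0),
    (2, 2, 6.0),
]
matrix = contacts_to_matrix(contacts, n_bins=3)
print(matrix)
```

Writing both `[row, col]` and `[col, row]` for every contact means no separate mirroring step is needed afterwards.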
I tried creating the symmetric matrix and tested it for symmetry as you suggested.
import tadtool.tad as tad, numpy as np
data=tad.HicMatrixFileReader().matrix("Ago1_chr21_20000000_20480000_matrix.txt")
rowsums=np.sum(data,0)
colsums=np.sum(data,1)
np.isclose(rowsums,colsums).all()
True
When I plot the TADs using this data, I see the following figure. Is this right?
Looks okay. If you are wondering why the output looks more pixelated than in the example data, that is because your matrix has a resolution of 40kb, whereas the example has 10kb resolution. So, you can either plot a larger region or increase the resolution of your matrix - the latter, of course, depends on your sequencing coverage.
Thanks very much for all your help! I will get back to you in case there is a problem.
Dear Vaquerisa,
Thanks for helping to run the example data, which worked well.
Now I am running TADtool on my own data and it doesn't seem to produce a graph similar to the example one. Chr21_inhouse.pdf
The output message while running the data was as given below.
python chr12_plot.py
0% ( 0 of 100) | | Elapsed Time: 0:00:00 ETA: --:--:--
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:675: RuntimeWarning: Mean of empty slice
warnings.warn("Mean of empty slice", RuntimeWarning)
100% (100 of 100) |####################################################################################################################################################| Elapsed Time: 0:00:06 Time: 0:00:06
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:227: RuntimeWarning: All-NaN axis encountered
warnings.warn("All-NaN axis encountered", RuntimeWarning)
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/axes/_base.py:3040: UserWarning: Attempting to set identical bottom==top results in singular transformations; automatically expanding. bottom=0.0, top=0.0
'bottom=%s, top=%s') % (bottom, top))
I used the same script as the example but with our regions and sparse matrix file.
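A "Mean of empty slice" warning can appear even when the input file contains no nan values. As a small sketch of one way this happens, numpy's `nanmean` emits it whenever the values it is asked to average are all NaN, and it then returns NaN. Whether this is the exact path taken inside the plotting script is an assumption.

```python
import warnings
import numpy as np

# An all-NaN slice, e.g. a window for which no value could be computed.
empty_like = np.array([np.nan, np.nan])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.nanmean(empty_like)

messages = [str(w.message) for w in caught]
print(result)
print(messages)
```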