Closed cmdoret closed 3 years ago
The goal here is to address concerns raised in #10
Hi Cyril,
Thanks for looking into this and for the pull request! These sort of helper functions would definitely be useful and we're happy to have others contribute. A couple of questions/suggestions:
Hi @liz-is
Thanks for your answer, indeed, I missed the zooming part, and that makes the helper function wrong.
I updated my helper function to have a zoom factor added after rotating coordinates.
The rotate_feature
function rotates a single (x,y) coordinates relative to a window center. It works as follows:
The other function, get_feature_coords
simply relies on the first one to convert an entire feature (4 coords) and performs the conversion to basepairs.
I used a pair of symmetric features reported by CHESS in real world data to test rotate_feature
.
# Coordinates as reported in lost_features.tsv
xmin1, xmax1, ymin1, ymax1 = 187, 192, 88, 95
xmin2, xmax2, ymin2, ymax2 = 187,192,106,113
# Format coords to a matplotlib friendly format
coords_chess = [
[xmin1, xmin1, xmax1, xmax1, xmin2, xmin2, xmax2, xmax2],
[ymin1, ymax1, ymin1, ymax1, ymin2, ymax2, ymin2, ymax2],
]
# Rotate coords using my helper function
coords_rotated = list(map(
lambda c: rotate_feature(c[0], c[1], demo_win.shape[0]),
zip(*coords_chess)
))
coords_rotated = [
[c[0] for c in coords_rotated],
[c[1] for c in coords_rotated],
]
# Region of the matrix from CHESS window
clr = cooler.Cooler('mysample.cool')
demo_win = clr.matrix(sparse=False, balance=True).fetch("chr14:52000001-62000001")
demo_win = np.nan_to_num(demo_win)
# Rotate + zoom matrix using CHESS method
zm1 = clipped_zoom(demo_win, 0.7)
rot1 = ndi.rotate(zm1, 45, reshape=False)
# Visualize original and rotated coordinates with their matrix
fig, ax = plt.subplots(1, 2)
ax[0].imshow(demo_win, vmin=demo_feat.min(), vmax=demo_feat.max())
ax[0].scatter(coords_rotated[0], coords_rotated[1], c='r')
ax[1].imshow(rot1, vmin=demo_feat.min(), vmax=demo_feat.max())
ax[1].scatter(coords_chess[0], coords_chess[1], c='r')
ax[0].set_title('Original window\nManually rotated coords')
ax[1].set_title('CHESS-rotated window\nCHESS coords')
If I zoom to see the coordinates better:
Now if I plot the values of the first features (additional columns of lost_features):
# demo_feat = np.array([many values])
# demo_feat = demo_feat.reshape(xmax1 - xmin1, ymax1 - ymin1)
plt.imshow(demo_feat)
The feature image looks similar to the area circled by red points, not exactly the same but I imagine it may have undergone some processing steps. The details of the image seem to be distorted a bit by the rotation also.
These functions would be handy for downstream analysis of regions from CHESS features, but ultimately I think you're right it would definitely be much better if there was no rotation / zoom at all. Thanks for working on that ! :)
Hi Cyril, sorry for the slow reply and many thanks for the detailed example! Now I understand much better. Part of my confusion was that I was expecting it to return two sets of coordinates for each feature (i.e. the coordinates of the edges of the bounding box), rather than four (the coordinates of each corner of the bounding box). But either is fine, as long as it's clear what it does.
This looks good to me, and I think with this thread as an example/explanation others would be able to use it. However, the original devs should agree before I can merge this in -- @kaukrise @nickmachnik @sgalan what do you think?
This looks good to me, thanks for submitting this @cmdoret ! I think it would be good if @sgalan took a look and approved this, Ill contact her.
Hello,
I struggled with converting the feature coordinates from CHESS coordinates to genomic coordinates, and it seems other people had the same problem.
In this pull request, I add two helper functions to help users perform this conversion:
rotate_feature
: Perform the -45 degree rotation of a single (x, y) coordinate to go from the "horizontal diagonal" representation to the square matrix representation.get_feature_coords
: Convert the CHESS feature coordinates (xmin, xmax, ymin, ymax) into basepair coordinates.Currently, I put these functions in
chess.get_structures
, but I'm not sure is this is appropriate. I tried to document the code as much as possible.