Add coordinate conversion helpers

cmdoret commented 3 years ago

Hello,

I struggled with converting the feature coordinates from CHESS coordinates to genomic coordinates, and it seems other people had the same problem.

In this pull request, I add two helper functions to help users perform this conversion:

rotate_feature: Performs the -45 degree rotation of a single (x, y) coordinate to go from the "horizontal diagonal" representation to the square matrix representation.
get_feature_coords: Converts the CHESS feature coordinates (xmin, xmax, ymin, ymax) into basepair coordinates.

Currently, I put these functions in chess.get_structures, but I'm not sure if this is appropriate. I tried to document the code as much as possible.

cmdoret commented 3 years ago

The goal here is to address concerns raised in #10

liz-is commented 3 years ago

Hi Cyril,

Thanks for looking into this and for the pull request! These sorts of helper functions would definitely be useful and we're happy to have others contribute. A couple of questions/suggestions:

It's not totally clear to me yet how the rotation of coordinates works - I probably need to try it out myself to properly understand it. A diagram or worked example might be helpful if you have one on hand.
However, I don't think these functions currently account for the fact that the matrix is not only rotated, but also shrunk to ~70% of its original size as part of the feature extraction pipeline (see here: https://github.com/vaquerizaslab/chess/blob/master/chess/get_structures.py#L158), so that the whole matrix still fits within the same area after rotation. So I don't think this will currently return accurate genomic coordinates.
Because the rotation/zooming complicates things, I have actually also been working on implementing an approach to feature extraction that skips the rotation and shrinking step and allows retrieving genomic coordinates. You can check that out here, if you're interested: https://github.com/liz-is/chess/tree/square_extract This is still undergoing testing and active development, so the interface and features may change. Feedback / bug reports are welcome. N.B., although the main pipeline steps are the same, this approach does not return the exact same features as the current approach, due to the skipping of the rotation and zooming.
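
As a quick sanity check on the ~70% figure mentioned above (a sketch of the geometry only, not code from CHESS; the function name is hypothetical): a square rotated by 45 degrees needs a bounding box about 1.41 times wider, so shrinking it to roughly 0.7 of its size first keeps it inside the original area.

```python
import math


def rotated_half_extent(side, zoom, angle_deg=45.0):
    """Half-width of the axis-aligned box enclosing a zoomed, rotated square.

    Hypothetical helper for illustration; `side` is the matrix size in bins.
    """
    theta = math.radians(angle_deg)
    half = zoom * side / 2  # half-side of the square after zooming
    # A square with half-side h rotated by theta spans h * (|cos| + |sin|)
    return half * (abs(math.cos(theta)) + abs(math.sin(theta)))


side = 100
# Without shrinking, the rotated square sticks out of the original area
print(rotated_half_extent(side, 1.0) > side / 2)
# With a 0.7 zoom, it fits inside the original area
print(rotated_half_extent(side, 0.7) <= side / 2)
```

Any coordinate conversion that undoes the rotation therefore also has to undo this scaling.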

cmdoret commented 3 years ago

Hi @liz-is

Thanks for your answer. Indeed, I missed the zooming part, and that makes the helper function wrong.

I updated my helper function to apply a zoom factor after rotating the coordinates. The rotate_feature function rotates a single (x, y) coordinate relative to the window center. It works as follows:

subtract window (i.e. rotation) center from x and y
use a rotation matrix to rotate the coordinates (angle set to -45, to revert CHESS rotation)
zoom the resulting coordinates (default to 1/0.7 to revert CHESS zoom)
add the center back
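
The four steps above can be sketched roughly as follows. This is an illustrative stand-in, not the code from this pull request: the function name, the choice of (win_size - 1) / 2 as the rotation center, and the sign convention of the angle are all assumptions, and the sign in particular depends on how the image axes are oriented.

```python
import math


def rotate_point_sketch(x, y, win_size, angle_deg=-45.0, zoom=1 / 0.7):
    """Hypothetical stand-in for rotate_feature, for illustration only."""
    center = (win_size - 1) / 2  # assumed rotation center of the window
    # 1. Subtract the window (rotation) center from x and y
    dx, dy = x - center, y - center
    # 2. Apply a standard 2D rotation matrix
    theta = math.radians(angle_deg)
    rx = dx * math.cos(theta) - dy * math.sin(theta)
    ry = dx * math.sin(theta) + dy * math.cos(theta)
    # 3. Zoom the rotated offsets (1 / 0.7 reverts a 0.7 shrink)
    rx, ry = rx * zoom, ry * zoom
    # 4. Add the center back
    return rx + center, ry + center


# The window center is a fixed point of the transform
c = (257 - 1) / 2
print(rotate_point_sketch(c, c, 257))
```

Points further from the center move more, which is why the zoom has to be applied to the offsets from the center rather than to the raw coordinates.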

The other function, get_feature_coords, simply relies on the first one to convert an entire feature (4 coords) and then converts the result to basepairs.
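
For the basepair step, a minimal sketch could look like the one below. It is not the actual get_feature_coords: the function name, the parameters window_start_bp and resolution_bp, and the rounding policy are assumptions, and it expects bin coordinates that have already been rotated back to the square matrix.

```python
import math


def bins_to_basepairs(bin_min, bin_max, window_start_bp, resolution_bp):
    """Convert a bin interval inside a window to basepair coordinates.

    Hypothetical helper: bins are counted from the start of the window,
    and fractional bins are rounded outwards to keep the whole feature.
    """
    start = window_start_bp + math.floor(bin_min) * resolution_bp
    end = window_start_bp + math.ceil(bin_max) * resolution_bp
    return start, end


# Window starting at 52 Mb with an assumed 25 kb bin size
print(bins_to_basepairs(100.2, 103.6, 52_000_000, 25_000))
# → (54500000, 54600000)
```

Rounding outwards is a design choice: rotated coordinates are generally not integers, and truncating both ends would clip part of the feature.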

I used a pair of symmetric features reported by CHESS in real world data to test rotate_feature.

import cooler
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage as ndi
# rotate_feature is added by this PR; clipped_zoom is assumed to be importable
# from the same CHESS module
from chess.get_structures import clipped_zoom, rotate_feature

# Region of the matrix from CHESS window (loaded first: its size is needed below)
clr = cooler.Cooler('mysample.cool')
demo_win = clr.matrix(sparse=False, balance=True).fetch("chr14:52000001-62000001")
demo_win = np.nan_to_num(demo_win)
# Rotate + zoom matrix using CHESS method
zm1 = clipped_zoom(demo_win, 0.7)
rot1 = ndi.rotate(zm1, 45, reshape=False)

# Coordinates as reported in lost_features.tsv
xmin1, xmax1, ymin1, ymax1 = 187, 192, 88, 95
xmin2, xmax2, ymin2, ymax2 = 187, 192, 106, 113
# Format coords to a matplotlib friendly format: [all x values], [all y values]
coords_chess = [
    [xmin1, xmin1, xmax1, xmax1, xmin2, xmin2, xmax2, xmax2],
    [ymin1, ymax1, ymin1, ymax1, ymin2, ymax2, ymin2, ymax2],
]

# Rotate each corner using my helper function
rotated = [
    rotate_feature(x, y, demo_win.shape[0])
    for x, y in zip(*coords_chess)
]
coords_rotated = [
    [c[0] for c in rotated],
    [c[1] for c in rotated],
]

# Visualize original and rotated coordinates with their matrix
# demo_feat holds the pixel values of the first feature (defined further down)
fig, ax = plt.subplots(1, 2)
ax[0].imshow(demo_win, vmin=demo_feat.min(), vmax=demo_feat.max())
ax[0].scatter(coords_rotated[0], coords_rotated[1], c='r')
ax[1].imshow(rot1, vmin=demo_feat.min(), vmax=demo_feat.max())
ax[1].scatter(coords_chess[0], coords_chess[1], c='r')
ax[0].set_title('Original window\nManually rotated coords')
ax[1].set_title('CHESS-rotated window\nCHESS coords')

If I zoom to see the coordinates better:

Now if I plot the values of the first feature (additional columns of lost_features.tsv):

# demo_feat = np.array([many values])
# demo_feat = demo_feat.reshape(xmax1 - xmin1, ymax1 - ymin1)
plt.imshow(demo_feat)

The feature image looks similar to the area outlined by the red points. It is not exactly the same, but I imagine it has undergone some processing steps. The rotation also seems to distort the details of the image a bit.

These functions would be handy for downstream analysis of regions from CHESS features, but ultimately I think you're right that it would be much better if there was no rotation / zoom at all. Thanks for working on that! :)

liz-is commented 3 years ago

Hi Cyril, sorry for the slow reply and many thanks for the detailed example! Now I understand much better. Part of my confusion was that I was expecting it to return two sets of coordinates for each feature (i.e. the coordinates of the edges of the bounding box), rather than four (the coordinates of each corner of the bounding box). But either is fine, as long as it's clear what it does.
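
For anyone who prefers the edge-style output, the four corners can be collapsed into a bounding box afterwards. This is only a sketch with a hypothetical helper name, not part of the pull request. Note that after a 45 degree rotation the feature is diamond-shaped in the square matrix, so the axis-aligned box is larger than the feature itself.

```python
def corners_to_bounds(corners):
    """Collapse (x, y) corner points into (xmin, xmax, ymin, ymax).

    Hypothetical helper for illustration; `corners` is any iterable of
    (x, y) pairs, e.g. the four rotated corners of one feature.
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), max(xs), min(ys), max(ys)


# A diamond centered on (10, 10): the box spans its outermost corners
print(corners_to_bounds([(10, 5), (15, 10), (10, 15), (5, 10)]))
# → (5, 15, 5, 15)
```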

This looks good to me, and I think with this thread as an example/explanation others would be able to use it. However, the original devs should agree before I can merge this in -- @kaukrise @nickmachnik @sgalan what do you think?

nickmachnik commented 3 years ago

This looks good to me, thanks for submitting this @cmdoret! I think it would be good if @sgalan took a look and approved this; I'll contact her.

vaquerizaslab / chess

Add coordinate conversion helpers #43