labsyspharm / quantification

Quantification module for mcmicro
https://github.com/labsyspharm/mcmicro
9 stars 13 forks source link

x and y coordinates are mixed up in the output #18

Closed ArtemSokolov closed 4 years ago

ArtemSokolov commented 4 years ago

Users report that the x and y columns in the output are mixed up (i.e., x coordinates get placed in the y column and vice versa).

The issue arises because numpy arrays are indexed by (row, column), which is effectively (y,x). https://github.com/labsyspharm/quantification/blob/311b2347adfce6474d122e3a8108553464247efc/SingleCellDataExtraction.py#L53-L54 As pointed out by Jeremy, xcoords should be receiving centroid[1], and ycoords should get centroid[0].

DenisSch commented 4 years ago

@ArtemSokolov Once I change this, all downstream analysis tools created for the current format will have to be adapted!

ArtemSokolov commented 4 years ago

Can you give an example where this would be problematic? This is effectively a mirror around the x=y diagonal, which I don't think should affect many analyses, since it doesn't perturb the relative distances.

DenisSch commented 4 years ago

@ArtemSokolov I am still concerned: e.g. we are using X/Y for ROI alignment between OMERO and MCMICRO. Additionally all visualization scripts need to be adapted. It's a minor change with huge issues for people who are currently processing larger data sets and are half way through (e.g. PCA)

ArtemSokolov commented 4 years ago

@DenisSch I'm confused, do all those tools expect flipped X and Y coordinates?

In other words, if segmentation identifies that there is a cell at x=5, y=120 and quantification flips that to x=120, y=5, do all the visualizers then flip it back to x=5, y=120?

ArtemSokolov commented 4 years ago

Or do the visualizers assume that the x-axis runs from the top down and the y-axis goes left to right?

DenisSch commented 4 years ago

The downstream analysis just uses the x/y coordinates as currently outputted by MCMICRO and aligns it with any other visualization type. OMERO / MINERVA etc.

DenisSch commented 4 years ago

We can also do a pole and see who would be affected.

DenisSch commented 4 years ago

@JoshuaHess12 @hywu0110 Will this break anything in your current workflow. Are you okay with the flip of x/y coordinates?

JoshuaHess12 commented 4 years ago

@DenisSch I have no problems with flipping to the correct coordinates.

hungyiwu commented 4 years ago

I'll have to change almost all of my scripts if this change happens. I believe it's a terminology thing. Scikit-image outputs (row, col)

https://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions

The use case where this matters a lot is when users try to overlay images and plots. Ultimately we just want to be clear to the users about what that xcoords means: the x-axis in Cartesian coordinate, or the first dimension in the Numpy array coordinate.

hungyiwu commented 4 years ago

To be clear, I don't mind changing my code. I'd recommend using the same convention as Scikit-image for interoperability and portability: replace the terms xcoords and ycoords with rows and cols.

ArtemSokolov commented 4 years ago

What if we have both?

@DenisSch : Consider renaming the current columns to row and col, and then creating two new columns x and y, such that y == row and x == col?

Make the new column names such that they cannot be confused with existing ones (e.g., call the new column x if it's currently xcoord, or vice versa). This will purposefully trigger "missing column" errors in downstream scripts, causing developers to explicitly consider whether they want to use row/col or x/y system.

DenisSch commented 4 years ago

So include row/col and x/y into the csv?!

ArtemSokolov commented 4 years ago

That's my thought. You would essentially duplicate row into y, and col into x. But then you would have two unambiguous systems that downstream scripts can use as desired.

hungyiwu commented 4 years ago

I was thinking about just row/col and let the user do the rest, that way reducing redundancy.