hubmapconsortium / sprm

GNU General Public License v3.0
4 stars 0 forks source link

Cell ID mismatch in polygon list #11

Closed mruffalo closed 3 years ago

mruffalo commented 3 years ago

Fewer cells are listed in cell_polygons_spatial.csv than in other output files. This causes a failure in @ilan-gold's conversion to UI data structures.

ilan-gold commented 3 years ago

There are actually more ids in the cell_centers.csv file which Ted suggested might be another alternate source of ground truth. This isn't my show, but I really think (part of) the solution here needs to be checking all the files that are output both against each other and against the mask to make sure ids match both in quality and quantity.

Screen Shot 2021-06-21 at 2 36 01 PM Screen Shot 2021-06-21 at 12 09 16 PM
tCeZ commented 3 years ago

For the cell center we are actually getting all the cell centers in the mask. This is how the algorithm works and to change this would require some heavy rewriting of the code base or changing of the mask when processing to get the filter list of cells that SPRM processes. I've made the necessary changes to validate that the features files other than the cell_center.csv have the same cell ids.

ilan-gold commented 3 years ago

@TeddyZ95 Who is the cell_center.csv file for? Is there a reason to not just discard it?

tCeZ commented 3 years ago

@ilan-gold This is used to calculate a cell adjacency matrix.

ilan-gold commented 3 years ago

Does the cell adjacency matrix include the extra cells? Wouldn't that make it also incommensurable with the rest of SPRM where the number of cells is fewer?

tCeZ commented 3 years ago

Not necessary since the index of the cell should be identical. As in cell id 11000 should be the same in the cell adj matrix. All that is telling you is what cells are neighboring other cells. Since the issue with skipping cell ID or duplicate cell IDs have been fixed. But I do agree that they should match, but it would require some rewriting in our algorithm. But since you guys don't use the cell_centers I think it's fine to leave as is.

ilan-gold commented 3 years ago

Ok that makes sense. My only concern was that if it was "add an edge between the k-nearest neighbors of the query cell" rather than "all cells within this radius are given an edge to this query cell," then the results would be somewhat off. Doesn't sound like that's the case though!

mruffalo commented 3 years ago

As per @TeddyZ95, fixed in https://github.com/hubmapconsortium/sprm/commit/b4b142dc1e9aad91e549ebf94ee1e6c02d2f9e35.