hubmapconsortium / portal-containers

Docker containers to pre-process data for visualization in the portal
MIT License

Handle cytokit csv #14

Open mccalluc opened 4 years ago

mccalluc commented 4 years ago

Jesus Penaloza gave us a sample CSV with this header:

region_index,tile_index,tile_x,tile_y,rid,rx,ry,id,x,y,z,cm:circularity,cm:diameter,cm:diameter_vx,cm:perimeter,cm:size,cm:size_vx,cm:solidity,nm:circularity,nm:diameter,nm:diameter_vx,nm:perimeter,nm:size,nm:size_vx,nm:solidity,cg:n_neighbors,cg:neighbor_ids,cg:adj_neighbor_pct,cg:adj_bg_pct,cb:on_border,nb:on_border,ci:DAPI-002:mean,ci:CD31:mean,ci:CD8:mean,ci:CD45:mean,ci:DAPI-003:mean,ci:CD20:mean,ci:Ki67:mean,ci:CD3e:mean,ci:DAPI-004:mean,ci:Actin:mean,ci:Podoplanin:mean,ci:CD68:mean,ci:DAPI-005:mean,ci:PanCK:mean,ci:CD21:mean,ci:CD4:mean,ci:DAPI-006:mean,ci:EMPTY:mean,ci:CD45RO:mean,ci:CD11c:mean,ci:DAPI-007:mean,ci:EMPTY:mean.1,ci:E_CAD:mean,ci:CD107a:mean,ci:DAPI-008:mean,ci:EMPTY:mean.2,ci:CD44:mean,ci:HistoneH3:mean,ni:DAPI-002:mean,ni:CD31:mean,ni:CD8:mean,ni:CD45:mean,ni:DAPI-003:mean,ni:CD20:mean,ni:Ki67:mean,ni:CD3e:mean,ni:DAPI-004:mean,ni:Actin:mean,ni:Podoplanin:mean,ni:CD68:mean,ni:DAPI-005:mean,ni:PanCK:mean,ni:CD21:mean,ni:CD4:mean,ni:DAPI-006:mean,ni:EMPTY:mean,ni:CD45RO:mean,ni:CD11c:mean,ni:DAPI-007:mean,ni:EMPTY:mean.1,ni:E_CAD:mean,ni:CD107a:mean,ni:DAPI-008:mean,ni:EMPTY:mean.2,ni:CD44:mean,ni:HistoneH3:mean

... and then it has 120K rows of data.

We want to make this into cells.json/arrow. Other questions:

Q:
> For the first columns in the file, like “cm:diameter_vx” or “nm:solidity”, is there documentation about the meaning of these fields?

A (Maria Keays):
> Cytokit’s documentation is a bit thin currently. I think I’ve seen something somewhere about what these abbreviations mean, so I will try to dig that out. If memory serves, “cm” means “cell morphology” and “ni” means “nucleus intensity”.

Q:
> To clarify the division of responsibilities: your pipelines would not give us the polygon; instead, we would need to compute it, given centroids and the segmentation mask?

A (Jesus Penaloza):
> Yes.

Q:
> We’ve seen something like this before, and weren’t sure what the “EMPTY”s meant. Or would you be happy if we just presented the headers the same downstream?

A:
> Empty and blank headers should be disregarded downstream. These files are used for background subtraction but are not needed for analysis.

Q:
> Is the neighbor information something you’d like us to do something with?

A:
> It would be great to see this, since it could help further classify each single cell not only by marker but also by proximity.

Q:
> Each row is one cell, right?

A:
> Yes sir.

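
To make the answers above concrete, here is a minimal Python sketch of how the CSV might be read: one record per row (one row is one cell), with the EMPTY and blank-header columns skipped. The function names, the tiny sample data, and the output shape are all assumptions for illustration; the thread does not define the cells.json schema, and it does not confirm that `id` is unique across regions and tiles.

```python
import csv
import io
from collections import defaultdict

# Abbreviated, made-up sample in the shape of the header above (not real data).
SAMPLE = """id,x,y,cm:size,ci:CD31:mean,ci:EMPTY:mean,ci:EMPTY:mean.1,ni:CD31:mean
1,10.5,20.0,300,1.2,0.1,0.2,0.9
2,11.0,25.5,280,0.4,0.1,0.1,0.3
"""


def is_empty_channel(column):
    """True for blank headers and for columns such as 'ci:EMPTY:mean.1'."""
    if not column.strip():
        return True
    parts = column.split(":")
    return len(parts) == 3 and parts[1].upper() == "EMPTY"


def group_by_prefix(columns):
    """Group column names by the text before the first colon ('' if none)."""
    groups = defaultdict(list)
    for column in columns:
        prefix = column.split(":", 1)[0] if ":" in column else ""
        groups[prefix].append(column)
    return dict(groups)


def to_number(text):
    """Convert to float where possible; keep other values (e.g. ID lists) as text."""
    try:
        return float(text)
    except ValueError:
        return text


def rows_to_cells(csv_file):
    """Return {cell id: {column: value}}, assuming 'id' is unique per row."""
    reader = csv.DictReader(csv_file)
    keep = [c for c in reader.fieldnames
            if c != "id" and not is_empty_channel(c)]
    return {row["id"]: {c: to_number(row[c]) for c in keep} for row in reader}


cells = rows_to_cells(io.StringIO(SAMPLE))
```

With 120K rows, a dictionary like this should fit in memory, but if `id` turns out to repeat across tiles, the key would need to combine it with `region_index` and `tile_index`.
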
mccalluc commented 4 years ago

I think this was asked in a separate channel which I can't find right now, but can the CSV which my code receives be given a more distinctive name? Perhaps cytokit.csv, or something even more descriptive, so we can distinguish it from other CSVs that might be in the directory.
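
Until the file has a distinctive name, one possible stopgap is to recognize it by its header instead. The sketch below is only an illustration: the helper names and the choice of "signature" columns (`id`, `x`, `y`, plus at least one `ci:` column) are my assumptions, based on the sample header in this issue.

```python
import csv
from pathlib import Path

# Assumed signature columns, taken from the sample header in this issue.
REQUIRED = {"id", "x", "y"}


def looks_like_cytokit(path):
    """Check only the first line of the file for the assumed signature."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    has_intensity = any(column.startswith("ci:") for column in header)
    return REQUIRED.issubset(header) and has_intensity


def find_cytokit_csvs(directory):
    """Return the CSVs directly inside `directory` that match the signature."""
    return sorted(p for p in Path(directory).glob("*.csv")
                  if looks_like_cytokit(p))
```

A fixed filename would still be simpler and less fragile than sniffing headers, so this does not replace the request above.
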

mccalluc commented 4 years ago

I learned on the call this morning that this is not the correct file. Waiting to be pointed at the correct one.