nolanlab / REDSEA

Other
17 stars 10 forks source link

REDSEA

We present REinforcement Dynamic Spillover EliminAtion (REDSEA) as a solution for spillover compensation without loss of signal. This is the step-by-step guidance of how to use REDSEA to produce a FCS file with compensated channel signals. This FCS file can then be read for your downstream spatial analysis. A detailed manuscript that describes the data and method can be found here.

How it works

In brief, the REDSEA algorithm identifies the boundary region for each cell, based on the segmentation mask. Subsequently, for each channel from a single cell, signals were subtracted based on its shared boundary with neighboring cells, and their corresponding signal in the same channel. Moreover, the removed signal from adjacent cells can be reinforced back to the cell. For detailed information look at the Materials and Methods section of the paper.

Table of content

Example Data

Here we provide example datasets from two different multiplexed imaging modalities, MIBI and CyCIF.

MIBI data

Multiplexed Ion Beam Imaging (MIBI) is a method that uses secondary ion mass spectrometry to image antibodies tagged with isotopically pure elemental metal reporters. Here we will use MIBI data generated from non-human primate lymph nodes as part of the REDSEA manuscript Now Published.

CyCIF data

CyCIF is a method for highly multiplexed immuno-fluorescence imaging of formalin-fixed, paraffin-embedded (FFPE) specimens mounted on glass slides. Here we will use human tonsil data acquired with t-CyCIF method. The acquisition of this data is described in this paper (Rashid, Rumana, et al, Scientific Data 2019)

Required Inputs

MATLAB scripts

The MATLAB scripts include the REDSEA algorithm, along with other scripts created by Leeat Keren for processing multiplexed images, and a fcs processing script writeFCS created by Jakub Nedbal (https://www.mathworks.com/matlabcentral/fileexchange/42603-writefcs-fname-data-text-other).

tiff images and channel information

There should be a CSV file containing the channel name information.

And also a folder containing .tiff images for each channel, where the names should be same as in the CSV file.

Segmentation Mask

This method requires you to provide a cell segmentation mask. A cell nuclei probability matrix can be produced by your own chose (popular options include ilastik or deepcell). In our case for the MIBI data we implemented a trained-in-house deepcell CNN model from DeepCell.

DeepCell is also easy to implement on different imaging modalities. Here is a prediction model we trained with ~ 1500 cells in the CyCIF dataset:

In the example data folder, we have provided a nucleus prediction matrix for the MIBI and CyCIF data, with the name feature_1_frame_1_p1_.tif, in the deepCell sub folder.

After producing a nuclei probablity mask, we will then use the script MibiSegmentByDeepProbWithPerim3.m to implement a watershed algorithm for whole cell segmentation. This will produce something like this:

See MibiSegmentByDeepProbWithPerim3.m Script

```MATLAB %% Nuclear segmentation using pixel probabilities from deepCell % Modified from original script from Leeat Keren % Changed the pipeline to use the probablities from the deepcell instead of % the raw nuclear intensities. % 25July2020, by Yunhao Bai and Sizun Jiang % Main path for the all the data mainPath = 'sampleData_MIBI'; %for MIBI %mainPath = 'sampleData_cycIF'; %for CyCIF resultsPath = [mainPath,'/segmentResults/']; % for the next step, put the % output segmentationParams.mat and PureSegmentation.tif to the % originalTiff/Point? folder for p=1:1 for segmentThres=0.01 % change from 0.01 to 0.1, segmentThres defines the % local maximum, higher segmentThres leads to fewer cells and more merged % regions for probNucThres=0.35 % change from 0.05 to 0.5, probNucThres defines how % board the cells will expand pointNumber=p;disp(['point',num2str(p)]); % read .tiff image of nucleus marker to matrix for image pathNucleusMarker = [mainPath,'/originalTIFF/Point',num2str(p),'/dsDNA.tiff']; %for MIBI %pathNucleusMarker = [mainPath,'/originalTIFF/Point',num2str(p),'/x7500y3500_1700_DAPI.tif']; %for CyCIF t = Tiff(pathNucleusMarker,'r'); nucIm = read(t); %the max value of nucleus channel .tiff maxv=25; %for MIBI %maxv=3000; %for CyCIF rgb_image = MibiGetRGBimageFromMat(nucIm,maxv); % %% Get maxima from deepCell probabilities % read possibility map from deepCell/ilastik/other segmentation methods probNuc = double(imread([mainPath,'/deepCell/feature_1_frame_1_p',num2str(p),'_dsDNA.tif'])); %for MIBI %probNuc = double(imread([mainPath,'/deepCell/feature_1_frame_1_p',num2str(p),'_DAPI.tif'])); %for CyCIF figure;imagesc(probNuc); % find local maxima in probability map maxs = imextendedmax(probNuc,segmentThres); % change from 0.1 to 0.02 rgb_image_perim_extMax = imoverlay(rgb_image , maxs, [1 0 0]); figure; imagesc(rgb_image_perim_extMax); %% watershed over the deep results bw1 = zeros(size(probNuc)); bw1(probNuc>probNucThres) = 1; %change from 0.05 to 0.5 figure; imagesc(bw1); bw2 = bwareaopen(bw1,40); SE = strel('disk',4); bw=imdilate(bw2,SE); [B,L] = bwboundaries(maxs,4,'noholes'); maxsFix = bw & maxs; % modify the image so that the background pixels and the extended maxima % pixels are forced to be the only local minima in the image. Jc = imcomplement(probNuc); I_mod = imimposemin(Jc, ~bw | maxsFix); L = watershed(I_mod); labeledImage = label2rgb(L); cellPerimNewMod= L; cellPerimNewMod(L>0) = 100; cellPerimNewMod(cellPerimNewMod==0)=1; cellPerimNewMod(cellPerimNewMod==100)=0; rgb_image_cellPerim = imoverlay(rgb_image, cellPerimNewMod, [1 0 0]); figure; imagesc(rgb_image_cellPerim); %% 1. For each label, decide whether it is of a nucleus/ background t=40; labelNum = length(unique(L(:))); labelIdentity = zeros (labelNum,1); labelPixelsPercentInNucleiMask = zeros(labelNum,1); labelSize = zeros(labelNum,1); %spend a lot of time during this loop for i=1:labelNum [r c] = find(L==i); labelSize(i) = length(r); labelMask = (L==i); labelPixelsNumInNucleiMask = sum(bw(labelMask)); labelPixelsPercentInNucleiMask(i) = labelPixelsNumInNucleiMask / labelSize(i); if (labelPixelsPercentInNucleiMask(i) > 0.7) labelIdentity(i) = 1; end end % 2. Merge small regions within the nuclei mask with their neighbours keepVec = ones(labelNum,1); newL = L; for i=1:labelNum if (labelIdentity(i) == 1) && (labelSize(i) < t) disp(['Removing label ',num2str(i),'. Size: ',num2str(labelSize(i))]); % get neighbour with largest border that is also in nuclear region [neighbourLabels , neighbouringRegionSize] = MibiGetNeighbourLabels (newL, i); found = 0; [neighbouringRegionSizeS , neighbouringRegionSizeSInd] = sort(neighbouringRegionSize,'descend'); neighbourLabelsS = neighbourLabels(neighbouringRegionSizeSInd); maxInd = 1; while ~found mergeLabelId = neighbourLabelsS(maxInd); if (~(mergeLabelId == 0) && (labelIdentity(mergeLabelId) == 1)) found = 1; else maxInd = maxInd+1; end if (maxInd >length(neighbourLabelsS)) % reached end of neighbours with no good merging candidate disp (['Warning: no good merging target found for label', num2str(i), '. Keeping it.']); break; end end % update if (maxInd <= length(neighbourLabelsS)) [newL] = MibiMergeLabels (newL, i, mergeLabelId); keepVec(i) = 0; end end end % Update label numbers to account for deleted labels allLabels = [1:labelNum]; currLabels = allLabels(keepVec == 1); labelIdentityNew = zeros(length(currLabels),1); newLmod = newL; for i = 1:length(currLabels) newLmod(newLmod == currLabels(i)) = i; labelIdentityNew(i) = labelIdentity(currLabels(i)); end cellPerimNewMod = bwperim(newLmod); cellPerimNewMod = newLmod; cellPerimNewMod(newL>0) = 100; cellPerimNewMod(cellPerimNewMod==0)=1; cellPerimNewMod(cellPerimNewMod==100)=0; rgb_image_cellPerimNewMod = imoverlay(rgb_image, cellPerimNewMod, [1 0 0]); % add ilastik segmentation % ilsegParams = load([path,'/SegmentPerim/Point',num2str(pointNumber),'/segmentationParams.mat']); % rgb_image_cellPerimNewModIl = imoverlay(rgb_image, ilsegParams.cellPerimNewMod, [0 0 0]); % rgb_image_cellPerimNewModBoth = imoverlay(rgb_image_cellPerimNewModIl, cellPerimNewMod, [1 0 0]); figure; imagesc(rgb_image_cellPerimNewMod); mkdir([resultsPath,'/Point',num2str(pointNumber),'_',num2str(segmentThres),'_',num2str(probNucThres)]); imwrite(rgb_image_cellPerimNewMod,[resultsPath,'/Point',num2str(pointNumber),'_',num2str(segmentThres),'_',num2str(probNucThres),'/compareIlastikDeep.tif'],'tif'); imwrite(cellPerimNewMod,[resultsPath,'/Point',num2str(pointNumber),'_',num2str(segmentThres),'_',num2str(probNucThres),'/PureSegmentation.tif'],'tif'); save([resultsPath,'/Point',num2str(pointNumber),'_',num2str(segmentThres),'_',num2str(probNucThres),'/segmentationParams.mat'],'newLmod','cellPerimNewMod','labelIdentityNew'); end end end disp('Please put the PureSegmentation.tif and segmentationParams.mat to the corresponding originalTiff folder for next step.'); ```

This process produced a file called segmentationParams.mat and stored in a subfolder. This file will be used automatically by the REDSEA algorithm.

REDSEA

Parameters

With the .tiff images, .mat segmentation file and .csv channel information, we are now ready to implement REDSEA for boundary compensation.

There are two methods for boundary compensation in REDSEA: Sudoku and Cross. The algorithm walks through the boundaries of each cell, and decides the area to extract signal. You would need to choose one of the two methods and decide how many pixels to expand from the boundary pixel:

The pixel number for expansion should be proportional to the cell size: in our MIBI data, the average cell size is 107 pixels, and we used 2 pixels for expansion; In the CyCIF tonsil dataset the average cell size is 325 pixels, so 3-4 pixels is recommended.

Also, you need to supply a list of channel names to perform the compensation process: normally you should only compensate for the surface markers, like in our case: 'CD4';'CD56';'CD21 (CR2)';'CD163';'CD68';'CD3';'CD20';'CD8a'

Take a look at the annotated code in the block under:

See MibiExtractSingleCellDataFromSegmentationAndTiff_REDSEA.m Script

```MATLAB %% Extract single cell data and do the REDSEA compensation % Based on the original script from Leeat Keren % This now reads in any folder of TIFFs with individual channels from the % same field of view, and uses that to recreate a countsNoNoise based on % the massDS order. % Then performs REDSEA compensation as implemented by Yunhao Bai % The outputs will be REDSEA compensated and non-compensated FCS files % 4May2020, Yunhao Bai, Sizun Jiang % Main path for the all the data mainPath = 'sampleData_MIBI'; %for MIBI %mainPath = 'sampleData_cycIF'; %for CyCIF % This is a csv file for your channel labels within massDS = dataset('File',[mainPath,'/sampleData.csv'],'Delimiter',','); % This assumes the path points to a folder containing all the Points from % the run. Your segmentationParams.mat from each point should be in the % each Point's folder pathTiff = [mainPath,'/originalTiff']; % This is where the FCS file output will go to pathResults = [mainPath,'/FCS_output']; % Select the channels that are expected to be expressed. Cells with minimal % expression of at least one of these channels will be removed clusterChannels = massDS.Label; % exclude elemental channels (here the % sample dataset does not contain [~, clusterChannelsInds] = ismember(clusterChannels,massDS.Label); % boundaryMod determines the type of compensation done. % 1:whole cell compensation % 2:boundary compensation (default) boundaryMod = 2; % REDSEAChecker determines the type of compensation done. % 0:only subtraction; % 1:subtraction and reinforcement (default) REDSEAChecker = 1; % for boundary compensation, needs to specify elementShape. % 1:Sudoku style, 2:Cross style elementShape = 2; % elementSize. How many pixels around the center to be considered for the % elementShape, can be selected from 1-4. % As a default, keep elementShape and elementSize as 2. elementSize = 2; % Select channels for REDSEA compensation. Surface markers are recommended % boundary compensation codes % selected channels to do the boundary compensation normChannels = {'CD4';'CD56';'CD21 (CR2)';'CD163';'CD68';'CD3';'CD20';'CD8a'}; %for MIBI %normChannels = {'x7500y3500_1700_DAPI';'x7500y3500_1700_CD3';'x7500y3500_1700_CD4';'x7500y3500_1700_CD8a';'x7500y3500_1700_CD11b';'x7500y3500_1700_CD20';'x7500y3500_1700_CD45';'x7500y3500_1700_CD68'}; %for CyCIF [~, normChannelsInds] = ismember(normChannels,massDS.Label); channelNormIdentity = zeros(length(massDS.Label),1); % Getting an array of flags for whether to compensate or not for i = 1:length(normChannelsInds) channelNormIdentity(normChannelsInds(i)) = 1; end % Whether what to plot scatter to check the REDSEA result and effect, % default=0 for not, 1 for plotting. % Note that if multiple channels selected (in normChannels), to plot out all % the sanity plots need long time. plotSanityPlots = 0; %% mkdir(pathResults); for p=1:1 disp(['point',num2str(p)]); pointNumber = p; % load tiffs to recreate countsNoNoise for i=1:length(massDS.Label) t = imread([pathTiff, '/Point', num2str(pointNumber), '/', massDS.Label{i}, '.tiff']); %mind the .tif and .tiff d = double(t); % imshow(d) countsNoNoise(:,:,i) = d; end % load segmentation file load([pathTiff, '/Point', num2str(pointNumber), '/segmentationParams.mat']); labelNum = max(max(newLmod)); channelNum = length(massDS); stats = regionprops(newLmod,'Area','PixelIdxList'); % Stats on cell size. Region props is DF with cell location by count countsReshape= reshape(countsNoNoise,size(countsNoNoise,1)*size(countsNoNoise,2),channelNum); % make a data matrix the size of the number of labels x the number of markers % Include one more marker for cell size data = zeros(labelNum,channelNum); dataScaleSize = zeros(labelNum,channelNum); cellSizes = zeros(labelNum,1); % for each label extract information for i=1:labelNum currData = countsReshape(stats(i).PixelIdxList,:); data(i,1:channelNum) = sum(currData,1); dataScaleSize(i,1:channelNum) = sum(currData,1) / stats(i).Area; cellSizes(i) = stats(i).Area; end %% do cell boundary compensation if boundaryMod == 1 dataCompen = MIBIboundary_compensation_wholeCellSA(newLmod,data,channelNormIdentity,REDSEAChecker); elseif boundaryMod == 2 dataCompen = MIBIboundary_compensation_boundarySA(newLmod,data,countsNoNoise,channelNormIdentity,elementShape,elementSize,REDSEAChecker); end dataCompenScaleSize = dataCompen./repmat(cellSizes,[1 channelNum]); % %% Add point number % pointnum = double(repmat(p, 1, length(data))); % pointnum = repelem(p, length(data),[1]); % data = [data, pointnum]; % dataScaleSize = [dataScaleSize, pointnum]; % dataCompen = [dataCompen, pointnum]; % dataCompenScaleSize = [dataCompenScaleSize, pointnum]; %% % get the final information only for the labels with % 1.positive nuclear identity (cells) % 2.that have enough information in the clustering channels to be % clustered labelIdentityNew2 = labelIdentityNew([1:end-1]); % fix bug resulting from previous script sumDataScaleSizeInClusterChannels = sum(dataScaleSize(:,clusterChannelsInds),2); labelIdentityNew2(sumDataScaleSizeInClusterChannels<0.1) = 2; dataCells = data(labelIdentityNew2==1,:); dataScaleSizeCells = dataScaleSize(labelIdentityNew2==1,:); dataCompenCells = dataCompen(labelIdentityNew2==1,:); dataCompenScaleSizeCells = dataCompenScaleSize(labelIdentityNew2==1,:); labelVec=find(labelIdentityNew2==1); % get cell sizes only for cells cellSizesVec = cellSizes(labelIdentityNew2==1); dataL = [labelVec,cellSizesVec,dataCells,repmat(p,[length(labelVec) 1])]; dataScaleSizeL = [labelVec,cellSizesVec,dataScaleSizeCells,repmat(p,[length(labelVec) 1])]; dataCompenL = [labelVec,cellSizesVec,dataCompenCells,repmat(p,[length(labelVec) 1])]; dataCompenScaleSizeL = [labelVec,cellSizesVec,dataCompenScaleSizeCells,repmat(p,[length(labelVec) 1])]; % dataTransStdL = [labelVec,dataCellsTransStd]; % dataScaleSizeTransStdL = [labelVec,dataScaleSizeCellsTransStd]; % dataStdL = [labelVec,dataCellsStd]; % dataScaleSizeStdL = [labelVec,dataScaleSizeCellsStd]; % channelLabelsForFCS = ['cellLabelInImage';'cellSize';massDS.Label]; channelLabelsForFCS = ['cellLabelInImage';'cellSize';massDS.Label;'PointNum']; %% output outputPath = [pathResults,'/Point',num2str(pointNumber),'/BM=',num2str(boundaryMod),'_RC=',num2str(REDSEAChecker),'_Shape=',num2str(elementShape),'_Size=',num2str(elementSize)]; mkdir(outputPath); % plot sanity scatter images if plotSanityPlots == 1 pathSanityPlots = [outputPath,'/sanityPlots/']; mkdir(pathSanityPlots); MIBIboundary_compensation_plotting(dataScaleSizeCells,dataCompenScaleSizeCells,normChannels,normChannelsInds,pathSanityPlots); end % save fcs TEXT.PnS = channelLabelsForFCS; TEXT.PnN = channelLabelsForFCS; save([outputPath,'/cellData.mat'],'labelIdentityNew2','labelVec','cellSizesVec','dataCells','dataScaleSizeCells','dataCompenCells','dataCompenScaleSizeCells','channelLabelsForFCS'); writeFCS([outputPath,'/dataFCS.fcs'],dataL,TEXT); writeFCS([outputPath,'/dataScaleSizeFCS.fcs'],dataScaleSizeL,TEXT); writeFCS([outputPath,'/dataRedSeaFCS.fcs'],dataCompenL,TEXT); writeFCS([outputPath,'/dataRedSeaScaleSizeFCS.fcs'],dataCompenScaleSizeL,TEXT); % writeFCS([outputPath,'/dataTransFCS.fcs'],dataTransL,TEXT); % writeFCS([outputPath,'/dataScaleSizeTransFCS.fcs'],dataScaleSizeTransL,TEXT); % writeFCS([outputPath,'/dataStdFCS.fcs'],dataStdL,TEXT); % writeFCS([outputPath,'/dataScaleSizeStdFCS.fcs'],dataScaleSizeStdL,TEXT); % writeFCS([outputPath,'/dataTransStdFCS.fcs'],dataTransStdL,TEXT); % writeFCS([outputPath,'/dataScaleSizeTransStdFCS.fcs'],dataScaleSizeTransStdL,TEXT); end ```

Sanity plots

To visually inspect if the parameters described in the previous section are optimal for your data, we provided with a flag in the script plotSanityPlots = 1. Once set to 1, it will produce the pairwise combination scatter plots of all the compensated channels. User can use these plots to evaluate if the compensation is optimal (for example looking at CD4-CD8, CD3-CD20 etc.) It is suggested to run this first with less channels, find the optimal parameters, then run the full list.

Here we showed a sainity plot of CD3-CD20 for the CyCIF data:

The x and y axis are raw intensity/counts of the channels, scaled by each individual cell's size. In the plot we can see a reduction of CD3-CD20 double positive cells (less orange dots than blue dots in the middle of the plot), and at the same time retention of single-positive signals.

The sanity plots will be produced in the sanityPlots sub-folder, along with the output of fcs files.

Output

REDSEA will produce the 4 FCS files for downstream analysis:

dataFCS.fcs = raw counts for each single cell without compensation

dataRedSeaFCS = raw counts for each single cell with REDSEA compensation

dataScaleSizeFCS.fcs = counts for each single cell as scaled by cell size (counts/cellSize) without compensation

dataRedSeaScaleSizeFCS.fcs = counts for each single cell as scaled by cell size (counts/cellSize) with REDSEA compensation

It is recommended to use the dataRedSeaScaleSizeFCS.fcs file. We can see by using REDSEA compensation, the boundary signal spillover is dynamically eliminated and reinforced.

For the MIBI dataset:

And REDSEA also works comparably on the CyCIF data: