sugi-cho / CCL-GPU

Connected component labeling (CCL) using Unity ComputeShader
MIT License

Less complicated blob data gathering #1

Open TE-JoakimElvander opened 3 years ago

TE-JoakimElvander commented 3 years ago

Hi,

Thank you for your hard work! Your solution is great and works very well. Admittedly, I had some trouble understanding how and why you collected the blob data the way you did. That approach also puts severe limits on the number of blobs and, worst of all, on the size of the blobs that can be counted. Basically, on a modern GPU, the maximum number of member pixels in a blob is limited to 1024 due to the LabelData size and the limit on group shared data.

So here's my suggestion for counting out LabelData, which I have confirmed to work. Unfortunately I have not rewritten it to fit your current code and style, so some interpretation on your part is needed if you want to use it:

1) a) In the compute shader, add the following pragmas:

        #pragma kernel clearAccumulationBuffers
        #pragma kernel accumulateRawBlobData
        #pragma kernel rawBlobDataToLabelData

b) Add the following global data:

        // for accumulateRawBlobData
        StructuredBuffer inAppendedLabels; // element type must match the label append buffer
        Texture2D inLabelTexture;
        RWStructuredBuffer<uint> outAccumulatedSize;
        RWStructuredBuffer<uint> outAccumulatedXPos;
        RWStructuredBuffer<uint> outAccumulatedYPos;

        // for rawBlobDataToLabelData
        StructuredBuffer<uint> inAccumulatedSize;
        StructuredBuffer<uint> inAccumulatedXPos;
        StructuredBuffer<uint> inAccumulatedYPos;
        StructuredBuffer<uint> inSingleElemCount;
        // reuse labelDataBuffer

c) Add the following kernels:

        [numthreads(1, 1, 1)]
        void clearAccumulationBuffers(uint3 id : SV_DispatchThreadID)
        {
            outAccumulatedSize[id.x] = 0;
            outAccumulatedXPos[id.x] = 0;
            outAccumulatedYPos[id.x] = 0;
        }

        // xy-dim is used to iterate over labelTexture, z-dim to iterate over AppendedLabels
        [numthreads(8, 8, 8)]
        void accumulateRawBlobData(uint3 id : SV_DispatchThreadID)
        {
            int pixelLabel = round(inLabelTexture[id.xy]);
            uint numElems = 0;
            uint stride = 0;
            inAppendedLabels.GetDimensions(numElems, stride);
            if (id.z >= numElems || inAppendedLabels[id.z] != pixelLabel)
            {
                return;
            }
            // then id.z is at a used Label, and the pixelLabel is the same
            InterlockedAdd(outAccumulatedSize[id.z], 1);
            InterlockedAdd(outAccumulatedXPos[id.z], id.x);
            InterlockedAdd(outAccumulatedYPos[id.z], id.y);
            // TODO: Distance
        }

        // use max number labels as x-dimension
        [numthreads(1, 1, 1)]
        void rawBlobDataToLabelData(uint3 id : SV_DispatchThreadID)
        {
            LabelData ld;
            ld.size = 0;
            ld.pos = float2(0, 0);
            ld.distance = 0;
            uint size = 0;
            if (id.x < inSingleElemCount[0])
            {
                size = inAccumulatedSize[id.x];
                ld.size = size;
                ld.pos = float2((float)inAccumulatedXPos[id.x] / size, (float)inAccumulatedYPos[id.x] / size);
                ld.distance = 0;
            }
            labelDataBuffer[id.x] = ld;
        }
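To make the intent of the two kernels easier to check, here is a minimal CPU sketch of the same accumulation in plain C# (no Unity types). It is not part of the posted code: `BlobAccumulationReference`, `BlobStats` and `Accumulate` are hypothetical names, the label image is assumed to be an `int[,]` indexed as `[y, x]`, and the zero-size guard is an addition that the shader handles through `inSingleElemCount` instead.

```csharp
using System;
using System.Collections.Generic;

public static class BlobAccumulationReference
{
    // Hypothetical CPU stand-in for the LabelData struct (distance omitted).
    public struct BlobStats
    {
        public uint Size;
        public float X;
        public float Y;
    }

    // labels[y, x] is the label of each pixel; usedLabels plays the role of
    // inAppendedLabels. Returns one BlobStats per entry in usedLabels.
    public static BlobStats[] Accumulate(int[,] labels, IReadOnlyList<int> usedLabels)
    {
        int height = labels.GetLength(0);
        int width = labels.GetLength(1);
        var size = new uint[usedLabels.Count];
        var sumX = new uint[usedLabels.Count];
        var sumY = new uint[usedLabels.Count];

        // Serial version of the (x, y, z) dispatch: every pixel is compared
        // against every used label.
        for (int z = 0; z < usedLabels.Count; z++)
        {
            for (int y = 0; y < height; y++)
            {
                for (int x = 0; x < width; x++)
                {
                    if (labels[y, x] != usedLabels[z]) continue;
                    size[z] += 1;        // InterlockedAdd(outAccumulatedSize[z], 1)
                    sumX[z] += (uint)x;  // InterlockedAdd(outAccumulatedXPos[z], x)
                    sumY[z] += (uint)y;  // InterlockedAdd(outAccumulatedYPos[z], y)
                }
            }
        }

        // Equivalent of rawBlobDataToLabelData: sums become a centroid.
        var result = new BlobStats[usedLabels.Count];
        for (int z = 0; z < usedLabels.Count; z++)
        {
            if (size[z] == 0) continue; // assumption: leave unused labels zeroed
            result[z] = new BlobStats
            {
                Size = size[z],
                X = (float)sumX[z] / size[z],
                Y = (float)sumY[z] / size[z]
            };
        }
        return result;
    }
}
```

Comparing the output of such a reference against the GPU readback on a small test image is one way to confirm that the buffers are bound and dispatched as intended.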

2) a) Define the needed ComputeBuffers in CCL.cs:

        ...
        const string KERNEL_CLEARACCUMBUF = "clearAccumulationBuffers";
        const string KERNEL_ACCUMRAW = "accumulateRawBlobData";
        const string KERNEL_RAW2LABELDATA = "rawBlobDataToLabelData";
        ...
        const string BUF_INAPPENDED = "inAppendedLabels";
        const string BUF_OUTACCSIZE = "outAccumulatedSize";
        const string BUF_OUTACCXPOS = "outAccumulatedXPos";
        const string BUF_OUTACCYPOS = "outAccumulatedYPos";
        const string BUF_INACCSIZE = "inAccumulatedSize";
        const string BUF_INACCXPOS = "inAccumulatedXPos";
        const string BUF_INACCYPOS = "inAccumulatedYPos";
        const string BUF_INELEMCOUNT = "inSingleElemCount";
        ...
        // kernels are initialized as
        m_kernelClearAccumBuf = m_CCLComputeShader.FindKernel(KERNEL_CLEARACCUMBUF);
        m_kernelAccumRawData = m_CCLComputeShader.FindKernel(KERNEL_ACCUMRAW);
        m_kernelRawToLabelData = m_CCLComputeShader.FindKernel(KERNEL_RAW2LABELDATA);
        ...
        // buffers are initialized as (for me when camera resolution is known):
        if (m_accumulatedSize != null)
        {
            m_accumulatedSize.Release();
        }
        m_accumulatedSize = new ComputeBuffer(m_width * m_height, sizeof(uint));
        if (m_accumulatedXPos != null)
        {
            m_accumulatedXPos.Release();
        }
        m_accumulatedXPos = new ComputeBuffer(m_width * m_height, sizeof(uint));
        if (m_accumulatedYPos != null)
        {
            m_accumulatedYPos.Release();
        }
        m_accumulatedYPos = new ComputeBuffer(m_width * m_height, sizeof(uint));
        ...
        // and the size buffer as
        m_singleElemCount = new ComputeBuffer(1, sizeof(uint), ComputeBufferType.Raw);
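Because the accumulators are 32-bit `uint` values, the coordinate sums have an upper bound that depends on the image size. The sketch below estimates the worst case (a single blob covering the whole image) in plain C#. `AccumulatorRange`, `WorstCaseSumX` and `FitsInUInt32` are hypothetical helper names, not part of the posted code or of Unity.

```csharp
public static class AccumulatorRange
{
    // Largest possible sum of x coordinates: one blob covering the full image.
    // Each of the `height` rows contributes 0 + 1 + ... + (width - 1).
    public static ulong WorstCaseSumX(int width, int height)
    {
        return (ulong)height * (ulong)width * (ulong)(width - 1) / 2;
    }

    // True if both the x and the y accumulator stay within uint range
    // in the worst case. The y sum is the x sum with the axes swapped.
    public static bool FitsInUInt32(int width, int height)
    {
        return WorstCaseSumX(width, height) <= uint.MaxValue
            && WorstCaseSumX(height, width) <= uint.MaxValue;
    }
}
```

Under this estimate a 1920x1080 image stays within range, while a full-frame blob at 3840x2160 would not, so very large inputs may need a different accumulation scheme.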

b) In your CCL.cs, replace lines 136-139 with something like the following (my own style may break yours here, so adapt as needed):

        ComputeBuffer.CopyCount(m_labelAppendBuffer, m_singleElemCount, 0);

        m_CCLComputeShader.SetBuffer(m_kernelClearAccumBuf, BUF_OUTACCSIZE, m_accumulatedSize);
        m_CCLComputeShader.SetBuffer(m_kernelClearAccumBuf, BUF_OUTACCXPOS, m_accumulatedXPos);
        m_CCLComputeShader.SetBuffer(m_kernelClearAccumBuf, BUF_OUTACCYPOS, m_accumulatedYPos);
        m_CCLComputeShader.Dispatch(m_kernelClearAccumBuf, m_maxNumberLabels, 1, 1);

        m_CCLComputeShader.SetBuffer(m_kernelAccumRawData, BUF_INAPPENDED, m_labelAppendBuffer);
        m_CCLComputeShader.SetTexture(m_kernelAccumRawData, TEX_INLABELS, m_labelTexture);
        m_CCLComputeShader.SetBuffer(m_kernelAccumRawData, BUF_OUTACCSIZE, m_accumulatedSize);
        m_CCLComputeShader.SetBuffer(m_kernelAccumRawData, BUF_OUTACCXPOS, m_accumulatedXPos);
        m_CCLComputeShader.SetBuffer(m_kernelAccumRawData, BUF_OUTACCYPOS, m_accumulatedYPos);
        m_CCLComputeShader.Dispatch(m_kernelAccumRawData, m_width / 8, m_height / 8, m_maxNumberLabels / 8);

        m_CCLComputeShader.SetBuffer(m_kernelRawToLabelData, BUF_INACCSIZE, m_accumulatedSize);
        m_CCLComputeShader.SetBuffer(m_kernelRawToLabelData, BUF_INACCXPOS, m_accumulatedXPos);
        m_CCLComputeShader.SetBuffer(m_kernelRawToLabelData, BUF_INACCYPOS, m_accumulatedYPos);
        m_CCLComputeShader.SetBuffer(m_kernelRawToLabelData, BUF_LABELDATA, m_accumulatedLabelDataBuffer);
        m_CCLComputeShader.SetBuffer(m_kernelRawToLabelData, BUF_INELEMCOUNT, m_singleElemCount);
        m_CCLComputeShader.Dispatch(m_kernelRawToLabelData, m_maxNumberLabels, 1, 1);
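One detail worth checking in the dispatch of `accumulateRawBlobData`: `m_width / 8`, `m_height / 8` and `m_maxNumberLabels / 8` are integer divisions, so any remainder is dropped and the last few columns, rows or labels are not visited when a size is not a multiple of 8. A minimal sketch of rounding up instead, using a hypothetical `GroupCount` helper in plain C#:

```csharp
public static class DispatchMath
{
    // Number of thread groups needed so that groups * groupSize >= itemCount.
    // Assumes itemCount >= 0 and groupSize > 0.
    public static int GroupCount(int itemCount, int groupSize)
    {
        return (itemCount + groupSize - 1) / groupSize;
    }
}
```

The three dispatch arguments would then become `GroupCount(m_width, 8)`, `GroupCount(m_height, 8)` and `GroupCount(m_maxNumberLabels, 8)`. With rounded-up counts some threads fall outside the texture, so the kernel would likely also want a bounds check on `id.xy`, similar to the existing check on `id.z`.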

I hope this is enough to convey my idea of a simplified transform from the label texture to LabelData structures. It seems to be working, and the limitations on blob size and number of blobs are no longer there to the same extent.

thomaskole commented 1 year ago

I would like to take a look at this method, but I can't get it to work. Could you upload the original scripts?