Closed zgabi closed 1 year ago
Where does "data" come from? Could you share the complete code of your Utils.ExtractPixels? Thanks.
data is a span over the memory at bitmapData.Scan0:
Span<byte> data;
unsafe
{
    data = new Span<byte>((void*)bitmapData.Scan0, bitmapData.Height * bitmapData.Stride);
}
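If you would rather avoid the unsafe block, one option is Marshal.Copy, which copies from an IntPtr such as bitmapData.Scan0 into a managed array. Below is a minimal sketch, not the code from the project: the class and method names (PixelBufferCopy, CopyToManaged, Demo) are hypothetical, and the buffer from Marshal.AllocHGlobal is only a stand-in for the locked bitmap memory.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PixelBufferCopy
{
    // Copies byteCount bytes from unmanaged memory (for example bitmapData.Scan0)
    // into a new managed array, without needing an unsafe block.
    public static byte[] CopyToManaged(IntPtr source, int byteCount)
    {
        var managed = new byte[byteCount];
        Marshal.Copy(source, managed, 0, byteCount);
        return managed;
    }

    // Demo: an unmanaged buffer stands in for the locked bitmap memory.
    public static byte[] Demo()
    {
        byte[] pixels = { 10, 20, 30, 255 }; // one pixel in B, G, R, A order
        IntPtr scan0 = Marshal.AllocHGlobal(pixels.Length);
        try
        {
            Marshal.Copy(pixels, 0, scan0, pixels.Length);
            return CopyToManaged(scan0, pixels.Length);
        }
        finally
        {
            Marshal.FreeHGlobal(scan0);
        }
    }
}
```

With a real bitmap you would pass bitmapData.Scan0 and bitmapData.Height * bitmapData.Stride. The extra copy costs some time compared to wrapping the memory in a Span, so this trades a little speed for not needing unsafe code.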
And how are spanR, spanG, and spanB transferred into the DenseTensor?
This should be the full ExtractPixels method:
public static Tensor<float> ExtractPixels2(Bitmap bitmap)
{
    var rectangle = new Rectangle(0, 0, bitmap.Width, bitmap.Height);
    BitmapData bitmapData = bitmap.LockBits(rectangle, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);
    var tensor = new DenseTensor<float>(new[] { 1, 3, bitmap.Height, bitmap.Width });

    // View the locked bitmap memory as bytes (4 bytes per pixel: B, G, R, A).
    Span<byte> data;
    unsafe
    {
        data = new Span<byte>((void*)bitmapData.Scan0, bitmapData.Height * bitmapData.Stride);
    }

    // The tensor buffer holds three consecutive planes: all R, then all G, then all B.
    int pixelCount = bitmap.Width * bitmap.Height;
    var spanR = tensor.Buffer.Span;
    var spanG = spanR.Slice(pixelCount);
    var spanB = spanG.Slice(pixelCount);

    int sidx = 0; // index into the source bytes
    int didx = 0; // index into each destination plane
    for (int i = 0; i < pixelCount; i++)
    {
        // Scale each channel to the range 0..1.
        spanR[didx] = data[sidx + 2] / 255.0F;
        spanG[didx] = data[sidx + 1] / 255.0F;
        spanB[didx] = data[sidx] / 255.0F;
        didx++;
        sidx += 4; // rows are contiguous because Stride == Width * 4 for 32bpp
    }

    bitmap.UnlockBits(bitmapData);
    return tensor;
}
A Tensor is just an N-dimensional array.
In your case it is 4-dimensional: new DenseTensor<float>(new[] { 1, 3, bitmap.Height, bitmap.Width });
The 1st dimension has only 1 value, the 2nd has 3 (R, G, B), the 3rd is the height and the 4th is the width.
So internally it is only a float[1 * 3 * width * height] array.
So in memory it contains RRRRRRR......(count: width * height) GGGGGG......(count: width * height) BBBBBBB......(count: width * height) values (where each R, G, B is a float).
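To make the layout concrete, here is a small sketch that uses a plain float[] in place of the tensor buffer and a byte[] in place of the bitmap memory. It assumes the default row-major layout; the names PlanarLayout, Offset, and ToPlanarRgb are hypothetical and exist only for illustration.

```csharp
using System;

public static class PlanarLayout
{
    // Flat offset of element [0, channel, y, x] in a row-major
    // buffer of shape { 1, 3, height, width }.
    public static int Offset(int channel, int y, int x, int height, int width)
        => channel * height * width + y * width + x;

    // Converts interleaved BGRA bytes (4 per pixel) into planar RGB floats:
    // first all R values, then all G values, then all B values.
    public static float[] ToPlanarRgb(byte[] bgra, int width, int height)
    {
        int pixelCount = width * height;
        var planar = new float[3 * pixelCount];
        for (int i = 0; i < pixelCount; i++)
        {
            planar[i] = bgra[i * 4 + 2] / 255.0f;                  // R plane
            planar[pixelCount + i] = bgra[i * 4 + 1] / 255.0f;     // G plane
            planar[2 * pixelCount + i] = bgra[i * 4] / 255.0f;     // B plane
        }
        return planar;
    }
}
```

For a 2x1 image whose first pixel is pure red and second is pure blue, the R plane comes first in the result, and the blue value of the second pixel sits at Offset(2, 0, 1, 1, 2).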
Good job. However, compared to not using numpy, the performance is still a bit worse, but I still like your modification, and I will add it to the project right away.
I love you. I was searching for a fix for this for HOURS!
Utils.ExtractPixels is very slow. On my machine it takes 300-500 ms. The nested parallel processing is unnecessary; it only makes the function slower. If I remove the Parallel loops, the result is 70 ms, which is still quite a lot. (The Tensor indexer is very slow; use tensor.Buffer instead.)
In your code you already assume that the bitmapData is ARGB, 4 bytes per pixel, so using the Stride is unnecessary: according to the documentation, the stride is the width of a single row of pixels rounded up to a four-byte boundary, and in this case the width of a single row is always a multiple of 4.
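The rounding can be checked with a little arithmetic. The sketch below is only an illustration of "row width rounded up to a four-byte boundary"; StrideMath and its methods are hypothetical helpers, not part of System.Drawing.

```csharp
using System;

public static class StrideMath
{
    // Bytes per row after rounding the row width up to a multiple of 4 bytes.
    public static int Stride(int width, int bitsPerPixel)
        => (width * bitsPerPixel + 31) / 32 * 4;

    // Unused padding bytes at the end of each row.
    public static int Padding(int width, int bitsPerPixel)
        => Stride(width, bitsPerPixel) - (width * bitsPerPixel + 7) / 8;
}
```

For a 32-bit format the padding is always 0, so stepping through the buffer 4 bytes at a time is safe. For a 24-bit format it is not: a 3-pixel row occupies 9 bytes but has a stride of 12, so a loop over such a bitmap would have to skip the padding at the end of every row.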
I rewrote the function; it now takes only 3 ms and is not "unsafe" code:
int pixelCount = width * height;
var spanR = tensor.Buffer.Span;
var spanG = spanR.Slice(pixelCount);
var spanB = spanG.Slice(pixelCount);
Maybe you can make it even faster by using unsafe code.
This is just an idea of how you could make it faster. If you expect larger models in the future (like 8K * 8K), you can keep the outer parallel loop or make "my" single loop parallel, but a nested parallel loop is overkill. And for 640x640 pixels the parallel loop is unnecessary.
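For very large images, a single parallel loop over rows could look roughly like the sketch below. It is an assumption-laden illustration, not measured code: it works on plain arrays (a Span cannot be captured by the lambda passed to Parallel.For), it assumes tightly packed BGRA input, and the names ParallelExtract and ToPlanarRgb are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelExtract
{
    // Converts interleaved BGRA bytes into planar RGB floats,
    // using one parallel loop over rows and a sequential loop inside each row.
    public static float[] ToPlanarRgb(byte[] bgra, int width, int height)
    {
        int pixelCount = width * height;
        var planar = new float[3 * pixelCount];

        Parallel.For(0, height, y =>
        {
            int src = y * width * 4; // first source byte of this row
            int dst = y * width;     // first destination index of this row
            for (int x = 0; x < width; x++, src += 4, dst++)
            {
                planar[dst] = bgra[src + 2] / 255.0f;                  // R plane
                planar[pixelCount + dst] = bgra[src + 1] / 255.0f;     // G plane
                planar[2 * pixelCount + dst] = bgra[src] / 255.0f;     // B plane
            }
        });

        return planar;
    }
}
```

Each row writes to a disjoint range of the output, so no locking is needed. Whether this beats the sequential loop depends on the image size and the machine, so it is worth measuring before adopting it.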