kekyo / FlashCap

Independent video frame capture library on .NET/.NET Core and .NET Framework.
Apache License 2.0

FlashCap - Independent video frame capture library.

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

NuGet

Package NuGet
FlashCap NuGet FlashCap
FSharp.FlashCap NuGet FSharp.FlashCap

Japanese language

What is this?

Do you need video frame capture on .NET? Are you tired of the existing video frame capture libraries for .NET?

This is a video frame capture library that specializes only in capturing image data (a.k.a. a frame grabber). It has a simple, easy-to-use API and a simple architecture, and it requires no native libraries. It also does not depend on any non-official libraries. See the NuGet dependencies page.


Short sample code

Install the FlashCap NuGet package.

Enumerate target devices and video characteristics:

using FlashCap;

// Capture device enumeration:
var devices = new CaptureDevices();

foreach (var descriptor in devices.EnumerateDescriptors())
{
    // "Logicool Webcam C930e: DirectShow device, Characteristics=34"
    // "Default: VideoForWindows default, Characteristics=1"
    Console.WriteLine(descriptor);

    foreach (var characteristics in descriptor.Characteristics)
    {
        // "1920x1080 [JPEG, 30fps]"
        // "640x480 [YUYV, 60fps]"
        Console.WriteLine(characteristics);
    }
}

Then, capture it:

// Open a device with a video characteristics:
var descriptor0 = devices.EnumerateDescriptors().ElementAt(0);

using var device = await descriptor0.OpenAsync(
    descriptor0.Characteristics[0],
    async bufferScope =>
    {
        // Captured into a pixel buffer from an argument.

        // Get image data (Maybe DIB/JPEG/PNG):
        byte[] image = bufferScope.Buffer.ExtractImage();

        // Use it however you like...
        var ms = new MemoryStream(image);
        var bitmap = Bitmap.FromStream(ms);

        // ...
    });

// Start processing:
await device.StartAsync();

// ...

// Stop processing:
await device.StopAsync();

You can also use the Reactive Extension:

// Get an observable with a video characteristics:
var descriptor0 = devices.EnumerateDescriptors().ElementAt(0);

using var deviceObservable = await descriptor0.AsObservableAsync(
    descriptor0.Characteristics[0]);

// Subscribe to the device.
deviceObservable.Subscribe(bufferScope =>
{
    // Captured into a pixel buffer from an argument.

    // Get image data (Maybe DIB/JPEG/PNG):
    byte[] image = bufferScope.Buffer.ExtractImage();

    // Use it however you like...
    var ms = new MemoryStream(image);
    var bitmap = Bitmap.FromStream(ms);

    // ...
});

// Start processing:
await deviceObservable.StartAsync();

As you can see, FlashCap does not depend on any GUI elements. For example, FlashCap can be applied to a console application.

Published introduction article: "Easy to implement video image capture with FlashCap" (dev.to)


Target environments

.NET platforms supported are as follows (almost all!):

Platforms on which capture devices can be used:

Tested devices

Run the sample code to verify in 0.11.0.

Verified capture devices / cameras:

Verified computers:

Couldn't detect any devices on FlashCap:


Full sample code

The full sample code is here:

This is an Avalonia sample application that runs on both Windows and Linux. It performs real-time user-mode capturing, decodes bitmaps (from MJPEG), and renders them to a window. Avalonia renders with Skia (SkiaImageView), which is pretty fast.

FlashCap.Avalonia

Want to take just one image

If you want to take only one image, there is a very simple method:

// Take only one image, given the image characteristics:
var descriptor0 = devices.EnumerateDescriptors().ElementAt(0);

byte[] imageData = await descriptor0.TakeOneShotAsync(
    descriptor0.Characteristics[0]);

// Save to file
await File.WriteAllBytesAsync("oneshot", imageData);

See sample code for a complete implementation.

Exclusion of unsupported formats

The video characteristics contain a list of formats supported by the camera. FlashCap does not support all formats, so you must select the correct format before opening the device. Unsupported formats are indicated by PixelFormats.Unknown.

// Select a device:
var descriptor0 = devices.EnumerateDescriptors().ElementAt(0);

// Exclude unsupported formats:
var characteristics = descriptor0.Characteristics.
    Where(c => c.PixelFormat != PixelFormats.Unknown).
    ToArray();

FlashCap enumerates all formats returned by the device. Therefore, by checking the information in VideoCharacteristics with PixelFormats.Unknown, you can analyze what formats the device supports.

Displaying camera device property page

It is possible to display the camera device's property page.

PropertyPage

using var device = await descriptor.OpenAsync(
    characteristics,
    async bufferScope =>
    {
        // ...
    });

// if the camera device supports property pages
if (device.HasPropertyPage)
{
    // Get parent window handle from Avalonia window
    if (this.window.TryGetPlatformHandle()?.Handle is { } handle)
    {
        // show the camera device's property page
        await device.ShowPropertyPageAsync(handle);
    }
}

Currently, property pages can only be displayed when the target is a DirectShow device.

See Avalonia sample code and WPF sample code for a complete implementation.


Implementation Guidelines

In the following sections, we will explain various techniques for processing large amounts of image data using FlashCap. This is an application example, so it is not necessary to read it, but it will give you some hints for implementation.

Reduce data copy

Processing video requires handling large amounts of data; in FlashCap, each piece of video is called "a frame." The frames come and go at a rate of 60 or 30 times per second.

The key here is how to process the data in each frame without copying it. Currently, FlashCap requires at least one copy. However, depending on how it is used, two, three, or even more copies may occur.

The callback when calling the OpenAsync method will pass a PixelBufferScope argument. This argument contains the data of the frame that was copied once. Now let's call the CopyImage() method, which is the "safest" method:

using var device = await descriptor0.OpenAsync(
  descriptor0.Characteristics[0],
  async bufferScope =>  // <-- `PixelBufferScope` (already copied once at this point)
  {
    // This is where the second copy occurs.
    byte[] image = bufferScope.Buffer.CopyImage();

    // Convert to Stream.
    var ms = new MemoryStream(image);
    // Consequently, a third copy occurs here.
    var bitmap = Bitmap.FromStream(ms);

    // ...
  });

This would result in at least two copies in total. Furthermore, by decoding the resulting image data (image) with Bitmap.FromStream(), three copies will have occurred as a result.

Now, what about the first code example, using ExtractImage()?

using var device = await descriptor0.OpenAsync(
  descriptor0.Characteristics[0],
  async bufferScope =>  // <-- `PixelBufferScope` (already copied once at this point)
  {
    // This is where the second copy (may) occur.
    byte[] image = bufferScope.Buffer.ExtractImage();

    // Convert to Stream.
    var ms = new MemoryStream(image);
    // Decode. Consequently, a second or third copy occurs here.
    var bitmap = Bitmap.FromStream(ms);

    // ...
  });

When I say "copying (may) occur," I mean that under some circumstances, no copy is made. You might therefore think that you should always use ExtractImage() instead of CopyImage(). However, ExtractImage() has a drawback: the data it returns is only valid for a short time.

Consider the following code:

// Stores image data outside the scope of the callback.
byte[]? image = null;

using var device = await descriptor0.OpenAsync(
  descriptor0.Characteristics[0],
  bufferScope =>  // <-- `PixelBufferScope` (already copied once at this point)
  {
    // Save out of scope. (second copy)
    image = bufferScope.Buffer.CopyImage();
    //image = bufferScope.Buffer.ExtractImage();  // DANGER!!!
  });

// Use outside of scope.
var ms = new MemoryStream(image);
// Decode (Third copy)
var bitmap = Bitmap.FromStream(ms);

Thus, if the image is copied with CopyImage(), it can be safely referenced outside the scope of the callback. However, if you use ExtractImage(), you must be careful because the image data may be corrupted if you reference it outside the scope.
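The danger can be reproduced without a camera. The following is a minimal sketch, not FlashCap code: ReusedFrameBuffer is a hypothetical stand-in for any pipeline that reuses one buffer across frames, and it only illustrates why a view taken in one callback goes stale once the next frame arrives.

```csharp
using System;

// Hypothetical stand-in for a capture pipeline that reuses one frame buffer.
// (An assumption for illustration; this is not a FlashCap type.)
public sealed class ReusedFrameBuffer
{
    private readonly byte[] storage = new byte[4];

    // Overwrites the shared storage, as the arrival of a new frame would.
    public void Capture(byte fill)
    {
        for (var i = 0; i < this.storage.Length; i++)
        {
            this.storage[i] = fill;
        }
    }

    // No copy: a view over the shared storage (valid only until the next Capture).
    public ArraySegment<byte> Refer() =>
        new ArraySegment<byte>(this.storage);

    // One copy: an independent array that later captures cannot touch.
    public byte[] Copy() =>
        (byte[])this.storage.Clone();
}
```

If a view and a copy are both taken after the first Capture(), a second Capture() changes what the view shows while the copy keeps the first frame's bytes.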

Similarly, with the ReferImage() method, basically no copying occurs. (Except when transcoding occurs. See below.) Again, the data must not be referenced outside the scope. Also, the image data is not returned as a byte array; instead, an ArraySegment<byte> is used to refer to it.

This type cannot be used like a plain array because it represents a partial reference to an array. For example, if you want to use it as a Stream, use the following:

using var device = await descriptor0.OpenAsync(
  descriptor0.Characteristics[0],
  async bufferScope =>  // <-- `PixelBufferScope` (already copied once at this point)
  {
    // Basically no copying occurs here.
    ArraySegment<byte> image =
      bufferScope.Buffer.ReferImage();

    // Convert to Stream.
    var ms = new MemoryStream(
      image.Array, image.Offset, image.Count);
    // Decode (Second copy)
    var bitmap = Bitmap.FromStream(ms);

    // ...
  });

If you need a MemoryStream, you can use the extension method AsStream(), which is defined much like this code example. Also, if you use SkiaSharp, you can pass the ArraySegment<byte> directly to SKBitmap.Decode().
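To make the "partial reference" point concrete, here is a small sketch that wraps only the referenced part of an array in a stream. ToReadOnlyStream is a hypothetical helper name chosen for this example; it is not FlashCap's AsStream() and makes no claim about how that method is implemented.

```csharp
using System;
using System.IO;

public static class SegmentStreamSketch
{
    // Wrap only the referenced slice of the array; no bytes are copied.
    // (Hypothetical helper for illustration.)
    public static MemoryStream ToReadOnlyStream(ArraySegment<byte> segment)
    {
        var array = segment.Array ??
            throw new ArgumentException("Segment has no backing array.", nameof(segment));

        // index/count restrict the stream to the segment; writable: false
        // prevents readers from modifying the shared buffer through the stream.
        return new MemoryStream(array, segment.Offset, segment.Count, writable: false);
    }
}
```

The stream still points at the original buffer, so the same scope rule applies: it must be consumed before the callback returns.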

The following is a list of methods for acquiring image data described up to this point:

| Method | Speed | Out of scope | Image type |
|:---|:---|:---|:---|
| CopyImage() | Slow | Safe | byte[] |
| ExtractImage() | Slow in some cases | Danger | byte[] |
| ReferImage() | Fast | Danger | ArraySegment<byte> |

So with ReferImage(), decoding an image takes at least two copies. How can we reduce that to one?

To achieve this with only one copy, the decoding of the image data must be given up. Perhaps, if the environment allows hardware to process the image data, the second copy could be offloaded by passing the image data directly to the hardware.

As an easy-to-understand example, consider the following operation, which saves image data directly to a file. In this case, since no decoding is performed, it means that the copying is done once. (Instead, the I/O operation is tremendously slow...)

using var device = await descriptor0.OpenAsync(
  descriptor0.Characteristics[0],
  async bufferScope =>  // <-- `PixelBufferScope` (already copied once at this point)
  {
    // Basically no copying occurs here.
    ArraySegment<byte> image = bufferScope.Buffer.ReferImage();

    // Output image data directly to a file.
    using var fs = File.Create(
      descriptor0.Characteristics[0].PixelFormat switch
      {
        PixelFormats.JPEG => "output.jpg",
        PixelFormats.PNG => "output.png",
        _ => "output.bmp",
      });
    await fs.WriteAsync(image.Array, image.Offset, image.Count);
    await fs.FlushAsync();
  });

About transcoder

The "raw image data" obtained from a device may not be a JPEG or RGB DIB bitmap, which we can easily handle. When the video is not a continuous stream such as MPEG, its format is typically "MJPEG" (Motion JPEG) or "YUV".

An "MJPEG" frame is exactly the same as a JPEG image, so FlashCap returns the image data as is. In contrast, the "YUV" format has the same data header format as a DIB bitmap, but the contents are completely different. Therefore, many image decoders will not be able to process it if it is saved as is in a file such as "output.bmp".

Therefore, FlashCap automatically converts "YUV" format image data into RGB DIB format. This process is called "transcoding." Earlier, I explained that with ReferImage() "basically no copying occurs here," but in the case of "YUV" format, transcoding occurs, so a kind of copying is performed. (FlashCap runs transcoding on multiple threads, but even so, large image data can affect performance.)
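To give a feel for what transcoding costs per pixel, here is a minimal sketch of a limited-range BT.601 conversion from packed YUYV to packed B, G, R bytes. It is an illustration under assumptions, not FlashCap's implementation: the coefficients are the commonly quoted BT.601 limited-range values, the input is assumed to be laid out as Y0 U Y1 V for every two pixels, and the names Bt601Sketch, ToRgb, and YuyvToBgr24 are made up for this example.

```csharp
using System;

public static class Bt601Sketch
{
    // Round and clamp to the 0..255 range of a byte.
    private static byte Clamp(double value) =>
        (byte)Math.Max(0, Math.Min(255, (int)Math.Round(value)));

    // Convert one limited-range BT.601 YUV sample to (R, G, B).
    public static (byte R, byte G, byte B) ToRgb(byte y, byte u, byte v)
    {
        double c = 1.164 * (y - 16);   // Luma, rescaled from 16..235
        double d = u - 128;            // Blue-difference chroma
        double e = v - 128;            // Red-difference chroma
        return (
            Clamp(c + 1.596 * e),
            Clamp(c - 0.392 * d - 0.813 * e),
            Clamp(c + 2.017 * d));
    }

    // Convert packed YUYV (Y0 U Y1 V per two pixels) to packed B, G, R bytes.
    public static byte[] YuyvToBgr24(byte[] yuyv)
    {
        var bgr = new byte[yuyv.Length / 4 * 6];
        for (int i = 0, o = 0; i + 3 < yuyv.Length; i += 4)
        {
            // Both pixels of the pair share the same chroma samples.
            byte u = yuyv[i + 1], v = yuyv[i + 3];
            foreach (var y in new[] { yuyv[i], yuyv[i + 2] })
            {
                var (r, g, b) = ToRgb(y, u, v);
                bgr[o++] = b;
                bgr[o++] = g;
                bgr[o++] = r;
            }
        }
        return bgr;
    }
}
```

Every output pixel needs several multiplications and a clamp, and the result lands in a new buffer, which is why transcoding counts as "a kind of copying."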

If the image data is "YUV" and you can handle it in that form, you can disable transcoding so that the data is copied exactly once:

// Open device with transcoding disabled:
using var device = await descriptor0.OpenAsync(
  descriptor0.Characteristics[0],
  TranscodeFormats.DoNotTranscode,   // Do not transcode.
  async bufferScope =>
  {
      // ...
  });

// ...

The TranscodeFormats enumeration value has the following choices:

| TranscodeFormats | Details |
|:---|:---|
| Auto | Transcode if necessary and automatically select a transformation matrix. Depending on the resolution, BT601, BT709, or BT2020 will be selected. |
| DoNotTranscode | No transcoding at all; formats other than JPEG or PNG will be stored in the DIB bitmap as raw data. |
| BT601 | If necessary, transcode using the BT.601 conversion matrix. This is standard for resolutions up to HD. |
| BT709 | If necessary, transcode using the BT.709 conversion matrix. This is standard for resolutions up to FullHD. |
| BT2020 | If necessary, transcode using the BT.2020 conversion matrix. This is standard for resolutions beyond FullHD, such as 4K. |

In addition to the above, there are BT601FullRange, BT709FullRange, and BT2020FullRange. These extend the assumed range of the luminance signal to the entire 8-bit range, but are less common. If Auto is selected, these FullRange matrices are not used.

Callback handler and invoke trigger

The callback handlers described so far assume that the trigger to be called is "when a frame is obtained," but this trigger can be selected from several patterns. This choice can be made with the isScattering and maxQueuingFrames arguments, or with the overloaded argument of OpenAsync:

// Specifies the trigger for invoking the handler:
using var device = await descriptor0.OpenAsync(
  descriptor0.Characteristics[0],
  TranscodeFormats.Auto,
  true,   // Specifying the invoking trigger (true: Scattering)
  10,     // Maximum number of queuing frames
  async bufferScope =>
  {
      // ...
  });

// ...

The following is a list of pattern types:

| isScattering | maxQueuingFrames | Summary |
|:---|:---|:---|
| false | 1 | When argument is omitted (default). Discard all subsequent frames unless the handler returns control. Suitable for general usage. |
| false | n | Subsequent frames are stored in the queue even if the handler does not return control. If computer performance is sufficient, up to the maximum number of frames will not be lost. |
| true | n | Handlers are processed in parallel by a multithreaded worker. Although the order of corresponding frames is not guaranteed, processing can be accelerated if the CPU supports multiple cores. |

The default call trigger is appropriate for many cases. For example, suppose an image is previewed in the UI and an excessive value is specified for the maximum number of queued frames. If the handler is slow, the queue fills with old image data, and the preview quickly falls behind what the camera currently sees. At some point the process may also be forced to terminate due to lack of memory.
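The default pattern (discard frames while the handler is still busy) can be pictured with a tiny gate. This is only a sketch of the idea, assuming nothing about how FlashCap implements it internally; DropWhileBusyGate and its members are hypothetical names.

```csharp
using System.Threading;

// At most one frame is processed at a time; frames that arrive
// while the gate is held are simply dropped by the caller.
public sealed class DropWhileBusyGate
{
    private int busy;   // 0: idle, 1: a handler is running

    // Returns true if the caller may process this frame.
    public bool TryEnter() =>
        Interlocked.CompareExchange(ref this.busy, 1, 0) == 0;

    // Must be called when the handler has finished.
    public void Exit() =>
        Volatile.Write(ref this.busy, 0);
}
```

A capture callback would call TryEnter() for each incoming frame, skip the frame when it returns false, and call Exit() in a finally block after the handler completes. Because nothing is queued, the handler always sees a recent frame and memory use stays flat.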

Using isScattering == true is even harder to get right. Your handler is called concurrently from multiple threads, so at the very least it must be thread-safe. Concurrent invocation also means that buffers are not necessarily processed in their original order. For example, when displaying a preview in the UI, the video may momentarily go back in time or feel choppy.

To deal with the fact that isScattering == true can cause the order of frames to be lost, the PixelBuffer class defines the Timestamp and FrameIndex properties. By referring to these properties, you can determine the frame order.
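One way to use the frame index is a small reorder buffer that holds back early frames until the missing ones arrive. The sketch below is an assumption-laden illustration, not part of FlashCap: FrameReorderer is a hypothetical class, it assumes frame indices are consecutive and that the first index is known, and it will stall if a frame is dropped upstream, so a real implementation would need a timeout or a gap limit. The frame payload must be data that was copied inside the callback (for example with CopyImage()), since the pixel buffer itself is not valid afterwards.

```csharp
using System.Collections.Generic;

// Releases frames strictly in frame-index order, even if they are
// pushed out of order from several threads.
public sealed class FrameReorderer<T>
{
    private readonly object gate = new object();
    private readonly SortedDictionary<long, T> pending =
        new SortedDictionary<long, T>();
    private long nextIndex;

    public FrameReorderer(long firstIndex) =>
        this.nextIndex = firstIndex;

    // Returns the frames that became deliverable, in index order.
    public IReadOnlyList<T> Push(long frameIndex, T frame)
    {
        lock (this.gate)
        {
            var ready = new List<T>();
            if (frameIndex < this.nextIndex)
            {
                // Older than what was already delivered: drop it.
                return ready;
            }

            this.pending[frameIndex] = frame;

            // Drain every frame that is now contiguous with the last delivered one.
            while (this.pending.TryGetValue(this.nextIndex, out var next))
            {
                this.pending.Remove(this.nextIndex);
                ready.Add(next);
                this.nextIndex++;
            }
            return ready;
        }
    }
}
```

The lock keeps Push() safe when scattered handlers call it concurrently; each call returns zero or more frames that the caller can then display or store in order.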

Reactive extension issue

By the way, have you noticed that there are overloads for both PixelBufferArrivedDelegate and PixelBufferArrivedTaskDelegate in the handler argument of OpenAsync()? This is because they correspond to the synchronous and asynchronous versions of the handler implementation, respectively, and both correctly recognize the completion of handler processing.

However, in the case of AsObservableAsync(), the handler implementation corresponds to the Reactive Extension's OnNext() method, which only exists in a synchronous version. In other words, if you use the Reactive Extension, you cannot use asynchronous processing in the observer implementation. You can declare the handler as async void OnNext(...), but be very careful: the pixel buffer expires as soon as execution reaches the first await, because control returns to the caller at that point. The compiler cannot detect this problem.

The safest course of action would be to extract (copy) the image data from the pixel buffer as quickly as possible. This is easily accomplished using the projection operator:

deviceObservable.
    // Project immediately
    Select(bufferScope =>
        Bitmap.FromStream(bufferScope.Buffer.ReferImage().AsStream())).
    // Do whatever you want after that...
    // ...

Customize buffer pooling (Advanced topic)

FlashCap has a buffer pooling interface for reusing buffers. It is defined by the BufferPool base class; implementations extend this class.

The default implementation is the DefaultBufferPool class, which is used automatically. This class is a simple implementation, but uses weak references to allow the GC to reclaim buffers that are no longer in use.

If you want to replace buffer pooling with your own implementation, implement the following two abstract methods:

// Base class for buffer pooling.
public abstract class BufferPool
{
  protected BufferPool()
  { /* ... */ }

  // Get the buffer.
  public abstract byte[] Rent(int minimumSize);

  // Release the buffer.
  public abstract void Return(byte[] buffer);
}

Because .NET has a GC, the simplest (and non-pooling) implementation would be:

public sealed class FakeBufferPool : BufferPool
{
    public override byte[] Rent(int minimumSize) =>
        // Always generate a buffer.
        new byte[minimumSize];

    public override void Return(byte[] buffer)
    {
        // (Drop the `buffer` reference and let the GC collect it.)
    }
}

For example, some of you may know that System.Buffers in .NET Core provides an ArrayPool class. By extending BufferPool, you can plug in such an existing buffer pooling implementation or your own.
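As a sketch of that idea, the following adapter forwards to ArrayPool<byte>. The class name ArrayPoolBufferPool is hypothetical. The small BufferPool declaration at the top is only a stand-in that mirrors the two abstract methods shown above so that the snippet compiles on its own; when referencing the FlashCap package, remove it and derive from FlashCap's own BufferPool instead.

```csharp
using System.Buffers;

// Stand-in mirroring the abstract methods shown in this README.
// Remove this declaration when the FlashCap package is referenced.
public abstract class BufferPool
{
    public abstract byte[] Rent(int minimumSize);
    public abstract void Return(byte[] buffer);
}

// Hypothetical adapter that delegates pooling to System.Buffers.ArrayPool<byte>.
public sealed class ArrayPoolBufferPool : BufferPool
{
    private readonly ArrayPool<byte> pool;

    // Use the process-wide shared pool by default.
    public ArrayPoolBufferPool() : this(ArrayPool<byte>.Shared)
    {
    }

    public ArrayPoolBufferPool(ArrayPool<byte> pool) =>
        this.pool = pool;

    // ArrayPool may hand back an array larger than requested.
    public override byte[] Rent(int minimumSize) =>
        this.pool.Rent(minimumSize);

    public override void Return(byte[] buffer) =>
        this.pool.Return(buffer);
}
```

Note that ArrayPool<byte>.Rent() only guarantees a minimum length, which matches the minimumSize parameter name, so callers should not assume the array length equals the image size.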

If you implement your own class in this way, pass it to the constructor of CaptureDevices for FlashCap to use:

// Create and use a buffer pooling instance.
var bufferPool = new FakeBufferPool();

var devices = new CaptureDevices(bufferPool);

// ...

It is used as a common buffer pooling for all devices enumerated from this instance.

Mastering the frame processor (Advanced topic)

Welcome to the underground dungeon, where FlashCap's frame processor is a polished gem. You do not need to understand frame processors unless you are an experienced user who really needs them. Treat this explanation as a reference for cases where working with frame processors is unavoidable. The default frame processor implementations included in FlashCap are also a helpful reference.

The callback handler invocation triggers described in the previous section are internally realized by switching frame processors. In other words, it is an abstraction of how frames are handled and their behavior.

The frame processor is implemented by inheriting a very simple base class:

// (Will spare you the detailed definitions.)
public abstract class FrameProcessor : IDisposable
{
  // Implement if necessary.
  public virtual void Dispose()
  {
  }

  // Get a pixel buffer.
  protected PixelBuffer GetPixelBuffer()
  { /* ... */ }

  // Return the pixel buffer.
  public void ReleasePixelBuffer(PixelBuffer buffer)
  { /* ... */ }

  // Perform capture using the device.
  protected void Capture(
    CaptureDevice captureDevice,
    IntPtr pData, int size,
    long timestampMicroseconds, long frameIndex,
    PixelBuffer buffer)
  { /* ... */ }

  // Called when a frame arrives.
  public abstract void OnFrameArrived(
    CaptureDevice captureDevice,
    IntPtr pData, int size, long timestampMicroseconds, long frameIndex);
}

At the very least, you need to implement the OnFrameArrived() method. It is called, literally, when a frame arrives. As you can see from the signature, it is passed a raw pointer, the size of the image data, a timestamp, and a frame number.

Note also that the return value is void. This method cannot be asynchronous. Even if you qualify it with async void, the information passed as arguments cannot be maintained.

Here is a typical implementation of this method:

public sealed class CoolFrameProcessor : FrameProcessor
{
  private readonly Action<PixelBuffer> action;

  // Hold a delegate to run once captured.
  public CoolFrameProcessor(Action<PixelBuffer> action) =>
    this.action = action;

  // Called when a frame arrives.
  public override void OnFrameArrived(
    CaptureDevice captureDevice,
    IntPtr pData, int size, long timestampMicroseconds, long frameIndex)
  {
    // Get a pixel buffer.
    var buffer = base.GetPixelBuffer();

    // Perform capture.
    // Image data is stored in pixel buffer. (First copy occurs.)
    base.Capture(
      captureDevice,
      pData, size,
      timestampMicroseconds, frameIndex,
      buffer);

    // Invoke a delegate.
    this.action(buffer);

    // Return the pixel buffer (optional, will reuse allocated buffer)
    base.ReleasePixelBuffer(buffer);
  }
}

Recall that this method is called each time a frame arrives. In other words, this example implementation gets a pixel buffer, captures into it, and invokes the delegate every time a frame arrives.

Let's try to use it:

var devices = new CaptureDevices();
var descriptor0 = devices.EnumerateDescriptors().ElementAt(0);

// Open by specifying our frame processor.
using var device = await descriptor0.OpenWithFrameProcessorAsync(
  descriptor0.Characteristics[0],
  TranscodeFormats.Auto,
  new CoolFrameProcessor(buffer =>   // Using our frame processor.
  {
    // Captured pixel buffer is passed.
    var image = buffer.ReferImage();

    // Perform decode.
    var bitmap = Bitmap.FromStream(image.AsStream());

    // ...
  }));

await device.StartAsync();

// ...

Your first frame processor is ready to go. And even if you don't actually run it, you're probably aware of its features and problems:

For this reason, FlashCap uses a standard set of frame processors that can be operated with some degree of safety. So where is the advantage of implementing custom frame processors?

Custom frame processors make it possible to implement highly optimized frame and image data processing. For example, pixel buffers are created efficiently, but you do not have to use them. (Calling the Capture() method is optional.) Because a pointer to the raw image data and its size are passed as arguments, you can access the image data directly and implement your own processing to make it as fast as possible.
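As a rough illustration of working from the raw pointer, the sketch below reads or copies bytes with the standard Marshal API. RawFrameSketch and its methods are hypothetical helpers, not FlashCap types; they take the same pData and size values that OnFrameArrived() receives, and they must be called before that method returns, because the pointer is not valid afterwards.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RawFrameSketch
{
    // Read the frame in place, without allocating a managed copy.
    public static long Checksum(IntPtr pData, int size)
    {
        long sum = 0;
        for (var i = 0; i < size; i++)
        {
            sum += Marshal.ReadByte(pData, i);
        }
        return sum;
    }

    // Copy into a caller-owned buffer (which can be reused across frames).
    // Returns the number of bytes actually copied.
    public static int CopyTo(IntPtr pData, int size, byte[] destination)
    {
        var count = Math.Min(size, destination.Length);
        Marshal.Copy(pData, destination, 0, count);
        return count;
    }
}
```

Reading byte by byte through Marshal.ReadByte() is simple but slow; code that really needs speed would more likely use unsafe pointers or spans, which is left out here to keep the sketch short.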


Limitation


Build FlashCap

FlashCap keeps a clean build environment. Basically, if you have the Visual Studio 2022 .NET development environment installed, you can build it as is. (Please add the WPF and Windows Forms options. These are required to build the sample code.)

  1. Clone this repository.
  2. Build FlashCap.sln.
    • Build it with dotnet build.
    • Or open FlashCap.sln with Visual Studio 2022 and build it.

NOTE: FlashCap itself should build in a Linux environment, but since the sample code has a Windows-dependent implementation, we assume Windows as the development environment.

Pull requests are welcome! Development is on the develop branch and merged into the main branch at release time. Therefore, if you make a pull request, please create your topic branch from the develop branch.

Porting V4L2 to unsupported platforms

V4L2 is the Linux image capture standard API. FlashCap supports V4L2 API, which allows it to run on a variety of Linux platforms. The supported platforms are listed below:

The supported platforms listed here are simply those on which I and the contributors have been able to verify that capturing from a camera with FlashCap works.

If you ask me if it works on other platforms, such as mips64, riscv32/64, or sparc64, it will not work. The reasons are as follows:

As for the .NET runtime, time may solve the problem. So, if you intend to port FlashCap to an unsupported Linux platform, please refer to the following for a porting overview:

First, you need to build FlashCap.V4L2Generator. For cases where the .NET SDK is not available in the target Linux environment, we provide build-mono.sh, which compiles the code using the Mono mcs compiler.

Then, the rough procedure is shown in the script dumper.sh. Customize the script to suit your target environment.

The source code generated by FlashCap.V4L2Generator is placed into FlashCap.Core/Internal/V4L2/. To use it, add a new platform branch to the switch statement in the type initializer in NativeMethods_V4L2.cs.

switch (buf.machine)
{
    case "x86_64":
    case "amd64":
    case "i686":
    case "i586":
    case "i486":
    case "i386":
        Interop = IntPtr.Size == 8 ?
            new NativeMethods_V4L2_Interop_x86_64() :
            new NativeMethods_V4L2_Interop_i686();
        break;
    case "aarch64":
    case "armv9l":
    case "armv8l":
    case "armv7l":
    case "armv6l":
        Interop = IntPtr.Size == 8 ?
            new NativeMethods_V4L2_Interop_aarch64() :
            new NativeMethods_V4L2_Interop_armv7l();
        break;
    case "mips":
    case "mipsel":
        Interop = new NativeMethods_V4L2_Interop_mips();
        break;
    case "loongarch64":
        Interop = new NativeMethods_V4L2_Interop_loongarch64();
        break;

    // (Insert your cool platform ported interop...)

    default:
        throw new InvalidOperationException(
            $"FlashCap: Architecture '{buf.machine}' is not supported.");
}

And pray to God for the rest :) You may want to use the Avalonia sample code to verify this. If your environment cannot run Avalonia, try the OneShot sample code first; you can also extend it to save a continuous series of bitmaps and check them.

If this is successful, PRs are welcome.

I may not be able to verify it directly on my own hardware, but because the code generated by this process is nearly identical to the code for other platforms, I can probably accept the PR. Please also provide the following information (this will be noted in the documentation):

TIPS: V4L2Generator is needed because the various defaults assumed by the .NET interoperability feature are optimized for the Windows environment and do not account for the variations between target ABIs.
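The kind of ABI variation meant here can be shown with a two-field struct. On the i386 System V ABI used by 32-bit x86 Linux, a 64-bit integer inside a struct is aligned to 4 bytes, whereas many other ABIs align it to 8, so the same C declaration has different sizes and field offsets. The sketch below states each layout explicitly instead of relying on marshaller defaults. The struct names are hypothetical and do not correspond to any V4L2 structure or to what V4L2Generator actually emits.

```csharp
using System.Runtime.InteropServices;

// Layout stated explicitly: 64-bit fields aligned to at most 4 bytes,
// as the i386 System V ABI would lay out { int; long long; }.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct TimestampedPack4
{
    public int Index;
    public long Timestamp;
}

// The same fields with every offset pinned by hand, matching an ABI
// that aligns 64-bit fields to 8 bytes.
[StructLayout(LayoutKind.Explicit, Size = 16)]
public struct TimestampedAligned8
{
    [FieldOffset(0)] public int Index;
    [FieldOffset(8)] public long Timestamp;
}

public static class AbiLayoutSketch
{
    // Marshalled sizes of the two layouts.
    public static (int Pack4, int Aligned8) Sizes() =>
        (Marshal.SizeOf<TimestampedPack4>(),
         Marshal.SizeOf<TimestampedAligned8>());
}
```

The first layout occupies 12 bytes and the second 16. Passing the wrong one to an ioctl would make the kernel read fields at the wrong offsets, which is why per-platform definitions are generated from the real headers.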


License

Apache-v2.


History