ikwzm / udmabuf

User space mappable dma buffer device driver for Linux.
BSD 2-Clause "Simplified" License

Xilinx Zynqmp: The read data is inconsistent with the data written by the PL #116

Open zhanghongg opened 6 months ago

zhanghongg commented 6 months ago

The problem is sporadic. It always seems to happen at the end of the buffer.

ikwzm commented 6 months ago

Please give more information.

zhanghongg commented 6 months ago

U-dma-buf devicetree:

    udmabuf@0x00 {
        compatible = "ikwzm,u-dma-buf";
        device-name = "udmabuf0";
        size = <0x14000000>;
    };

U-dma-buf initialization:

    fd_ = open("/dev/udmabuf0", O_RDWR);
    if (fd_ < 0) {
        log_error("UdmaBuf open failed!");
        assert(false);
    }
    user_ptr_ = mmap64(NULL, map_size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, base_addr_);
    if (user_ptr_ != MAP_FAILED) {
        log_info("UdmaBuf mmap succeeded!");
        ClearBuf();
    }
    else {
        log_error("UdmaBuf mmap failed!");
    }
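Instead of hard-coding map_size_, the allocated size can be read back from u-dma-buf's sysfs attribute (for this node, /sys/class/u-dma-buf/udmabuf0/size) and compared with the size about to be mapped. A minimal sketch, assuming the attribute holds a single decimal number; ReadSysfsULong is a hypothetical helper, and the path is a parameter so it can also be tried on an ordinary file.

```cpp
#include <cerrno>
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical helper: read an unsigned decimal value from a sysfs-style file,
// e.g. "/sys/class/u-dma-buf/udmabuf0/size". Returns false on any error.
bool ReadSysfsULong(const std::string& path, unsigned long* value) {
    std::ifstream in(path);
    std::string text;
    if (!in || !std::getline(in, text)) {
        return false;  // missing file or empty content
    }
    errno = 0;
    char* end = nullptr;
    unsigned long parsed = std::strtoul(text.c_str(), &end, 10);
    if (errno != 0 || end == text.c_str()) {
        return false;  // not a number, or out of range
    }
    *value = parsed;
    return true;
}
```

map_size_ could then be checked against the returned value before mmap64() is called, instead of discovering a mismatch later.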

U-dma-buf ClearBuf:

    unsigned char* tmp = reinterpret_cast<unsigned char*>(user_ptr_);
    memset(tmp, 0, map_size_);

V4l2 Userptr REQBUFS:

    uint64_t reserved_base_addr = 0;
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, req_count * buf_length_);

    struct v4l2_requestbuffers reqBufs;
    memset(&reqBufs, 0, sizeof(struct v4l2_requestbuffers));
    reqBufs.count  = req_count;
    reqBufs.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    reqBufs.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)) {
        log_error("v4l2 VIDIOC_REQBUFS failed: %s", strerror(errno));
        return false;
    }
    video_buf_.resize(req_count);
    dq_video_bufs_.resize(req_count);
    for(int i = 0; i < reqBufs.count; i++){
        struct v4l2_buffer videoBuf;
        memset(&videoBuf, 0, sizeof(struct v4l2_buffer));
        videoBuf.index = i;
        videoBuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        videoBuf.memory = V4L2_MEMORY_USERPTR;
        if(ioctl(fd_, VIDIOC_QUERYBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QUERYBUF failed: %s", strerror(errno));
            return false;
        }
        void* myPtr = reserved_memory_->GetMappAddr();
        video_buf_[i] = reinterpret_cast<unsigned char*>(myPtr) + i*buf_length_;
        videoBuf.m.userptr = reinterpret_cast<unsigned long>(video_buf_[i]);

        if(ioctl(fd_, VIDIOC_QBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QBUF failed: %s", strerror(errno));
            return false;
        }
    }

    return true;

Program flow and problem description:


The PL writes a value into the last four bytes of each buffer, and every time the PS side reads a buffer, it resets those last four bytes to 0. The problem is that when the PS side obtains buf3, it occasionally reads 0 from the last four bytes, even though the PL has written a non-zero value there.
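The handshake described above (the PL sets a trailer word at the end of each buffer, the PS reads it and resets it to 0 before requeueing) can be sketched as below. This is an illustration only: the 4-byte width follows this description (the code posted later clears a uint64_t), the helper names are hypothetical, and it assumes the buffer start is 4-byte aligned and its length a multiple of 4. The volatile accesses only stop the compiler from caching or eliding the load and store; they do not replace the CPU cache synchronization discussed in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read the 32-bit value the PL stores in the last four
// bytes of a capture buffer of 'len' bytes.
uint32_t ReadTrailer(const unsigned char* buf, std::size_t len) {
    const volatile uint32_t* trailer =
        reinterpret_cast<const volatile uint32_t*>(buf + len - sizeof(uint32_t));
    return *trailer;
}

// Hypothetical helper: reset the trailer to 0 before handing the buffer back.
void ClearTrailer(unsigned char* buf, std::size_t len) {
    volatile uint32_t* trailer =
        reinterpret_cast<volatile uint32_t*>(buf + len - sizeof(uint32_t));
    *trailer = 0;
}
```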

ikwzm commented 6 months ago

How are you handling cache coherency?

zhanghongg commented 6 months ago

The code shown above is everything I do.

zhanghongg commented 6 months ago

Please provide some suggestions, thank you.

zhanghongg commented 6 months ago

Supplement: the PL uses AXI_HP to write to DDR.

I tried the method "Manual cache management with the CPU cache still being enabled" from the udmabuf README, but the problem still occurs.

May I ask how I should solve this problem?

ikwzm commented 6 months ago

I tried using the method that "Manual cache management with the CPU cache still being enabled" in udmabuf readme. The problem will still occur.

How did you do this?

ikwzm commented 6 months ago

What is the value of buf_length_? What is the value of videoBuf.length?

ikwzm commented 6 months ago

What I would like to know is not so much how you sync, but when you sync.

zhanghongg commented 6 months ago

When the CPU uses a buffer, roughly the following code is executed:


int fd;
// set sync_offset
{
    char attr[1024];
    unsigned long sync_offset = 16777216UL * index; // index: 0-19
    if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_offset); /* or sprintf(attr, "0x%lx", sync_offset); */
        write(fd, attr, strlen(attr));
        close(fd);
    }
}
// set sync_size
{
    char attr[1024];
    unsigned long sync_size = 16777216;
    if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_size); /* or sprintf(attr, "0x%lx", sync_size); */
        write(fd, attr, strlen(attr));
        close(fd);
    }
}
// set sync_direction
{
    char attr[1024];
    unsigned long sync_direction = 1;
    if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_direction);
        write(fd, attr, strlen(attr));
        close(fd);
    }
}
// set sync_for_cpu
{
    char attr[1024];
    unsigned long sync_for_cpu = 1;
    if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_for_cpu);
        write(fd, attr, strlen(attr));
        close(fd);
    }
}
// set sync_for_device
{
    char attr[1024];
    unsigned long sync_for_device = 0;
    if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_for_device);
        write(fd, attr, strlen(attr));
        close(fd);
    }
}
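Each block above repeats the same open/format/write/close pattern and silently ignores failures, so a missing attribute or a rejected value would go unnoticed. A minimal sketch of a helper that does one write and reports errors; WriteSysfsAttr is a hypothetical name, and the path is a parameter so it can be exercised on an ordinary file.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: write one unsigned value, in decimal, to a u-dma-buf
// sysfs attribute such as "/sys/class/u-dma-buf/udmabuf0/sync_offset".
// Unlike the blocks above, it reports failures instead of ignoring them.
bool WriteSysfsAttr(const std::string& path, unsigned long value) {
    int fd = open(path.c_str(), O_WRONLY);
    if (fd < 0) {
        std::perror(path.c_str());
        return false;
    }
    char text[32];
    int len = std::snprintf(text, sizeof(text), "%lu", value);
    ssize_t written = write(fd, text, static_cast<size_t>(len));
    close(fd);
    if (written != static_cast<ssize_t>(len)) {
        std::fprintf(stderr, "%s: short or failed write\n", path.c_str());
        return false;
    }
    return true;
}
```

With such a helper, each block reduces to a single call, for example WriteSysfsAttr("/sys/class/u-dma-buf/udmabuf0/sync_size", 16777216).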
ikwzm commented 6 months ago

Umm... I do not know what you are doing. Please show me the source code, not the text.

zhanghongg commented 6 months ago

After obtaining a V4L2 buffer, I executed the sync code above.

ikwzm commented 6 months ago

After obtaining the v4l2 buf, I executed the sync code above

It does not work at all.

zhanghongg commented 6 months ago

What I would like to know is not so much when, but when are you syncing?

What do you mean?

It does not work at all.

Why?

zhanghongg commented 6 months ago

Thank you very much for your answer. My purpose in using udmabuf is to speed up memcpy. What should I do now?

ikwzm commented 6 months ago

Let me ask again.

What is the value of buf_length_? What is the value of videoBuf.length?

zhanghongg commented 6 months ago

Both are 16,777,216.
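These numbers fit the device-tree node posted at the start: size = <0x14000000> is 335,544,320 bytes, which is exactly twenty 16,777,216-byte (16 MiB) buffers, matching the "index: 0-19" comment in the sync code. A small compile-time check of that layout; the constant and function names are hypothetical, and the 4-byte trailer follows the problem description.

```cpp
#include <cstddef>

constexpr std::size_t kUdmabufSize = 0x14000000;  // size = <0x14000000> in the DT node
constexpr std::size_t kBufLength   = 16777216;    // buf_length_ == videoBuf.length (16 MiB)
constexpr std::size_t kNumBuffers  = kUdmabufSize / kBufLength;

static_assert(kUdmabufSize % kBufLength == 0, "region holds a whole number of buffers");
static_assert(kNumBuffers == 20, "matches index 0-19 in the sync code");

// Byte offset of buffer 'index' inside the u-dma-buf mapping; this is the
// value the posted code writes to sync_offset (16777216 * index).
constexpr std::size_t BufferOffset(std::size_t index) { return index * kBufLength; }

// Offset of the 4-byte trailer the PL writes at the end of buffer 'index'.
constexpr std::size_t TrailerOffset(std::size_t index) {
    return BufferOffset(index) + kBufLength - 4;
}
```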

ikwzm commented 6 months ago

PL will write a value of the last four bytes of buf, and the ps side will write the last four bytes of buf as 0 every time the ps side reads buf.

Why does the PS side write 0 to the last four bytes of buf every time it reads buf? Can you show me the source code for this part?

zhanghongg commented 6 months ago

Thank you very much for your help!

Why does the PS side write 0 to the last four bytes of buf every time it reads buf?

The next time the PL receives this buffer, it makes some decisions based on the value at this memory address.

Can you show me the source code for this part?

OK!

Main logic code:

    unsigned char* buf = v4l2fd_.DqBuf();
    ...
    ... /// use this buf
    ...
    uint64_t* countBuf = reinterpret_cast<uint64_t*>(buf + stripBufSize - sizeof(uint64_t));
    *countBuf =0;
    v4l2fd_.QBuf();

unsigned char* DqBuf() {
    memset(&active_video_buf_, 0, sizeof(struct v4l2_buffer));
    active_video_buf_.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    active_video_buf_.memory = V4L2_MEMORY_USERPTR;
    if(ioctl(fd_, VIDIOC_DQBUF, &active_video_buf_)){
        log_error("v4l2 VIDIOC_DQBUF failed: %s", strerror(errno));
        return nullptr;
    }
    reserved_memory_->Sync(buf_length_, active_video_buf_.index*buf_length_, 1);
    return video_buf_[active_video_buf_.index];
}

bool V4l2FdUserPtr::QBuf() {
    reserved_memory_->Sync(buf_length_, active_video_buf_.index*buf_length_, 0);
    if(ioctl(fd_, VIDIOC_QBUF, &active_video_buf_)){
        log_error("v4l2 VIDIOC_QBUF failed: %s", strerror(errno));
        return false;
    }
    return true;
}

void UdmaBuf::Sync(unsigned long sync_size, unsigned long sync_offset,
                   unsigned int sync_direction) {
    int fd;
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_offset); /* or sprintf((char *)attr, "0x%lx", sync_offset); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_size); /* or sprintf((char *)attr, "0x%lx", sync_size); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%u", sync_direction);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_cpu = sync_direction == 1? 1:0;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_cpu);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_device = sync_direction == 1? 0:1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_device);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

}
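The number written to sync_direction is interpreted as a Linux DMA direction (enum dma_data_direction: 0 = DMA_BIDIRECTIONAL, 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE). With the code above, DqBuf() therefore syncs for the CPU with DMA_TO_DEVICE, and QBuf() syncs for the device with DMA_BIDIRECTIONAL, even though the buffer is filled by the PL. As far as I know, on arm64 a for-CPU sync with DMA_TO_DEVICE performs no cache invalidation at all, so stale cached data can still be read after it. A small sketch that names these values instead of passing bare integers; the C++ enum is a hypothetical user-space mirror of the kernel's values.

```cpp
// Hypothetical user-space mirror of the kernel's enum dma_data_direction,
// whose numeric values u-dma-buf's sync_direction attribute expects.
enum class DmaDirection : unsigned int {
    kBidirectional = 0,  // DMA_BIDIRECTIONAL
    kToDevice      = 1,  // DMA_TO_DEVICE: CPU produces data, device consumes it
    kFromDevice    = 2,  // DMA_FROM_DEVICE: device produces data, CPU consumes it
};

// Direction suggested in this thread for a V4L2_BUF_TYPE_VIDEO_CAPTURE buffer.
constexpr DmaDirection kCaptureDirection = DmaDirection::kFromDevice;

// Value to write into /sys/class/u-dma-buf/<name>/sync_direction.
constexpr unsigned int ToSysfsValue(DmaDirection d) {
    return static_cast<unsigned int>(d);
}
```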
ikwzm commented 6 months ago

Where is the value of *countBuf read and checked?

zhanghongg commented 5 months ago

Main logic code:

    unsigned char* buf = v4l2fd_.DqBuf();
    ...
    ... /// use this buf
    ...
    uint64_t* countBuf = reinterpret_cast<uint64_t*>(buf + stripBufSize - sizeof(uint64_t));
    *countBuf = 0;
    v4l2fd_.QBuf();

It is checked right after DqBuf(), at the "/// use this buf" location.

ikwzm commented 5 months ago

Thank you. By the way, just to confirm: did it work correctly with V4L2_MEMORY_MMAP instead of V4L2_MEMORY_USERPTR?

zhanghongg commented 5 months ago

Yes, it can work correctly with V4L2_MEMORY_MMAP.

ikwzm commented 5 months ago

What is the device driver?

ikwzm commented 5 months ago

It is better not to use memset() to clear u-dma-bufs. See #38.

zhanghongg commented 5 months ago

It is better not to use memset() to clear u-dma-bufs. See https://github.com/ikwzm/udmabuf/issues/38

OK, thank you! I have read that answer. I will keep monitoring the impact of memset and may stop using it later.
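If memset() is dropped, one simple alternative (an assumption on my part, not necessarily what #38 recommends) is to clear the mapping with plain aligned 64-bit stores through a volatile pointer, which also keeps the compiler from turning the loop back into a memset() call. A sketch, assuming the mapping is 8-byte aligned and its size is a multiple of 8 (true for the page-aligned 0x14000000-byte region here); ClearWords is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: zero 'size' bytes starting at 'base' with one aligned
// 64-bit store per word. 'base' must be 8-byte aligned, 'size' a multiple of 8.
void ClearWords(void* base, std::size_t size) {
    volatile uint64_t* p = static_cast<volatile uint64_t*>(base);
    for (std::size_t i = 0; i < size / sizeof(uint64_t); ++i) {
        p[i] = 0;  // volatile: each store is emitted, never merged into memset()
    }
}
```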

What is the device driver?

Are you referring to the V4L2 driver? Is there anything I should pay attention to in this regard?

Also, based on the code provided so far and my purpose, I have two questions.

Firstly, is my usage of udmabuf roughly correct?

Secondly, is my approach the right direction for my goal of speeding up memcpy?

Thank you again for your assistance.

ikwzm commented 5 months ago

What is the device driver?

Are you referring to the v4l2 driver?Is there anything I should pay attention to in this regard?

Yes, what v4l2 driver are you using?

Firstly, is my usage of udmabuf roughly correct?

Set sync_direction to 2, as follows: when the V4L2 driver is capturing, use DMA_FROM_DEVICE (= 2).

void UdmaBuf::Sync(unsigned long sync_size, unsigned long sync_offset,
                   unsigned int sync_for_device) {
    int fd;
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_offset); /* or sprintf((char *)attr, "0x%lx", sync_offset); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_size); /* or sprintf((char *)attr, "0x%lx", sync_size); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned int   sync_direction = 2; // DMA_FROM_DEVICE if V4L2_BUF_TYPE_VIDEO_CAPTURE
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%u", sync_direction);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    if (sync_for_device == 0) 
    {
        unsigned char  attr[1024];
        unsigned long  sync_for_cpu = 1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_cpu);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    else
    {
        unsigned char  attr[1024];
        unsigned long  sync_for_device = 1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_device);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

}
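Note that this version changes the meaning of the third argument: 0 now requests sync_for_cpu and any other value requests sync_for_device. The call sites therefore flip relative to the earlier code: DqBuf() would pass 0 (hand the buffer to the CPU) and QBuf() would pass 1 (hand it back to the device). The sketch below lists, in order, the attribute writes one such Sync() call performs, as plain data so the sequence can be checked without hardware; SyncWrite and PlanSync are hypothetical names.

```cpp
#include <string>
#include <vector>

struct SyncWrite {
    std::string attr;     // file name under /sys/class/u-dma-buf/udmabuf0/
    unsigned long value;  // value written to it
};

// Mirrors the order used by UdmaBuf::Sync() above: set the range and the
// direction first, then trigger the actual cache operation last.
std::vector<SyncWrite> PlanSync(unsigned long sync_size, unsigned long sync_offset,
                                bool for_device) {
    const unsigned long kDmaFromDevice = 2;  // capture: device writes, CPU reads
    std::vector<SyncWrite> writes = {
        {"sync_offset", sync_offset},
        {"sync_size", sync_size},
        {"sync_direction", kDmaFromDevice},
    };
    writes.push_back({for_device ? "sync_for_device" : "sync_for_cpu", 1});
    return writes;
}
```

For buffer i, PlanSync(buf_length_, i * buf_length_, false) describes the sync after VIDIOC_DQBUF, and the same call with true describes the sync before VIDIOC_QBUF.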

Secondly, in order to achieve my goal that improve the speed of memcpy, is my solution direction correct.

I believe the method is correct. However, it may not work with some V4L2 drivers, so please tell me which V4L2 driver you are using.

ikwzm commented 5 months ago

There is another way to speed up access through mmap: use S_AXI_HPC0, S_AXI_HPC1, or S_AXI_ACP for the PL-to-PS interface so that cache coherency is maintained in hardware, and then set the dma-coherent property in the V4L2 driver's device tree node. This way, the cache is also enabled when using V4L2_MEMORY_MMAP.

For details, please refer to #107.

For more information on cache coherency, please refer to the following URL

https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency

zhanghongg commented 5 months ago

Thank you.

There is another way to speed up mmap. It is to use S_AXI_HPC0 or S_AXI_HPC1 or S_AXI_ACP for the PL -> PS interface to perform cache coherency in hardware and then set the dma-coherent property in the V4L2 driver device tree. By doing so, the cache is also enabled in a way using V4L2_MEMORY_MMAP.

For details, please refer to https://github.com/ikwzm/udmabuf/issues/107

For more information on cache coherency, please refer to the following URL

https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency

OK! I have also considered this method, but the PL side needs four AXI_HP ports.

Therefore, please let me know what you are using for your V4L2 driver.

The V4L2 driver is customized: it contains logic that tells the PL which buffer address it may write to. Should I provide you with the driver source code? If so, I will post it on Monday.

zhanghongg commented 5 months ago

v4l2 driver:

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/ioport.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/pci.h>
#include <linux/random.h>
#include <linux/version.h>
#include <linux/mutex.h>
#include <linux/videodev2.h>
#include <linux/dma-mapping.h>
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/highmem.h>
#include <linux/freezer.h>
#include <media/videobuf-vmalloc.h>
#include <media/v4l2-device.h>
#include <media/v4l2-ioctl.h>
#include <linux/platform_device.h>
#include <linux/gpio.h>
#include <linux/of_gpio.h>

#include "extractor_sw.h"
#include "extractor_hw.h"

/* 
 * trans video driver
 */
#define MAX_WIDTH   8192
#define MAX_HEIGHT  2048

#define CAPTURE_DRV_NAME "Strip driver"
#define PVI_MODULE_NAME "Strip"

void __iomem  *extractor_base = 0;
void __iomem  *sta_info = 0;

static struct extractor_port *file2port(struct file *file)
{
    return container_of(file->private_data, struct extractor_port, fh);
}

static void start_extractor(struct extractor_dev *xdev, struct extractor_buffer *buf, unsigned char buf_flag)
{
    dma_addr_t dma_addr;
    dma_addr = vb2_dma_contig_plane_dma_addr(&buf->v4l2_buf.vb2_buf, 0);
    if(buf_flag == 0) {
        iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA_A);
        iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA2_A);
    }
    else {
        iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA_B);
        iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA2_B);
    }
    //v4l2_info(&xdev->v4l2_dev, "dma_addr=0x%x\n", dma_addr);
}
/*
static void stop_dma(void)
{
}
*/
/*
 * Videobuf operations
 */
static int extractor_queue_setup(struct vb2_queue *vq,
               unsigned int *nbuffers, unsigned int *nplanes,
               unsigned int sizes[], struct device *alloc_devs[])
{
    struct extractor_port *port = vb2_get_drv_priv(vq);

    *nplanes = 1;
    sizes[0] = port->sizeimage;

    return 0;
}

static void extractor_buf_queue(struct vb2_buffer *vb)
{
                // printk("\n%s Line%d\n",__func__,__LINE__);

    struct extractor_port *port = vb2_get_drv_priv(vb->vb2_queue);
    struct extractor_dev *dev = port->dev;
    struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);

    struct extractor_buffer *buf = container_of(vbuf, struct extractor_buffer, v4l2_buf);

    unsigned long flags;

    spin_lock_irqsave(&dev->slock, flags);
    list_add_tail(&buf->list, &port->vidq);
    spin_unlock_irqrestore(&dev->slock, flags);
}

static int extractor_start_streaming(struct vb2_queue *vq, unsigned int count)
{
    struct extractor_port *port = vb2_get_drv_priv(vq);
    struct extractor_dev *dev = port->dev;

    struct extractor_buffer *buf;
    unsigned long flags;

    port->sequence = 0;

    //buf a
    buf = list_entry(port->vidq.next, struct extractor_buffer, list);
    buf->allow_dq = true;
    spin_lock_irqsave(&dev->slock, flags);
    list_del(&buf->list);
    list_add_tail(&buf->dq_list, &dev->extractor_bufs_a);
    spin_unlock_irqrestore(&dev->slock, flags);
    start_extractor(dev, buf, 0);

    //buf b
    buf = list_entry(port->vidq.next, struct extractor_buffer, list);
    buf->allow_dq = true;
    spin_lock_irqsave(&dev->slock, flags);
    list_del(&buf->list);
    list_add_tail(&buf->dq_list, &dev->extractor_bufs_b);
    spin_unlock_irqrestore(&dev->slock, flags);
    start_extractor(dev, buf, 1);

    vq->streaming = 1;
    //gpio_set_value(dev->gpio_sensor_en, 1);

    return 0;
}

/*
 * Abort streaming and wait for last buffer
 */
static void extractor_stop_streaming(struct vb2_queue *vq)
{
    struct extractor_port *port = vb2_get_drv_priv(vq);
    struct extractor_dev *dev = port->dev;  

    struct extractor_buffer *buf;

    /* release all active buffers */
    while (!list_empty(&dev->extractor_bufs_a)) {
        buf = list_entry(dev->extractor_bufs_a.next,
                struct extractor_buffer, dq_list);
        list_del(&buf->dq_list);
        vb2_buffer_done(&buf->v4l2_buf.vb2_buf, VB2_BUF_STATE_ERROR);
    }
    while (!list_empty(&dev->extractor_bufs_b)) {
        buf = list_entry(dev->extractor_bufs_b.next,
                struct extractor_buffer, dq_list);
        list_del(&buf->dq_list);
        vb2_buffer_done(&buf->v4l2_buf.vb2_buf, VB2_BUF_STATE_ERROR);
    }
    while (!list_empty(&port->vidq)) {
        buf = list_entry(port->vidq.next, struct extractor_buffer, list);
        list_del(&buf->list);
        vb2_buffer_done(&buf->v4l2_buf.vb2_buf, VB2_BUF_STATE_ERROR);
    }

    vq->streaming = 0;
}   

static struct vb2_ops extractor_video_qops = {
    .queue_setup        = extractor_queue_setup,
    .buf_queue          = extractor_buf_queue,
    .start_streaming    = extractor_start_streaming,
    .stop_streaming     = extractor_stop_streaming,
    .wait_prepare       = vb2_ops_wait_prepare,
    .wait_finish        = vb2_ops_wait_finish,
};

static int extractor_open(struct file *file)
{
    struct extractor_port *port = video_drvdata(file);
    struct extractor_dev *dev = port->dev;

    if (!dev->setup_done) {
        dev->setup_done = 1;
    }   

    port->width = MAX_WIDTH;
    port->height = MAX_HEIGHT;

    port->bytesperline = port->width;
    port->sizeimage = port->width * port->height;   

    v4l2_fh_init(&port->fh, video_devdata(file));
    file->private_data = &port->fh;
    v4l2_fh_add(&port->fh);
    port->open = 1;

    return 0;
}

static int extractor_release(struct file *file)
{
    struct extractor_port *port = video_drvdata(file);
    struct vb2_queue *q = &port->vb_vidq;

    extractor_stop_streaming(q);

    if (file->private_data) {
        v4l2_fh_del((struct v4l2_fh *)file->private_data);
        v4l2_fh_exit((struct v4l2_fh *)file->private_data);
    }

    vb2_queue_release(q);

    port->open = 0;

    return 0;
}

static const struct v4l2_file_operations extractor_fops = {
    .owner = THIS_MODULE,
    .open = extractor_open,
    .release = extractor_release,
    .unlocked_ioctl = video_ioctl2,
    .mmap = vb2_fop_mmap,
    .poll = vb2_fop_poll,
};

static int extractor_querycap(struct file *file, void *priv,
            struct v4l2_capability *cap)
{
    strncpy(cap->driver, CAPTURE_DRV_NAME, sizeof(cap->driver) - 1);
    strncpy(cap->card, PVI_MODULE_NAME, sizeof(cap->card) - 1);
    strlcpy(cap->bus_info, PVI_MODULE_NAME, sizeof(cap->bus_info));
    cap->device_caps  = V4L2_CAP_STREAMING | V4L2_CAP_VIDEO_CAPTURE;
    cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;

    return 0;
}

static int extractor_enum_fmt_vid_cap(struct file *file, void *priv,
                struct v4l2_fmtdesc *fmt)
{
    fmt->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    strcpy(fmt->description, "Raw Mode-Y8");
    fmt->pixelformat = V4L2_PIX_FMT_GREY;

    return 0;
}

static int extractor_g_fmt_vid_cap(struct file *file, void *priv,
                 struct v4l2_format *f)
{
    struct extractor_port *port = file2port(file);

    f->fmt.pix.width    = port->width;
    f->fmt.pix.height   = port->height;
    f->fmt.pix.bytesperline = port->bytesperline;
    f->fmt.pix.sizeimage    = port->sizeimage;

    return 0;
}

static int extractor_try_fmt_vid_cap(struct file *file, void *fh, struct v4l2_format *format)
{
    return 0;
}

static int extractor_s_fmt_vid_cap(struct file *file, void *priv,
                 struct v4l2_format *f)
{
    struct extractor_port *port = file2port(file);

    f->fmt.pix.bytesperline = f->fmt.pix.width;
    f->fmt.pix.sizeimage = f->fmt.pix.width * f->fmt.pix.height;

    port->width = f->fmt.pix.width;
    port->height = f->fmt.pix.height;
    port->bytesperline = f->fmt.pix.bytesperline;
    port->sizeimage = f->fmt.pix.sizeimage;

    /* do something else */

    return 0;
}

static long extractor_ioctl_default(struct file *file, void *fh, bool valid_prio,
                  unsigned int cmd, void *arg)
{
    switch (cmd) {
    default:
        return -ENOTTY;
    }
}

static const struct v4l2_ioctl_ops extractor_ioctl_ops = {
    .vidioc_querycap            = extractor_querycap,
    .vidioc_enum_fmt_vid_cap    = extractor_enum_fmt_vid_cap,

    .vidioc_g_fmt_vid_cap   = extractor_g_fmt_vid_cap,
    .vidioc_try_fmt_vid_cap =extractor_try_fmt_vid_cap,
    .vidioc_s_fmt_vid_cap   = extractor_s_fmt_vid_cap,

    .vidioc_reqbufs     = vb2_ioctl_reqbufs,
    .vidioc_create_bufs = vb2_ioctl_create_bufs,
    .vidioc_prepare_buf = vb2_ioctl_prepare_buf,
    .vidioc_querybuf    = vb2_ioctl_querybuf,
    .vidioc_qbuf        = vb2_ioctl_qbuf,
    .vidioc_dqbuf       = vb2_ioctl_dqbuf,

    .vidioc_streamon    = vb2_ioctl_streamon,
    .vidioc_streamoff   = vb2_ioctl_streamoff,
    .vidioc_log_status  = v4l2_ctrl_log_status,
    .vidioc_default     = extractor_ioctl_default,
};

static int alloc_port(struct extractor_dev *xdev)
{
    struct extractor_port *port;
    struct vb2_queue *q;
    struct video_device *vfd;
    int ret;

    port = kzalloc (sizeof(*port), GFP_KERNEL);
    if (!port)
        return -ENOMEM;

    port->dev = xdev;
    port->open = 0;

    /*
     * Initialize queue
     */
    q = &port->vb_vidq;
    q->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    q->io_modes = VB2_MMAP | VB2_DMABUF |VB2_USERPTR;
    q->drv_priv = port;
    q->buf_struct_size = sizeof(struct extractor_buffer);
    q->ops = &extractor_video_qops;
    q->mem_ops = &vb2_dma_contig_memops;
    q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
    q->lock = &xdev->mutex;
    q->dev = &(xdev->pdev->dev);
    ret = vb2_queue_init(q);
    if (ret)
        goto do_free_port;

    INIT_LIST_HEAD(&port->vidq);

    vfd = video_device_alloc();
    if (!vfd)
    {
        ret = -ENOMEM;
        goto do_free_port;
    }

    vfd->device_caps = V4L2_CAP_STREAMING | V4L2_CAP_VIDEO_CAPTURE;
    vfd->v4l2_dev = &xdev->v4l2_dev;
    vfd->queue = q;
    vfd->fops       = &extractor_fops;
    vfd->ioctl_ops  = &extractor_ioctl_ops;
    vfd->minor      = -1;
    vfd->release    = video_device_release;
    vfd->lock = &xdev->mutex;
    snprintf(vfd->name, sizeof(vfd->name), "%s", PVI_MODULE_NAME);
    video_set_drvdata(vfd, port);

    ret = video_register_device(vfd, VFL_TYPE_VIDEO, -1);  //VFL_TYPE_GRABBER
    if (ret) {
        v4l2_err(&xdev->v4l2_dev, "Failed to register video device\n");
        goto do_free_port;
    }

    port->vfd = vfd;
    xdev->port = port;

    v4l2_info(&xdev->v4l2_dev, "Device registered as /dev/video%d\n", vfd->num);
    return 0;

do_free_port:
    kfree(port);
    return ret;
}

static void free_port(struct extractor_port *port)
{
    if (!port)
        return;
    v4l2_info(&(port->dev->v4l2_dev), PVI_MODULE_NAME
            " Device /dev/video%d is removed\n", port->vfd->num);
    video_unregister_device(port->vfd);
    video_device_release(port->vfd);

    kfree(port);
}

static void extractor_active_buf_next(struct extractor_port *port, unsigned char buf_flag)
{
    struct extractor_dev *dev = port->dev;
    struct extractor_buffer *buf;
    unsigned long flags;

    spin_lock_irqsave(&dev->slock, flags);

    if (!list_empty(&port->vidq)) {
        // printk("\n%s Line%d\n",__func__,__LINE__);
        buf = list_first_entry(&port->vidq, struct extractor_buffer, list);
        if(list_is_last(&buf->list, &port->vidq ))
        {
            buf->allow_dq = false;
            // printk("\n%s Line%d\n",__func__,__LINE__);
        }
        else
        {
            buf->allow_dq = true;
        }
        list_del(&buf->list);
        if(buf_flag == 0) {
            list_add_tail(&buf->dq_list, &dev->extractor_bufs_a);
        }
        else {
            list_add_tail(&buf->dq_list, &dev->extractor_bufs_b);
        }

        start_extractor(dev, buf, buf_flag);
    }
    else
    {
    }

    spin_unlock_irqrestore(&dev->slock, flags);
}

static void extractor_process_buffer_complete(struct extractor_port *port, unsigned char buf_flag)
{
    struct extractor_dev *dev = port->dev;
    struct vb2_buffer *vb = NULL;
    struct extractor_buffer *buf;
    struct vb2_v4l2_buffer *v4l2_buf =NULL;
    unsigned long flags;
    if(buf_flag == 0) {
        if (list_empty(&dev->extractor_bufs_a))
            return;
        buf = list_first_entry(&dev->extractor_bufs_a, struct extractor_buffer, dq_list);
    }
    else {
        if (list_empty(&dev->extractor_bufs_b))
            return;
        buf = list_first_entry(&dev->extractor_bufs_b, struct extractor_buffer, dq_list);
    }

    if (buf) {
        v4l2_buf =  &buf->v4l2_buf;
        vb = &v4l2_buf->vb2_buf;
        v4l2_buf->sequence = ioread32(sta_info);
        spin_lock_irqsave(&dev->slock, flags);
        list_del(&buf->dq_list);
        spin_unlock_irqrestore(&dev->slock, flags);

        if (buf->allow_dq) 
        {
#if 0
            dma_addr_t dma_addr;        
            dma_addr = vb2_dma_contig_plane_dma_addr(vb, 0);
            dma_sync_single_for_cpu(&(dev->pdev->dev), dma_addr,vb->planes[0].length, DMA_FROM_DEVICE);
#endif          
            vb2_buffer_done(vb, VB2_BUF_STATE_DONE);
            buf->allow_dq = false;
        }
        else
        {
            spin_lock_irqsave(&dev->slock, flags);
            list_add_tail(&buf->list, &port->vidq);
            spin_unlock_irqrestore(&dev->slock, flags);
        }
    } 
    else
    {
        printk("%s:%s\n",__func__,"BUG().");
        BUG();
    }

    port->sequence++;
}

static irqreturn_t extractor_irq_a(int irq, void *data) 
{
    struct extractor_dev *dev = (struct extractor_dev *)data;
    struct extractor_port *port = dev->port;

    extractor_process_buffer_complete(port, 0);
    extractor_active_buf_next(port, 0); 

    return IRQ_HANDLED;
}

static irqreturn_t extractor_irq_b(int irq, void *data) 
{
    struct extractor_dev *dev = (struct extractor_dev *)data;
    struct extractor_port *port = dev->port;

    extractor_process_buffer_complete(port, 1);
    extractor_active_buf_next(port, 1); 

    return IRQ_HANDLED;
}

static int extractor_probe(struct platform_device *pdev)
{
    struct extractor_dev *vdev;
    struct resource *res;

    int ret = 0;

    //struct device_node *dev_node = pdev->dev.of_node;

    vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
    if (!vdev)
        return -ENOMEM;

    /* extractor address space */
    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    if (res == NULL) {
        dev_err(&pdev->dev, "Missing platform resources data\n");
        ret = -ENODEV;
        goto free_dev;
    }

    if (!devm_request_mem_region(&pdev->dev, res->start, resource_size(res), pdev->name))
    {
        dev_err(&pdev->dev, "Failed to request  memory resources\n");
        ret = -ENOMEM;
        goto free_dev;
    }

    extractor_base = devm_ioremap(&pdev->dev, res->start,  resource_size(res));  //devm_ioremap_nocache
    if (!extractor_base){
        ret = -ENOMEM;
        goto free_dev;
    }

    sta_info = devm_ioremap(&pdev->dev, STA_INFO_BASE,  12);  //devm_ioremap_nocache
    if (!sta_info){
        ret = -ENOMEM;
        goto free_dev;
    }

    ret = platform_get_irq(pdev, 0);
    if (ret < 0) {
        dev_err(&pdev->dev, "Could not get extractor irq a\n");
        goto free_dev;
    }
    vdev->irq_a = ret;

    ret = platform_get_irq(pdev, 1);
    if (ret < 0) {
        dev_err(&pdev->dev, "Could not get extractor irq b\n");
        goto free_dev;
    }
    vdev->irq_b = ret;

    if (devm_request_irq(&pdev->dev, vdev->irq_a, extractor_irq_a,
                   IRQF_TRIGGER_RISING, PVI_MODULE_NAME, vdev) < 0) {
        ret = -ENOMEM;
        goto free_dev;
    }   

    if (devm_request_irq(&pdev->dev, vdev->irq_b, extractor_irq_b,
                   IRQF_TRIGGER_RISING, PVI_MODULE_NAME, vdev) < 0) {
        ret = -ENOMEM;
        goto free_dev;
    }   

    spin_lock_init(&vdev->slock);
    INIT_LIST_HEAD(&vdev->extractor_bufs_a);
    INIT_LIST_HEAD(&vdev->extractor_bufs_b);

    ret = v4l2_device_register(&pdev->dev, &vdev->v4l2_dev);
    if (ret)
        goto free_irq;

    mutex_init(&vdev->mutex);

    vdev->pdev = pdev;
    vdev->setup_done = 0;

    ret = alloc_port(vdev);
    if (ret)
        goto free_irq;

    platform_set_drvdata(pdev, vdev);

    return 0;

free_irq:
    devm_free_irq(&pdev->dev, vdev->irq_a, vdev);
    devm_free_irq(&pdev->dev, vdev->irq_b, vdev);

free_dev:
    kfree(vdev);

    return ret;
}

static int extractor_remove(struct platform_device *pdev)
{
    struct extractor_dev *dev = platform_get_drvdata(pdev);

    //gpio_free(dev->gpio_sensor_en);
    //devm_iounmap(&pdev->dev, dev->dma0_base);

    devm_free_irq(&pdev->dev, dev->irq_a, dev);
    devm_free_irq(&pdev->dev, dev->irq_b, dev);

    //if(dev->setup_done)
        //vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);       
    free_port(dev->port);
    kfree(dev);

    return 0;
}

#if defined (CONFIG_OF)
static const struct of_device_id extractor_of_match[] = {
    {
        .compatible = "titic,extract", .data = (void *) 1,
    },
    {},
};
#else
#define extractor_of_match NULL
#endif

static struct platform_driver extractor_pdrv = {
    .probe      = extractor_probe,
    .remove     = extractor_remove,
    .driver     = {
        .name   = CAPTURE_DRV_NAME,
        .owner  = THIS_MODULE,
        .of_match_table = extractor_of_match,
    },
};

static int extractor_init(void)
{
    return platform_driver_register(&extractor_pdrv);
}

static void extractor_exit(void)
{
    platform_driver_unregister(&extractor_pdrv);
}

module_init(extractor_init);
module_exit(extractor_exit);

MODULE_LICENSE("GPL");
ikwzm commented 5 months ago

Thank you

ikwzm commented 5 months ago

I found that your V4L2 driver uses videobuf2-dma-contig.

static int alloc_port(struct extractor_dev *xdev)
{
    :
    q->mem_ops = &vb2_dma_contig_memops;
    :    
}

With videobuf2-dma-contig, cache synchronization is performed automatically inside the Linux kernel, so the user application does not need to sync the u-dma-buf explicitly.

vb2_ioctl_qbuf

Let's follow how vb2_ioctl_qbuf handles this.

vb2_ioctl_qbuf()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-v4l2.c#L1052

int vb2_ioctl_qbuf(struct file *file, void *priv, struct v4l2_buffer *p)
{
    struct video_device *vdev = video_devdata(file);

    if (vb2_queue_is_busy(vdev->queue, file))
        return -EBUSY;
    return vb2_qbuf(vdev->queue, vdev->v4l2_dev->mdev, p);
}
EXPORT_SYMBOL_GPL(vb2_ioctl_qbuf);

vb2_ioctl_qbuf() calls vb2_qbuf().

vb2_qbuf()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-v4l2.c#L802

int vb2_qbuf(struct vb2_queue *q, struct media_device *mdev,
         struct v4l2_buffer *b)
{
    struct media_request *req = NULL;
    int ret;

    if (vb2_fileio_is_active(q)) {
        dprintk(q, 1, "file io in progress\n");
        return -EBUSY;
    }

    ret = vb2_queue_or_prepare_buf(q, mdev, b, false, &req);
    if (ret)
        return ret;
    ret = vb2_core_qbuf(q, b->index, b, req);
    if (req)
        media_request_put(req);
    return ret;
}
EXPORT_SYMBOL_GPL(vb2_qbuf);

vb2_qbuf() calls vb2_core_qbuf().

vb2_core_qbuf()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L1651

int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
          struct media_request *req)
{
    struct vb2_buffer *vb;
    enum vb2_buffer_state orig_state;
    int ret;
    :     
    :     
    :     
    switch (vb->state) {
    case VB2_BUF_STATE_DEQUEUED:
    case VB2_BUF_STATE_IN_REQUEST:
        if (!vb->prepared) {
            ret = __buf_prepare(vb);
            if (ret)
                return ret;
        }
        break;
    case VB2_BUF_STATE_PREPARING:
        dprintk(q, 1, "buffer still being prepared\n");
        return -EINVAL;
    default:
        dprintk(q, 1, "invalid buffer state %s\n",
            vb2_state_name(vb->state));
        return -EINVAL;
    }
    :     
    :     
    :     
    dprintk(q, 2, "qbuf of buffer %d succeeded\n", vb->index);
    return 0;
}
EXPORT_SYMBOL_GPL(vb2_core_qbuf);

If the buffer has not been prepared yet, vb2_core_qbuf() calls __buf_prepare() to prepare it.

__buf_prepare()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L1406

static int __buf_prepare(struct vb2_buffer *vb)
{
    struct vb2_queue *q = vb->vb2_queue;
    enum vb2_buffer_state orig_state = vb->state;
    int ret;

    if (q->error) {
        dprintk(q, 1, "fatal error occurred on queue\n");
        return -EIO;
    }

    if (vb->prepared)
        return 0;
    WARN_ON(vb->synced);

    if (q->is_output) {
        ret = call_vb_qop(vb, buf_out_validate, vb);
        if (ret) {
            dprintk(q, 1, "buffer validation failed\n");
            return ret;
        }
    }

    vb->state = VB2_BUF_STATE_PREPARING;

    switch (q->memory) {
    case VB2_MEMORY_MMAP:
        ret = __prepare_mmap(vb);
        break;
    case VB2_MEMORY_USERPTR:
        ret = __prepare_userptr(vb);
        break;
    case VB2_MEMORY_DMABUF:
        ret = __prepare_dmabuf(vb);
        break;
    default:
        WARN(1, "Invalid queue type\n");
        ret = -EINVAL;
        break;
    }

    if (ret) {
        dprintk(q, 1, "buffer preparation failed: %d\n", ret);
        vb->state = orig_state;
        return ret;
    }

    __vb2_buf_mem_prepare(vb);
    vb->prepared = 1;
    vb->state = orig_state;

    return 0;
}

After the memory-type-specific preparation (__prepare_userptr() for USERPTR buffers), __buf_prepare() calls __vb2_buf_mem_prepare().

__vb2_buf_mem_prepare()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L323

static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
{
    unsigned int plane;

    if (vb->synced)
        return;

    vb->synced = 1;
    for (plane = 0; plane < vb->num_planes; ++plane)
        call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
}

For videobuf2-dma-contig, call_void_memop() calls vb2_dc_prepare().

vb2_dc_prepare()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L123

static void vb2_dc_prepare(void *buf_priv)
{
    struct vb2_dc_buf *buf = buf_priv;
    struct sg_table *sgt = buf->dma_sgt;

    /* This takes care of DMABUF and user-enforced cache sync hint */
    if (buf->vb->skip_cache_sync_on_prepare)
        return;

    if (!buf->non_coherent_mem)
        return;

    /* Non-coherent MMAP only */
    if (buf->vaddr)
        flush_kernel_vmap_range(buf->vaddr, buf->size);

    /* For both USERPTR and non-coherent MMAP */
    dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
}

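As the comments in the code indicate, for USERPTR buffers vb2_dc_prepare() calls dma_sync_sgtable_for_device(), unless the user-enforced cache sync hint skips it. This function performs the cache synchronization that hands the buffer over to the device.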
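dma_sync_sgtable_for_device() and dma_sync_sgtable_for_cpu() end up in the architecture's cache maintenance, which works on whole cache lines rather than individual bytes. The userland sketch below shows that address arithmetic, assuming 64-byte lines (the L1 data cache line size of the Cortex-A53 cores in Zynq UltraScale+). cache_line_range() and is_cache_line_aligned() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte data cache lines (Cortex-A53 APU of the Zynq UltraScale+ MPSoC). */
#define CACHE_LINE ((uintptr_t)64)

/* Half-open range [start, end) of whole cache lines that cache maintenance
 * on the byte range [addr, addr + len) actually operates on. */
struct line_range {
    uintptr_t start;
    uintptr_t end;
};

static struct line_range cache_line_range(uintptr_t addr, size_t len)
{
    struct line_range r;

    assert(len > 0);
    r.start = addr & ~(CACHE_LINE - 1);                        /* round start down */
    r.end = (addr + len + CACHE_LINE - 1) & ~(CACHE_LINE - 1); /* round end up */
    return r;
}

/* Nonzero when [addr, addr + len) shares no cache line with neighbouring data,
 * so a sync on this buffer cannot clean or invalidate bytes outside it. */
static int is_cache_line_aligned(uintptr_t addr, size_t len)
{
    return (addr % CACHE_LINE) == 0 && (len % CACHE_LINE) == 0;
}
```

For a 16 MiB (16,777,216-byte) buffer at a line-aligned base, the computed range starts and ends exactly at the buffer's edges; a misaligned buffer would pull partial lines of its neighbours into every sync.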

vb2_buffer_done

Next, let's follow vb2_buffer_done called by the V4L2 driver.

vb2_buffer_done()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L1058

void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
{
    struct vb2_queue *q = vb->vb2_queue;
    unsigned long flags;

    if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
        return;

    if (WARN_ON(state != VB2_BUF_STATE_DONE &&
            state != VB2_BUF_STATE_ERROR &&
            state != VB2_BUF_STATE_QUEUED))
        state = VB2_BUF_STATE_ERROR;

#ifdef CONFIG_VIDEO_ADV_DEBUG
    /*
     * Although this is not a callback, it still does have to balance
     * with the buf_queue op. So update this counter manually.
     */
    vb->cnt_buf_done++;
#endif
    dprintk(q, 4, "done processing on buffer %d, state: %s\n",
        vb->index, vb2_state_name(state));

    if (state != VB2_BUF_STATE_QUEUED)
        __vb2_buf_mem_finish(vb);

    spin_lock_irqsave(&q->done_lock, flags);
    if (state == VB2_BUF_STATE_QUEUED) {
        vb->state = VB2_BUF_STATE_QUEUED;
    } else {
        /* Add the buffer to the done buffers list */
        list_add_tail(&vb->done_entry, &q->done_list);
        vb->state = state;
    }
    atomic_dec(&q->owned_by_drv_count);

    if (state != VB2_BUF_STATE_QUEUED && vb->req_obj.req) {
        media_request_object_unbind(&vb->req_obj);
        media_request_object_put(&vb->req_obj);
    }

    spin_unlock_irqrestore(&q->done_lock, flags);

    trace_vb2_buf_done(q, vb);

    switch (state) {
    case VB2_BUF_STATE_QUEUED:
        return;
    default:
        /* Inform any processes that may be waiting for buffers */
        wake_up(&q->done_wq);
        break;
    }
}
EXPORT_SYMBOL_GPL(vb2_buffer_done);

Unless state is VB2_BUF_STATE_QUEUED (that is, for VB2_BUF_STATE_DONE and VB2_BUF_STATE_ERROR), vb2_buffer_done() calls __vb2_buf_mem_finish().

__vb2_buf_mem_finish()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L339

static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
{
    unsigned int plane;

    if (!vb->synced)
        return;

    vb->synced = 0;
    for (plane = 0; plane < vb->num_planes; ++plane)
        call_void_memop(vb, finish, vb->planes[plane].mem_priv);
}
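The synced flag pairs each prepare with exactly one finish. The buffer is synced for the device once when it is queued and synced for the CPU once when it is completed, and a repeated call to either helper returns early. Below is a stripped-down userland model of that bookkeeping. model_buf, model_mem_prepare() and model_mem_finish() are hypothetical names, and counting callbacks stand in for vb2_dc_prepare() and vb2_dc_finish().

```c
#include <assert.h>

#define MAX_PLANES 8

/* Hypothetical stand-in for the prepare/finish memops of vb2_dma_contig_memops. */
struct mem_ops {
    void (*prepare)(void *mem_priv); /* role of vb2_dc_prepare(): sync for device */
    void (*finish)(void *mem_priv);  /* role of vb2_dc_finish(): sync for CPU */
};

struct model_buf {
    const struct mem_ops *ops;
    unsigned int num_planes;
    void *mem_priv[MAX_PLANES];
    int synced;
};

/* Mirrors __vb2_buf_mem_prepare(): runs the memop once, then marks the buffer synced. */
static void model_mem_prepare(struct model_buf *vb)
{
    unsigned int plane;

    assert(vb->num_planes <= MAX_PLANES);
    if (vb->synced)
        return;
    vb->synced = 1;
    for (plane = 0; plane < vb->num_planes; ++plane)
        vb->ops->prepare(vb->mem_priv[plane]);
}

/* Mirrors __vb2_buf_mem_finish(): only runs while a prepare is outstanding. */
static void model_mem_finish(struct model_buf *vb)
{
    unsigned int plane;

    assert(vb->num_planes <= MAX_PLANES);
    if (!vb->synced)
        return;
    vb->synced = 0;
    for (plane = 0; plane < vb->num_planes; ++plane)
        vb->ops->finish(vb->mem_priv[plane]);
}

/* Example ops that only count calls: counts[0] = prepares, counts[1] = finishes. */
static void count_prepare(void *mem_priv) { ((int *)mem_priv)[0]++; }
static void count_finish(void *mem_priv) { ((int *)mem_priv)[1]++; }
static const struct mem_ops counting_ops = { count_prepare, count_finish };
```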

For videobuf2-dma-contig, call_void_memop() calls vb2_dc_finish().

vb2_dc_finish()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L143

static void vb2_dc_finish(void *buf_priv)
{
    struct vb2_dc_buf *buf = buf_priv;
    struct sg_table *sgt = buf->dma_sgt;

    /* This takes care of DMABUF and user-enforced cache sync hint */
    if (buf->vb->skip_cache_sync_on_finish)
        return;

    if (!buf->non_coherent_mem)
        return;

    /* Non-coherent MMAP only */
    if (buf->vaddr)
        invalidate_kernel_vmap_range(buf->vaddr, buf->size);

    /* For both USERPTR and non-coherent MMAP */
    dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
}

Likewise, for USERPTR buffers vb2_dc_finish() calls dma_sync_sgtable_for_cpu(), unless the user-enforced cache sync hint skips it. This function performs the cache synchronization that hands the buffer back to the CPU.

Conclusion

With videobuf2-dma-contig, cache synchronization is performed automatically inside the Linux kernel, so the user application does not need to sync the u-dma-buf explicitly.

This makes the cause of the problem even harder to understand.

zhanghongg commented 5 months ago

Many thanks.

If I allocate the u-dma-buf one buffer larger than V4L2 requires, the problem no longer occurs.

That is, I map the u-dma-buf with a size of (req_count + 1) * buf_length_, while still registering only req_count buffers of buf_length_ bytes each with V4L2.
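With the values reported later in this thread (req_count = 4, buf_length_ = 16,777,216), the original mapping ends exactly where the last V4L2 buffer ends, while the workaround leaves one unused buffer after it. The small sketch below makes that layout difference explicit, using hypothetical helper names. It does not explain why the spare area helps; it only shows what changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the ReqBufs() layout: buffer `index` starts
 * at index * buf_length inside one u-dma-buf mapping. */
struct buf_slot {
    uint64_t offset; /* offset of the buffer inside the mapping */
    uint64_t end;    /* one past its last byte */
};

static struct buf_slot userptr_slot(uint32_t index, uint64_t buf_length)
{
    struct buf_slot s;

    s.offset = (uint64_t)index * buf_length;
    s.end = s.offset + buf_length;
    return s;
}

/* Bytes left unused after the last V4L2 buffer. A mapping of
 * req_count * buf_length gives 0 (the last buffer ends at the end of the
 * mapping); (req_count + 1) * buf_length gives one spare buffer. */
static uint64_t tail_slack(uint32_t req_count, uint64_t buf_length, uint64_t map_size)
{
    struct buf_slot last;

    assert(req_count > 0);
    last = userptr_slot(req_count - 1, buf_length);
    assert(last.end <= map_size);
    return map_size - last.end;
}
```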


bool V4l2FdUserPtr::ReqBufs(uint32_t req_count) {
    uint64_t reserved_base_addr = 0;
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);

    struct v4l2_requestbuffers reqBufs;
    memset(&reqBufs, 0, sizeof(struct v4l2_requestbuffers));
    reqBufs.count  = req_count;
    reqBufs.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    reqBufs.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)) {
        log_error("v4l2 VIDIOC_REQBUFS failed: %s", strerror(errno));
        return false;
    }
    video_buf_.resize(req_count);
    dq_video_bufs_.resize(req_count);
    for(int i = 0; i < reqBufs.count; i++){
        struct v4l2_buffer videoBuf;
        memset(&videoBuf, 0, sizeof(struct v4l2_buffer));
        videoBuf.index = i;
        videoBuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        videoBuf.memory = V4L2_MEMORY_USERPTR;
        if(ioctl(fd_, VIDIOC_QUERYBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QUERYBUF failed: %s", strerror(errno));
            return false;
        }
        /// Size of each buffer
        // buf_length_ = videoBuf.length;
        printf("VIDIOC_QUERYBUF videoBufSize: %d, SetFmtBufSize: %d\n", videoBuf.length, buf_length_);
        void* myPtr = reserved_memory_->GetMappAddr();
        video_buf_[i] = reinterpret_cast<unsigned char*>(myPtr) + i*buf_length_;
        videoBuf.m.userptr = reinterpret_cast<unsigned long>(video_buf_[i]);

        if(ioctl(fd_, VIDIOC_QBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QBUF failed: %s", strerror(errno));
            return false;
        }
    }

    return true;
}
ikwzm commented 5 months ago

What is the next output result?

        printf("VIDIOC_QUERYBUF videoBufSize: %d, SetFmtBufSize: %d\n", videoBuf.length, buf_length_);

What is the value of req_count?

What is the value of stripBufSize?

What is the next process? Do you have the source code?

    video_buf_.resize(req_count);

What is the next process? Do you have the source code?

    dq_video_bufs_.resize(req_count);
zhanghongg commented 5 months ago

What is the next output result?

    printf("VIDIOC_QUERYBUF videoBufSize: %d, SetFmtBufSize: %d\n", videoBuf.length, buf_length_);

Both are 16,777,216.

What is the value of req_count?

During the testing process, there were 4.

What is the value of stripBufSize?

It is also 16,777,216.

What is the next process? Do you have the source code?

video_buf_.resize(req_count);

It is only used in DqBuf.

unsigned char* V4l2FdUserPtr::DqBuf() {
    memset(&active_video_buf_, 0, sizeof(struct v4l2_buffer));
    active_video_buf_.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    active_video_buf_.memory = V4L2_MEMORY_USERPTR;
    if(ioctl(fd_, VIDIOC_DQBUF, &active_video_buf_)){
        log_error("v4l2 VIDIOC_DQBUF failed: %s", strerror(errno));
        return 0;
    }
    // reserved_memory_->Sync(buf_length_, active_video_buf_.index*buf_length_, 1);
    return video_buf_[active_video_buf_.index];
}

What is the next process? Do you have the source code?

dq_video_bufs_.resize(req_count);

Please ignore it; it is not currently used by any interface.

ikwzm commented 5 months ago

What is the value of req_count?

During the testing process, there were 4.

What is the value of reqBufs.count after the following ioctl(fd_, VIDIOC_REQBUFS, &reqBufs) call?

bool V4l2FdUserPtr::ReqBufs(uint32_t req_count) {
    uint64_t reserved_base_addr = 0;
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);

    struct v4l2_requestbuffers reqBufs;
    memset(&reqBufs, 0, sizeof(struct v4l2_requestbuffers));
    reqBufs.count  = req_count;
    reqBufs.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    reqBufs.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)) {
        log_error("v4l2 VIDIOC_REQBUFS failed: %s", strerror(errno));
        return false;
    }
    video_buf_.resize(req_count);
    dq_video_bufs_.resize(req_count);
    for(int i = 0; i < reqBufs.count; i++){

Is it the same value as req_count?
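The returned count matters for two reasons. The V4L2 specification allows the driver to grant a different number of buffers than requested. Also, ReqBufs() loops over reqBufs.count after resizing video_buf_ to req_count. Below is a minimal sketch of the check an application could make after VIDIOC_REQBUFS. It uses a hypothetical helper and plain arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sanity check for ReqBufs(): video_buf_ holds `requested`
 * entries and the u-dma-buf mapping holds map_size bytes, but the QBUF loop
 * runs over the `granted` count returned by VIDIOC_REQBUFS. */
static int reqbufs_result_ok(uint32_t requested, uint32_t granted,
                             uint64_t buf_length, uint64_t map_size)
{
    assert(buf_length > 0);
    if (granted == 0 || granted > requested)
        return 0; /* nothing to queue, or the loop would index past video_buf_ */
    return (uint64_t)granted <= map_size / buf_length; /* every buffer inside the mapping */
}
```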

ikwzm commented 5 months ago
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);

Where is the source code for the UdmaBuf class?

zhanghongg commented 5 months ago

Is it the same value as req_count?

Yes, it is.

Where is the source code for the UdmaBuf class?

UdmaBuf::UdmaBuf(uint64_t base_addr, uint64_t map_size): base_addr_(base_addr), map_size_(map_size) {
    fd_ = open("/dev/udmabuf0", O_RDWR);
    if (fd_ < 0) {   
        log_error("UdmaBuf open failed!");
        assert(false);
    }
    user_ptr_ = mmap64(NULL, map_size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, base_addr_);
    if(user_ptr_ != MAP_FAILED) {
        log_info("UdmaBuf mmap succeeded!");
        ClearBuf();
    }
    else {
        log_error("UdmaBuf mmap failed!"); 
    }
}
ikwzm commented 5 months ago

Where is ClearBuf() defined? Can you show me the source code for "all" of the UdmaBuf class, not just the constructor?

zhanghongg commented 5 months ago
#include "UdmaBuf.h"
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <assert.h>
#include "log.h"
#include <string.h>
UdmaBuf::UdmaBuf(uint64_t base_addr, uint64_t map_size): base_addr_(base_addr), map_size_(map_size) {
    fd_ = open("/dev/udmabuf0", O_RDWR);
    if (fd_ < 0) {   
        log_error("UdmaBuf open failed!");
        assert(false);
    }
    user_ptr_ = mmap64(NULL, map_size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, base_addr_);
    if(user_ptr_ != MAP_FAILED) {
        log_info("UdmaBuf mmap succeeded!");
        ClearBuf();
    }
    else {
        log_error("UdmaBuf mmap failed!"); 
    }
}

UdmaBuf::~UdmaBuf() {
    close(fd_);
    munmap(user_ptr_, map_size_);
}

void* UdmaBuf::GetMappAddr() {
    return user_ptr_;
}

void UdmaBuf::SetSyncMode(uint8_t mode){

    unsigned char attr[1024];
    int tmpfd_;
    unsigned long  sync_mode = mode;
    if ((tmpfd_  = open("/sys/class/u-dma-buf/udmabuf0/sync_mode", O_WRONLY)) != -1) {
        sprintf((char *)attr, "%lu", sync_mode);
        write(tmpfd_, attr, strlen((const char *)attr));
        close(tmpfd_);
    }
    else {
        log_error("SetSyncMode failed!"); 
    }
}

void UdmaBuf::ClearBuf() {
    /// no memset!!
    unsigned char* tmp = reinterpret_cast<unsigned char*>(user_ptr_);
    memset(tmp, 0, map_size_);
    // for(uint64_t i=0; i < map_size_; ++i) {
    //     memset(tmp,0,map_size);
    //     tmp[i] = 0;
    // }            
}

void UdmaBuf::Sync(unsigned long sync_size, unsigned long sync_offset,
                   unsigned int sync_direction) {
    int fd;
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_offset); /* or sprintf((char *)attr, "0x%lx", sync_offset); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_size); /* or sprintf((char *)attr, "0x%lx", sync_size); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%u", sync_direction);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_cpu = sync_direction == 1? 1:0;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_cpu);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_device = sync_direction == 1? 0:1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_device);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

}
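For comparison, here is a hedged C sketch of the same u-dma-buf sysfs sequence for a CPU-side sync, with format specifiers matched to the argument types. It is not part of u-dma-buf: write_attr() and udmabuf_sync_for_cpu() are hypothetical. The attribute directory is a parameter (on the target it would be /sys/class/u-dma-buf/udmabuf0) so the code can be exercised elsewhere. It assumes the u-dma-buf convention that sync_direction = 2 selects DMA_FROM_DEVICE, the direction for data written by the PL and read by the CPU. Check the u-dma-buf README for the version in use.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write one decimal value to <dir>/<attr>.
 * Returns 0 on success, -1 on any error. */
static int write_attr(const char *dir, const char *attr, unsigned long value)
{
    char path[512];
    char text[32];
    int len, fd, ok;

    if (snprintf(path, sizeof(path), "%s/%s", dir, attr) >= (int)sizeof(path))
        return -1;
    len = snprintf(text, sizeof(text), "%lu", value); /* %lu matches unsigned long */
    fd = open(path, O_WRONLY); /* sysfs attributes already exist; no O_CREAT */
    if (fd < 0)
        return -1;
    ok = write(fd, text, (size_t)len) == (ssize_t)len;
    close(fd);
    return ok ? 0 : -1;
}

/* Set the region and direction, then write 1 to sync_for_cpu to request the
 * sync. Assumption: direction 2 means DMA_FROM_DEVICE in u-dma-buf. */
static int udmabuf_sync_for_cpu(const char *dir, unsigned long offset,
                                unsigned long size, unsigned int direction)
{
    assert(dir != NULL && size > 0);
    if (write_attr(dir, "sync_offset", offset) ||
        write_attr(dir, "sync_size", size) ||
        write_attr(dir, "sync_direction", direction))
        return -1;
    return write_attr(dir, "sync_for_cpu", 1);
}
```

With videobuf2-dma-contig, vb2_dc_finish() already syncs USERPTR buffers for the CPU, so such a manual sync would normally be redundant here.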
ikwzm commented 5 months ago

Hmm... perhaps I didn't make myself clear. Where is UdmaBuf.h?

zhanghongg commented 5 months ago

Sorry.

#pragma once
#include <cstdint>
#include <stdint.h>

class UdmaBuf {
public:
    UdmaBuf(uint64_t base_addr, uint64_t map_size);
    virtual ~UdmaBuf();
    void* GetMappAddr();
    uint64_t GetMapSize() const {
        return map_size_;
    }
    void ClearBuf();

    void Sync(unsigned long sync_size, unsigned long sync_offset, unsigned int sync_direction);
protected:

    // As listed below, sync_mode can be used to configure the cache behavior when the O_SYNC flag is present in open():
    // sync_mode=0: CPU cache is enabled regardless of the O_SYNC flag presence.
    // sync_mode=1: If O_SYNC is specified, CPU cache is disabled. If O_SYNC is not specified, CPU cache is enabled.
    // sync_mode=2: If O_SYNC is specified, CPU cache is disabled, but the CPU uses write-combining when writing to the DMA buffer, which improves performance by merging multiple write accesses. If O_SYNC is not specified, CPU cache is enabled.
    // sync_mode=3: If O_SYNC is specified, DMA coherency mode is used. If O_SYNC is not specified, CPU cache is enabled.
    // sync_mode=4: CPU cache is enabled regardless of the O_SYNC flag presence.
    // sync_mode=5: CPU cache is disabled regardless of the O_SYNC flag presence.
    // sync_mode=6: CPU uses write-combine to write data to DMA buffer regardless of O_SYNC presence.
    // sync_mode=7: DMA coherency mode is used regardless of O_SYNC presence.
    void SetSyncMode(uint8_t mode);
private:
    uint64_t base_addr_;
    uint64_t map_size_;
    int fd_;
    void* user_ptr_;
};
ikwzm commented 5 months ago

Thank you!

ikwzm commented 5 months ago

The source code you gave us uses memset() to clear the u-dma-buf. Have you checked the behavior without memset()? If not, please check the behavior again with a ClearBuf() that does not use memset().

zhanghongg commented 5 months ago

Yes, I have checked. Nothing happened, and it takes a long time.

void UdmaBuf::ClearBuf() {
    /// no memset!!
    unsigned char* tmp = reinterpret_cast<unsigned char*>(user_ptr_);
    for(uint64_t i=0; i < map_size_; ++i) {
        tmp[i] = 0;
    }            
}
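One caveat with the byte loop above: optimizing compilers can recognize it as a memset pattern and emit a memset() call anyway (in GCC this is the loop-distribution pass controlled by -ftree-loop-distribute-patterns). The C sketch below writes through a volatile pointer so that every byte store is kept. The function name clear_bytewise() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero a mapped region one byte at a time through a volatile pointer.
 * Because each store to a volatile object must be emitted, the compiler
 * cannot replace the loop with a call to memset(). */
static void clear_bytewise(void *base, size_t size)
{
    volatile uint8_t *p = (volatile uint8_t *)base;
    size_t i;

    assert(base != NULL || size == 0);
    for (i = 0; i < size; ++i)
        p[i] = 0;
}
```

This is slower than memset() by design. It is only useful to rule out the library memset() as a factor.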
ikwzm commented 5 months ago

Yes, I have checked. Nothing happened, and it takes a long time.

Nothing happened? Has the buffer been cleared? Did the value written by PL match the value read by the CPU?

zhanghongg commented 5 months ago

The behavior is the same: the problem still occurs.

Has the buffer been cleared?

Yes, it has been cleared.

Did the value written by PL match the value read by the CPU?

No, they still do not match.