sipeed / LicheeRV-Nano-Build

LicheeRV-Nano-Build
108 stars 41 forks source link

Very slow image acquisition #33

Closed krasin closed 4 months ago

krasin commented 4 months ago

Hi there,

thank you for working on this incredibly cool project.

I am trying to take photos using GC4653 4MP camera sold together with LicheeRV Nano. The acquisition time using sensor_test is incredibly slow (around 1 second):

$ sensor_test
[sys_vi_init]-41: MMF Version:2459349f4-musl_riscv64
[SAMPLE_COMM_SNS_ParseIni]-2171: Parse /mnt/data/sensor_cfg.ini
[parse_source_devnum]-1764: devNum =  1
[parse_sensor_name]-1845: sensor =  GCORE_GC4653_MIPI_4M_30FPS_10BIT
[parse_sensor_busid]-1874: bus_id =  4
[parse_sensor_i2caddr]-1885: sns_i2c_addr =  29
[parse_sensor_mipidev]-1896: mipi_dev =  0
[parse_sensor_laneid]-1907: Lane_id =  4, 3, 2, -1, -1
[parse_sensor_pnswap]-1918: pn_swap =  0, 0, 0, 0, 0
[parse_sensor_mclken]-1940: mclk_en =  1
[parse_sensor_mclk]-1951: mclk =  1
[SAMPLE_PLAT_SYS_INIT]-72: common pool[0] BlkSize 5652480
ISP Vipipe(0) Allocate pa(0x8de20000) va(0x0x3fed152000) size(291424)
stSnsrMode.u16Width 2560 stSnsrMode.u16Height 1440 25.000000 wdrMode 0 pstSnsObj 0xebad0
[SAMPLE_COMM_VI_StartMIPI]-494: sensor 0 stDevAttr.devno 0
awbInit ver 6.9@2021500
0 R:1400 B:3100 CT:2850
1 R:1500 B:2500 CT:3900
2 R:2300 B:1600 CT:6500
Golden 1024 1024 1024
WB Quadratic:0
isWdr:0
ViPipe:0,===GC4653 1440P 30fps 10bit LINEAR Init OK!===
********************************************************************************
cvi_bin_isp message
gerritId:      97347          commitId:      fc6ce647       
md5:           d6db2297ddfd44e8252c1f3f888f47b2
sensorNum      1              
sensorName0    4653           

PQBIN message
gerritId:      
              commitId:      fc6ce647       
md5:           d6db2297ddfd44e8252c1f3f888f47b2
sensorNum      1              
sensorName0    4653           

author:        lxowalle       desc:          gc4653 30fps   
createTime:    2024-03-26 13:41:34version:                      
tool Version:       v3.0.5.48           mode:      
********************************************************************************
[SAMPLE_COMM_ISP_Thread]-95: ISP Dev 0 running!
0 R:1298 B:3773 CT:2688
1 R:1629 B:2599 CT:3890
2 R:2430 B:1675 CT:7300
Golden 1629 1024 2599
wdrLEOnly:1
[main]-543: ---Basic------------------------------------------------
[main]-544: 1: dump vi raw frame
[main]-545: 2: dump vi yuv frame
[main]-546: 3: set chn flip/mirror
[main]-547: 4: linear wdr switch
[main]-548: 5: AE debug
[main]-549: 6: sensor dump
[main]-550: 7: sensor proc
[main]-551: 255: exit
2
Get frm from which chn(0~1):
0
how many loops to do(11111 is infinite:
100
ms consumed: 225.063004
ms consumed: 1690.901978
ms consumed: 1703.642944
ms consumed: 1413.646973
ms consumed: 575.383972
ms consumed: 2891.160889
ms consumed: 1506.464966
ms consumed: 1678.133057
ms consumed: 1458.702026
ms consumed: 1476.683960
ms consumed: 1264.701050
ms consumed: 3398.766113
ms consumed: 1489.883057
ms consumed: 1450.162964
ms consumed: 966.010010
ms consumed: 1473.827026
ms consumed: 1479.792969
ms consumed: 3142.443115
ms consumed: 822.254028
ms consumed: 5736.145020
ms consumed: 1718.666016
ms consumed: 1373.817017
ms consumed: 2539.612061
ms consumed: 1464.227051
ms consumed: 1415.610962
ms consumed: 1083.852051

Is there a particular reason for this to be so slow? Given that the board has 4 MIPI CSI-2 lanes, I would have expected a delay around 30-40 ms, but not 1000-1500 ms.

Is there a better way to do this?

wilesun commented 4 months ago

Is it so slow? I also want to use its video encoding and decoding capabilities, but it seems that it is not feasible.

krasin commented 4 months ago

Video works fine, actually: https://youtu.be/Zg73V6hciUM

It's the case of acquiring a single image that is very slow. I suspect something keeps being initialized on every frame or some other bug in that particular example (sensor_test)

krasin commented 4 months ago

The biggest contribution to latency is writing the dump to a slow SD card. When I started to write to tmpfs, the time to grab the frame is consistently ~200ms, except for the very first frame:

width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 476.569000
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 199.576004
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 198.602997
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 198.850998
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 198.341995
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 198.636002
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 198.854004
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 198.496994

This is still very slow, but much better than before. I will continue diving into it, as I need it to work fast.

krasin commented 4 months ago

If I copy the pixels to a memory buffer instead of writing into a file on tmpfs, the fimes are somewhat better: 166ms instead of ~200ms.

And if I take the photos, but don't copy the pixels, then the timings are good:

ViPipe:0,===GC4653 1440P 30fps 10bit LINEAR Init OK!===
...
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 427.342010
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 33.511002
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 34.113998
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 31.136999
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 32.150002
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 33.304001
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 31.170000
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 32.153999
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 32.133999
width: 2560, height: 1440, total_buf_length: 5529600
ms consumed: 32.064999

So, taking photos is 33ms (which is the expected time for 30 FPS), but copying the data (~5MB) out of the buffer takes 130ms. That corresponds to 38 MB/s memory bandwidth which is much less than the expected value for DDR3 (~6 GB/s).

krasin commented 4 months ago

I made sure that there are no extra memory allocation between the frames, and the times stay the same: 166ms to take a frame and copy it out of the system buffer to my own buffer using memcpy.

This is strange. It copies the data about 200 times slow than it should.

krasin commented 4 months ago

I confirmed that copying buffers in regular memory is much faster. It's copying from stVideoFrame.stVFrame.pu8VirAddr[i] that is slow...

krasin commented 4 months ago

I am closing this issue. Not because it's resolved (it's not; at least, not fully), but because the original description is too confusing. I will consider refiling this issue with a better description.