reticulatedpines / magiclantern_simplified

A Git based version of Magic Lantern, for those unwilling or unable to work using Mercurial. The vast majority of branches have been removed, with those thought to be important brought in individually and merged.
GNU General Public License v2.0

Possible to parallelise EDMAC transfers? #120

Closed: reticulatedpines closed this issue 6 months ago

reticulatedpines commented 7 months ago

The D45 code uses EDMAC in a slightly odd way, because the config structs were poorly understood when the code was written. It looks to me like this prevents DMA transfers from running in parallel, which the patent mentions as a possible mode of operation.

Forum discussion of structs: https://www.magiclantern.fm/forum/index.php?topic=18315.0
Patent: https://patentimages.storage.googleapis.com/35/95/b9/9f99dd75a26a90/US7817297.pdf

edmac_copy_rectangle_cbr_start() takes a definition of dst and src rectangular regions. It creates an edmac_info struct, which notably only initialises xb, yb and associated offsets. This means we create a single "tile" covering the entire region. The Canon APIs are designed to copy the region using a larger number of smaller tiles. This allows for parallel operation.

If we split the region into more tiles, using xa, ya and xn, transfers might be significantly faster.
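To make the idea concrete, here is a minimal sketch of how one row could be split into several tiles. It is not the ML implementation. The field names are the ones used in this thread, but the helper split_row() is hypothetical, and the layout it assumes (xn tiles of xa bytes followed by one final tile of xb bytes, with yb holding the line count minus one) is my reading of the forum discussion, not something confirmed here. The offsets needed to step between tiles are left at zero and are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Field names as used in this thread; the semantics below are assumptions. */
struct edmac_info {
    uint32_t off1a, off1b, off2a, off2b, off3;
    uint32_t xa, xb, ya, yb, xn, yn;
};

/* Hypothetical helper: describe a region of `width` bytes per line and
 * `lines` lines using tiles of at most `tile` bytes per line.
 * ASSUMPTION: a row is xn tiles of xa bytes, then one last tile of xb bytes.
 * ASSUMPTION: yb stores the number of lines minus one.
 * Offsets are left at zero; a real config would need them set. */
static struct edmac_info split_row(uint32_t width, uint32_t tile, uint32_t lines)
{
    assert(width > 0 && tile > 0 && lines > 0);

    struct edmac_info info = {0};
    uint32_t full = (width - 1) / tile;   /* full tiles before the last one */

    info.xa = full ? tile : 0;            /* unused when there is one tile */
    info.xn = full;
    info.xb = width - full * tile;        /* last tile: 1..tile bytes */
    info.yb = lines - 1;

    /* The tiles must cover the row exactly. */
    assert(info.xa * info.xn + info.xb == width);
    return info;
}
```

With tile equal to width this degenerates to the single-tile case (xa and xn zero, only xb and yb set), which is what the current code does.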

This will require testing, since a) I'm not entirely confident in my understanding of the patent, b) I don't know if DMA internals implement the parallel optimisation, and c) I don't know if it's automatic (maybe you have to assign multiple channels, or similar, which can be processed in parallel?). The basic test is easy though, so it's worth trying.

reticulatedpines commented 7 months ago

Dumping some usage of these structs on 200D, from the vfx_mem_to_mem() call chain, shows that Canon only uses xb and yb there. So our usage is at least normal.

   293: 16809.093  ==== in m2m_setup_copy
   294: 16809.098  off1a: 0x0
   295: 16809.104  off1b: 0x0
   296: 16809.109  off2a: 0x0
   297: 16809.114  off2b: 0x0
   298: 16809.118   off3: 0x0
   299: 16809.123     xa: 0x0
   300: 16809.128     xb: 0x2ee0
   301: 16809.133     ya: 0x0
   302: 16809.138     yb: 0xf9f
   303: 16809.143     xn: 0x0
   304: 16809.147     yn: 0x0

This doesn't mean they always do things this way, but it does weakly imply there's no automatic optimisation.
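For scale, the two non-zero values in the dump can be decoded with a small sketch. The helper single_tile_bytes() is hypothetical, and it assumes xb is the number of bytes per line and yb is the number of lines minus one. Neither assumption is confirmed by the dump itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total bytes moved by a single-tile config.
 * ASSUMPTION: xb = bytes per line, yb = number of lines - 1,
 * and xa, ya, xn, yn are all zero as in the dump. */
static uint64_t single_tile_bytes(uint32_t xb, uint32_t yb)
{
    return (uint64_t)xb * ((uint64_t)yb + 1);
}
```

Under those assumptions, xb = 0x2ee0 is 12000 bytes per line and yb = 0xf9f is 4000 lines, so the call describes one transfer of 48,000,000 bytes.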

reticulatedpines commented 6 months ago

At least on 200D, using multiple tiles doesn't change the transfer speed. Either we need more magic or, as I suspect is more likely, they parallelise in a row-based fashion regardless of the number of tiles.