intel / xess


libxess.dll spends more than 24MB of its binary initializing stack objects #23

Open Sonicadvance1 opened 4 months ago

Sonicadvance1 commented 4 months ago

There are three functions inside libxess.dll, each more than 8MB in size, that initialize stack objects using single-byte move instructions.

This consumes more than 24MB of the library. With v1.3.0's binary being 71MB, that's 33.8% of the binary size. As far as I can tell, these functions also cause the library's initialization to take multiple seconds once they get executed.

In libxess.dll v1.3.0 these functions live at offsets 0x2b3c0, 0x29c550, and 0x50dfc0. Since we don't have function symbol names for these, it's unknown what they are called. It looks like they are generating some dll file on the stack or something, though.

Ideally these binaries could just live in an array structure in the dll and get memcpy'd onto the stack (or whatever it is doing). This has the potential to shave off the majority of this 24MB of code and improve initialization time.
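To illustrate the suggestion, here is a minimal sketch in C of the two patterns. Everything in it is hypothetical: the names `kEmbeddedBlob`, `fill_bytewise`, and `fill_memcpy` are made up, and the 8-byte payload just stands in for one of the ~8MB blobs. It is not taken from libxess.dll.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical payload standing in for one of the large blobs.
 * As a const array it is stored once as read-only data rather than
 * being encoded as immediates inside instructions. */
static const unsigned char kEmbeddedBlob[] = {
    0x4d, 0x5a, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00
};

/* Pattern described in the issue: one store per byte. Each byte of
 * payload costs a full instruction encoding in the code section
 * (a compiler may merge adjacent stores when optimizing). */
void fill_bytewise(unsigned char *dst, size_t dst_size) {
    assert(dst_size >= 8);
    dst[0] = 0x4d;
    dst[1] = 0x5a;
    dst[2] = 0x90;
    dst[3] = 0x00;
    dst[4] = 0x03;
    dst[5] = 0x00;
    dst[6] = 0x00;
    dst[7] = 0x00;
}

/* Suggested pattern: a single bulk copy from the embedded array.
 * Code size stays constant no matter how large the payload is. */
void fill_memcpy(unsigned char *dst, size_t dst_size) {
    assert(dst_size >= sizeof(kEmbeddedBlob));
    memcpy(dst, kEmbeddedBlob, sizeof(kEmbeddedBlob));
}
```

Both functions leave the same bytes in `dst`. The difference is that the first one grows by at least one instruction per payload byte, while the second stores the payload once as data and copies it in bulk.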

Attached is a video scrolling through a small portion of the binary. The code layout on the right has three large contiguous blocks of blue, showing each individual block of ~8MB of code.

https://github.com/intel/xess/assets/1018829/3463860e-8f5d-497c-9efc-345b9c329b0c

xess-intel commented 2 months ago

@Sonicadvance1 thank you for providing the feedback! We will address it in future updates of the SDK.

maxchisto commented 1 month ago

@Sonicadvance1 how far did you get in understanding what libxess does internally? I'm wondering if the APIs that it needs to run the Intel-optimized version of the model on XMX engines on Linux can be implemented using the SYCL Joint Matrix Extension.

I imagine neural-network-based upscaling is implemented very roughly like this: take a stack of the past N frames and feed it to a convolutional net. That's just a bunch of matrix multiplications.
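As a rough illustration of why a convolution over a stack of frames reduces to a matrix multiplication, here is a minimal C sketch of the generic im2col technique. It is only an assumption about how such a layer could be expressed, not anything known about XeSS internals. All names and sizes (`FRAMES`, `conv_direct`, `im2col`, `conv_matmul`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy sizes: 2 past frames of 4x4 pixels, one 3x3 kernel. */
enum {
    FRAMES = 2, H = 4, W = 4, K = 3,
    OH = H - K + 1, OW = W - K + 1,   /* output size, no padding */
    PATCH = FRAMES * K * K,           /* values seen per output pixel */
    PIXELS = OH * OW
};

/* Reference: direct convolution (cross-correlation) over the frame stack.
 * in is FRAMES*H*W floats, w is PATCH floats, out is PIXELS floats. */
void conv_direct(const float *in, const float *w, float *out) {
    for (int y = 0; y < OH; ++y)
        for (int x = 0; x < OW; ++x) {
            float acc = 0.0f;
            for (int c = 0; c < FRAMES; ++c)
                for (int ky = 0; ky < K; ++ky)
                    for (int kx = 0; kx < K; ++kx)
                        acc += in[(c * H + y + ky) * W + x + kx]
                             * w[(c * K + ky) * K + kx];
            out[y * OW + x] = acc;
        }
}

/* im2col: copy each output pixel's receptive field into one column
 * of a PATCH x PIXELS matrix (row-major). */
void im2col(const float *in, float *cols) {
    for (int c = 0; c < FRAMES; ++c)
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx) {
                int row = (c * K + ky) * K + kx;
                for (int y = 0; y < OH; ++y)
                    for (int x = 0; x < OW; ++x)
                        cols[row * PIXELS + y * OW + x] =
                            in[(c * H + y + ky) * W + x + kx];
            }
}

/* The convolution as a matrix product:
 * (1 x PATCH) weights times (PATCH x PIXELS) columns. */
void conv_matmul(const float *w, const float *cols, float *out) {
    for (int p = 0; p < PIXELS; ++p) {
        float acc = 0.0f;
        for (int r = 0; r < PATCH; ++r)
            acc += w[r] * cols[r * PIXELS + p];
        out[p] = acc;
    }
}
```

With more than one output channel the weight vector becomes a matrix, and the product is an ordinary matrix-matrix multiply, which is the kind of operation matrix hardware is built to accelerate.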