A USB DFU bootloader for SAMD21 (Cortex-M0+) uses under 4 KB of flash and 1 KB of RAM.
Compared to vendor USB stacks (Atmel ASF, Keil), it is much lighter weight and gives you the tools to build a fully custom USB device, perhaps with multiple interfaces and endpoints, rather than implementing a fixed class. It uses structures instead of byte arrays to make descriptors more readable, and interrupt-context callbacks to integrate with your bare-metal code or RTOS scheduler.
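To illustrate the readability point about structures versus byte arrays, here is a minimal sketch of a standard USB 2.0 device descriptor written both ways. The type name `ExampleDeviceDescriptor`, the variable names, and the vendor/product IDs are placeholders invented for this example and are not this library's own identifiers; the field names and 18-byte layout follow the USB 2.0 specification. The sketch assumes a little-endian target (as Cortex-M is) and a GCC/Clang-style `packed` attribute.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical type name; fields follow the USB 2.0 device descriptor layout. */
typedef struct __attribute__((packed)) {
    uint8_t  bLength;            /* size of this descriptor in bytes (18) */
    uint8_t  bDescriptorType;    /* 0x01 = DEVICE */
    uint16_t bcdUSB;             /* USB spec release, BCD (0x0200 = 2.00) */
    uint8_t  bDeviceClass;
    uint8_t  bDeviceSubClass;
    uint8_t  bDeviceProtocol;
    uint8_t  bMaxPacketSize0;    /* max packet size for endpoint 0 */
    uint16_t idVendor;
    uint16_t idProduct;
    uint16_t bcdDevice;          /* device release number, BCD */
    uint8_t  iManufacturer;      /* string descriptor indices */
    uint8_t  iProduct;
    uint8_t  iSerialNumber;
    uint8_t  bNumConfigurations;
} ExampleDeviceDescriptor;

static_assert(sizeof(ExampleDeviceDescriptor) == 18,
              "USB device descriptor must be exactly 18 bytes");

/* Struct form: every value is labeled by its field name. */
static const ExampleDeviceDescriptor example_device_descriptor = {
    .bLength            = sizeof(ExampleDeviceDescriptor),
    .bDescriptorType    = 0x01,
    .bcdUSB             = 0x0200,
    .bDeviceClass       = 0x00,   /* class defined per interface */
    .bDeviceSubClass    = 0x00,
    .bDeviceProtocol    = 0x00,
    .bMaxPacketSize0    = 64,
    .idVendor           = 0x1234, /* placeholder, not a real assignment */
    .idProduct          = 0x5678, /* placeholder */
    .bcdDevice          = 0x0100,
    .iManufacturer      = 1,
    .iProduct           = 2,
    .iSerialNumber      = 3,
    .bNumConfigurations = 1,
};

/* Byte-array form: the same 18 bytes, with 16-bit fields split by hand. */
static const uint8_t example_device_descriptor_bytes[18] = {
    18, 0x01, 0x00, 0x02, 0x00, 0x00, 0x00, 64,
    0x34, 0x12, 0x78, 0x56, 0x00, 0x01, 1, 2, 3, 1,
};

/* Returns 1 if both forms encode identical bytes (little-endian host assumed). */
int example_descriptors_match(void) {
    return memcmp(&example_device_descriptor,
                  example_device_descriptor_bytes,
                  sizeof(example_device_descriptor_bytes)) == 0;
}
```

Both forms occupy the same 18 bytes in flash on a little-endian target; the struct form simply lets the compiler handle field offsets and byte order, and the `static_assert` catches accidental padding at compile time.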
Compared to LUFA, it has better support for Cortex-M devices, is better suited to modern DMA-integrated USB controllers, and is more interrupt-driven, but it does not provide as many class drivers.