wuwbobo2021 closed this issue 1 month ago.
The delay_us() driver was a bad mistake on my part. After correcting the first program I got a better result: the transfer no longer ended quickly, and sometimes fewer frames were skipped. The raw data rate is about 1 MB/s, sometimes equal to that of a single-buffer transfer.
    #include <stm32f10x.h>
    #include <system_stm32f10x.h>
    #include "usb_lib.h"
    #include "usb_pwr.h"
    #include "usb_desc.h"

    // Sends one packet on the double-buffered bulk IN endpoint ENDP1.
    // `sz` must be an even number.
    static inline void usb_send_data(const volatile uint8_t* ptr, uint8_t sz)
    {
        // For the IN (Tx) endpoint: DTOG_TX is the buffer flag of the USB
        // peripheral, DTOG_RX serves as SW_BUF (the application's flag).
        // Both macros are normalized to 0/1: DTOG is bit 6 and SW_BUF is bit 14,
        // so comparing the raw masked values would only match when both are 0.
        #define DTOG                  EP_DTOG_TX
        #define SW_BUF                EP_DTOG_RX
        #define ENDP1_DTOG()          ((_GetENDPOINT(ENDP1) & DTOG) != 0)
        #define ENDP1_SW_BUF()        ((_GetENDPOINT(ENDP1) & SW_BUF) != 0)
        #define ENDP1_SW_BUF_TOGGLE() { _ToggleDTOG_RX(ENDP1); }

        if (sz > VIRTUAL_COM_PORT_DATA_SIZE)
            sz = VIRTUAL_COM_PORT_DATA_SIZE;

        if (ENDP1_DTOG() == ENDP1_SW_BUF()) {
            _SetEPTxStatus(ENDP1, EP_TX_DIS);
            ENDP1_SW_BUF_TOGGLE();
            _SetEPTxStatus(ENDP1, EP_TX_NAK);
        }

        volatile uint32_t* p_dst;
        if (ENDP1_DTOG()) {
            _SetEPDblBuf0Count(ENDP1, EP_DBUF_IN, sz); // app should use Buf0 (original Tx)
            p_dst = (volatile uint32_t*)(PMAAddr + 2 * ENDP1_TX_BUF0_ADDR);
        } else {
            _SetEPDblBuf1Count(ENDP1, EP_DBUF_IN, sz); // app should use Buf1 (original Rx)
            p_dst = (volatile uint32_t*)(PMAAddr + 2 * ENDP1_TX_BUF1_ADDR);
        }

        // Copy 16 bits at a time: on the F103 each 16-bit PMA word occupies a
        // 32-bit slot in the CPU address space, hence the uint32_t stride.
        // Note: not for STM32F303xE or STM32F302x8
        const volatile uint16_t* p_src = (const volatile uint16_t*) ptr;
        for (uint8_t i = (sz + 1) >> 1; i > 0; i--, p_src++, p_dst++)
            *(volatile uint16_t*)p_dst = *p_src;

        ENDP1_SW_BUF_TOGGLE();
        // EP_TX_VALID (0x30) contains the EP_TX_NAK bit (0x20), so this is true
        // for both NAK and VALID; re-writing VALID leaves the status unchanged.
        if ((_GetEPTxStatus(ENDP1) & EP_TX_NAK))
            _SetEPTxStatus(ENDP1, EP_TX_VALID);

        // Busy-wait while the two flags are equal (until hardware toggles DTOG).
        while (ENDP1_DTOG() == ENDP1_SW_BUF()) {
            if (bDeviceState != CONFIGURED) return;
        }
    }

    static volatile uint8_t Seq_Loop[64] = {
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
        0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
        0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
        0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
        0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
        0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
        0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77
    };

    int main()
    {
        USB_Config(); // defined elsewhere in the project (not shown)
        // Crude delay after USB_Config(): the loop body is a register write,
        // so the compiler cannot optimize the loop away.
        for (uint32_t i = 0; i < 7200000; i++)
            _SetEPTxStatus(ENDP1, EP_TX_NAK);
        while (1) {
            if (bDeviceState != CONFIGURED) continue;
            usb_send_data(Seq_Loop, sizeof(Seq_Loop));
            Seq_Loop[0] += 1; // first byte of each frame is its sequence number
        }
    }
Waiting for EP_CTR_TX after DTOG is toggled by hardware (except for the first transfer) merely makes it much slower.
According to the RM: "At the end of each transaction the CTR_RX or CTR_TX bit of the addressed endpoint USB_EPnR register is set, depending on the enabled direction. At the same time, the affected DTOG bit in the USB_EPnR register is hardware toggled making the USB peripheral buffer swapping completely software independent." Is this really true? Also, I have trouble debugging STM32 USB applications, with either Keil or OpenOCD.
Thanks for sharing the details of your attempts to get USB double-buffering working on the STM32F103xx series MCUs.
As you can see from the commit dates, I haven't worked on papoon_usb for a very long time (5 years!!) and am unlikely to do so in the near future. If anything, I've always wanted to write something similarly clean/fast/efficient/small for the USB subsystem in later STM chips such as the STM32F7xx series, but I was once warned by someone knowledgeable on the ST Community forums that the USB implementation there was even more complex and incompletely documented than the F10xx one. Given the huge amount of time it took to get papoon_usb working, if and when I need to use the F7 series I'll probably just use whatever USB library from ST or others I can find, regardless of how convoluted its API and slow its execution. (ST doesn't care; it just means customers have to spec a larger/faster/costlier chip from their lineup.)
Your finding that double-buffering might not be any faster than single-buffering matches what I remember thinking when I was writing papoon_usb: if the embedded application can't keep up with the data rate (either via polling or interrupts; both are supported here), double-buffering just allows one additional "buffer time" of leeway before the chip NAKs the host and stalls the transfer. The hardware already implements full multi-core processing (the USB subsystem runs asynchronously from the main CPU), so again, either the application can process/store the data at the host's rate (modulo internal memory bandwidth contention) or it can't.
Again, thanks. If you do get some conclusive (and reliable!) improvement with double-buffering and can find a way to cleanly add it to papoon_usb, please feel free to do so and submit a pull request here from your cloned repo, of course staying within the terms of the GPL license. If I can find the time I'll integrate the changes, again of course crediting you for your contributions.
Thanks for the reply, but I have since achieved the goal by other means. The requirement is to transfer sufficiently large bulks of continuous data from a peripheral’s dedicated large SRAM area (for DMA) to the USB host. At first I called the usb_send_data function on the DMA half-transfer and transfer-complete events, looping over half of that SRAM area and sending it as multiple 64-byte packets. This reached a high enough rate on Linux with a USB 2.0 host controller, but failed on Windows and with USB 3.0 host controllers.
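The first approach (sending one half of the DMA area as a run of 64-byte packets on each half-transfer/transfer-complete event) might look roughly like the sketch below. It is not the code used in the thread: DMA_AREA_SIZE, PACKET_SIZE, packet_sender, send_half(), on_dma_event() and the counting stub are hypothetical names, and the function pointer stands in for usb_send_data() so the logic can be exercised on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: the DMA fills DMA_AREA_SIZE bytes circularly and
 * raises a half-transfer event and a transfer-complete event. */
#define DMA_AREA_SIZE 1024u
#define PACKET_SIZE   64u

/* Stand-in for usb_send_data(): sends one packet of at most 64 bytes. */
typedef void (*packet_sender)(const uint8_t *data, uint8_t len);

/* Send one half of the DMA area as consecutive packets of up to 64 bytes. */
static void send_half(const uint8_t *half, size_t len, packet_sender send)
{
    for (size_t off = 0; off < len; off += PACKET_SIZE) {
        size_t n = (len - off < PACKET_SIZE) ? len - off : PACKET_SIZE;
        send(half + off, (uint8_t)n);
    }
}

/* Half-transfer event (complete == 0): the first half is ready.
 * Transfer-complete event (complete != 0): the second half is ready.
 * The DMA keeps filling the other half meanwhile, so each half must be
 * sent before the DMA wraps around to it again. */
static void on_dma_event(const uint8_t *area, int complete, packet_sender send)
{
    const uint8_t *half = complete ? area + DMA_AREA_SIZE / 2 : area;
    send_half(half, DMA_AREA_SIZE / 2, send);
}

/* Counting stub for testing on a PC instead of real USB hardware. */
static size_t sent_packets, sent_bytes;
static const uint8_t *last_packet;

static void count_sender(const uint8_t *data, uint8_t len)
{
    sent_packets++;
    sent_bytes += len;
    last_packet = data;
}
```

The timing constraint noted in on_dma_event() is the weak point of this design: if the host polls too slowly, the DMA overwrites a half that has not been fully sent yet.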
I thought that double buffering, if it worked, would let the copy from SRAM to PMA and the IN transfer from PMA to the host run concurrently, even when sending data over USB is the MCU’s only task.
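That expectation can be put into a rough steady-state model under simplifying assumptions: a fixed per-packet copy time copy_us (SRAM to PMA) and transfer time tx_us (PMA to host), with no NAK retries or host scheduling effects. With one buffer the two stages alternate; with two buffers they overlap and the slower stage sets the pace. The function names are illustrative only.

```c
/* Steady-state period per packet, in microseconds, under the model above. */

/* One buffer: copying the next packet can only start after the previous
 * one has been sent, so the two stages run back to back. */
static double period_single_us(double copy_us, double tx_us)
{
    return copy_us + tx_us;
}

/* Two buffers: the CPU fills one buffer while the USB peripheral sends the
 * other, so the stages overlap and the slower one sets the pace. */
static double period_double_us(double copy_us, double tx_us)
{
    return copy_us > tx_us ? copy_us : tx_us;
}

/* Throughput in bytes per second for packets of packet_bytes bytes. */
static double throughput_Bps(unsigned packet_bytes, double period_us)
{
    return packet_bytes * 1e6 / period_us;
}
```

For example, copy_us = 20 and tx_us = 50 give 70 us versus 50 us per packet. In this model the speedup (copy_us + tx_us) / max(copy_us, tx_us) never exceeds 2 (reached when both stages take equally long), and the rate is always capped by the slower stage.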
I didn’t get it to work, though. Instead, I changed the DMA transfer unit size to match the 64-byte endpoint buffer size and created a FIFO of 64-byte units. An end sequence is sent at the end of every bulk, and an overflow sequence is sent when the FIFO becomes full (the FIFO is then cleared). It currently works well enough.
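A minimal sketch of such a FIFO of 64-byte units, assuming the DMA-complete handler pushes each captured unit and the USB loop pops units to send. FIFO_UNITS, the END_SEQ/OVERFLOW_SEQ contents and all function names are hypothetical (the real sequences are not shown in the thread), and queuing the overflow marker followed by the new unit after clearing is an assumed policy. On the MCU, pushes from an interrupt and pops from the main loop would also need protection, e.g. briefly disabling that interrupt; this is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UNIT_SIZE  64u  /* one DMA transfer unit == one endpoint buffer */
#define FIFO_UNITS 16u  /* hypothetical FIFO depth */

/* Hypothetical marker units; the real sequences are not shown. */
static const uint8_t END_SEQ[UNIT_SIZE]      = { 0xE0 };
static const uint8_t OVERFLOW_SEQ[UNIT_SIZE] = { 0xF0 };

typedef struct {
    uint8_t  units[FIFO_UNITS][UNIT_SIZE];
    uint32_t head;   /* next slot to write */
    uint32_t tail;   /* next slot to send */
    uint32_t count;  /* number of queued units */
} unit_fifo;

static void fifo_put(unit_fifo *f, const uint8_t *unit)
{
    memcpy(f->units[f->head], unit, UNIT_SIZE);
    f->head = (f->head + 1) % FIFO_UNITS;
    f->count++;
}

/* Producer: queue one 64-byte unit. If the FIFO is already full, it is
 * cleared and the overflow marker is queued ahead of the new unit. */
static void fifo_push_unit(unit_fifo *f, const uint8_t *unit)
{
    if (f->count == FIFO_UNITS) {
        f->head = f->tail = f->count = 0;
        fifo_put(f, OVERFLOW_SEQ);
    }
    fifo_put(f, unit);
}

/* Producer: mark the end of a bulk. */
static void fifo_end_bulk(unit_fifo *f)
{
    fifo_push_unit(f, END_SEQ);
}

/* Consumer: copy out the oldest unit for sending; false if empty. */
static bool fifo_pop_unit(unit_fifo *f, uint8_t *out)
{
    if (f->count == 0)
        return false;
    memcpy(out, f->units[f->tail], UNIT_SIZE);
    f->tail = (f->tail + 1) % FIFO_UNITS;
    f->count--;
    return true;
}
```

Because each unit matches the endpoint buffer size, the consumer can hand a popped unit straight to a single-buffered send without splitting it.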
Here's a failed attempt at a USB bulk double-buffered Tx IN transfer (from device to host) on the STM32F103, based on the VirtualComport_Loopback example in STM32_USB-FS-Device_Lib_V4.1.0. Unfortunately, there's no sample application on the web doing such a thing, merely a few clues: Audio_Speaker in STM32_USB-FS-Device_Lib_V4.1.0 (isochronous Rx OUT transfer), https://github.com/catphish/stm32-doublebuffer (a bit strange), and https://community.st.com/t5/mems-sensors/inprove-the-speed-of-the-mass-storage-as-u-disk/td-p/523996. USB PMA double buffering is unimplemented in papoon_usb according to its TODO list, and it seems to be unimplemented in these two libraries as well: https://github.com/stm32-rs/stm32-usbd, https://github.com/embassy-rs/embassy. But it's probably handled in ST's HAL library.
While worrying about my own intellectual ability, I am beginning to suspect that the STM32 USB double-buffering mechanism itself may be unreliable for correct bulk transfers (?). But I'll probably rewrite the test program based on stm32f1xx_ll_usb.c or stm32f1xx_hal_pcd.c, which will hopefully do some double-buffer bulk management.
I don't have a good understanding of how double buffering should be handled in the STM32 USB registers, even after reading the related sections of the reference manual. The small test below doesn't work as expected, and changing the transfer unit from 64 bytes to 60 bytes makes no difference. Expected behavior: frame2_seq_num - frame1_seq_num = 1 for every two contiguous 64-byte frames, except when frame2_seq_num = 0 and frame1_seq_num = 0xff. This can be ensured without double buffering.
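The expected behavior can be checked on the host with a small routine like the sketch below. It assumes the captured stream starts on a frame boundary and that byte 0 of each 64-byte frame is the sequence number (Seq_Loop[0] in the test program); seq_stats and check_frames() are hypothetical names. Unsigned 8-bit subtraction makes the 0xff to 0x00 wrap count as a difference of 1, so the exception needs no special case.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 64u

typedef struct {
    size_t frames;    /* complete frames examined */
    size_t ok;        /* seq == previous seq + 1 (mod 256) */
    size_t repeated;  /* seq == previous seq */
    size_t skipped;   /* any other jump */
} seq_stats;

/* Classify each pair of consecutive frames in a captured byte stream.
 * A trailing partial frame is ignored. */
static seq_stats check_frames(const uint8_t *data, size_t len)
{
    seq_stats s = { 0, 0, 0, 0 };
    size_t n = len / FRAME_SIZE;
    for (size_t i = 0; i < n; i++) {
        s.frames++;
        if (i == 0)
            continue;
        uint8_t prev = data[(i - 1) * FRAME_SIZE];
        uint8_t cur  = data[i * FRAME_SIZE];
        uint8_t diff = (uint8_t)(cur - prev);
        if (diff == 1)
            s.ok++;
        else if (diff == 0)
            s.repeated++;
        else
            s.skipped++;
    }
    return s;
}
```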
The key to the failure: the program could not determine whether (and when) the half-buffer holding the previous packet had been transferred successfully, either from EP_CTR_TX or by comparing DTOG with SW_BUF, before overwriting it or retrying.
In usb_conf.h:
In usb_prop.c:
main.c:
Data received by the host (ended soon):
usb_send_data() can be modified (badly?) to make it send data continuously. The raw data rate is then 1088 KB/s on Linux (USB 2.0 host controller, USB CDC tty; it should be slower with libusb or on Windows). With a single endpoint PMA buffer it would be 950 KB/s, again slower with libusb or on Windows. However, the received data consists of lots of repeated frames. Uncommenting the two _SetEPTxStatus(ENDP1, EP_TX_NAK) calls changes the rate to about 10 KB/s, but the frames are still repeated (with the same frame number).
Host speed test program:
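The host speed test program itself is not included here. A minimal POSIX sketch of its measuring part could read from an already-opened file descriptor for a fixed time and count bytes. The descriptor might be a CDC ACM tty such as /dev/ttyACM0, assumed to be put into raw mode beforehand (e.g. with cfmakeraw()); now_s() and measure_read() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in seconds. */
static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Read from fd until max_seconds have passed or read() returns 0 (EOF)
 * or an error. Returns the number of bytes read and stores the measured
 * duration in *elapsed_s. read() blocks, so a stalled device can make
 * the loop run past max_seconds. */
static size_t measure_read(int fd, double max_seconds, double *elapsed_s)
{
    uint8_t buf[4096];
    size_t total = 0;
    double start = now_s();
    while (now_s() - start < max_seconds) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;
        total += (size_t)n;
    }
    *elapsed_s = now_s() - start;
    return total;
}
```

The raw rate is then total / elapsed_s bytes per second.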