ST-Link and J-Link seem to always use 2 idle cycles at the end of transactions. Section 26.5.4 of the STM32F0 reference manual says this about idle cycles:
"Because of the asynchronous clock domains SWCLK and HCLK, two extra SWCLK cycles are needed after a write transaction (after the parity bit) to make the write effective internally."
I think 2 idle cycles would be a better default than 0: https://github.com/pyocd/pyOCD/blob/master/pyocd/probe/pydapaccess/cmsis_dap_core.py#L243
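
For illustration, here is a rough sketch of what the change means at the CMSIS-DAP level. The idle cycle count is the first parameter byte of the `DAP_TransferConfigure` command (ID `0x04`), followed by the 16-bit WAIT retry and match retry counts. The helper name `build_transfer_configure` and the retry values used here are made up for this example and are not taken from pyOCD; only the packet layout comes from the CMSIS-DAP specification.

```python
import struct

# CMSIS-DAP command ID for DAP_TransferConfigure.
DAP_TRANSFER_CONFIGURE = 0x04


def build_transfer_configure(idle_cycles=2, wait_retry=0x0050, match_retry=0x0000):
    """Build a DAP_TransferConfigure request packet.

    Hypothetical helper for illustration only, not pyOCD's API.
    Layout: command ID (1 byte), idle cycles (1 byte),
    WAIT retry (2 bytes, little-endian), match retry (2 bytes, little-endian).
    The retry defaults are illustrative assumptions.
    """
    if not 0 <= idle_cycles <= 0xFF:
        raise ValueError("idle_cycles must fit in one byte")
    return struct.pack("<BBHH", DAP_TRANSFER_CONFIGURE,
                       idle_cycles, wait_retry, match_retry)


if __name__ == "__main__":
    # Default of 2 idle cycles, as proposed above.
    print(build_transfer_configure().hex())
```

With a default of 2, the probe would clock two extra SWCLK cycles after each transfer, which matches what the reference manual asks for after a write.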