nxp-mcuxpresso / rpmsg-lite

RPMsg implementation for small MCUs
BSD 3-Clause "New" or "Revised" License
228 stars 73 forks source link

Required shared memory size? #23

Closed infn-ke closed 1 year ago

infn-ke commented 2 years ago

What is the required shared memory size for rpmsg? Would the following formula correctly express the requirements?

2 * (RL_BUFFER_COUNT * (RL_BUFFER_PAYLOAD_SIZE + 16) + vring_size(RL_BUFFER_COUNT, VRING_ALIGN))
MichalPrincNXP commented 2 years ago

Hello @infn-ke , yes, your calculation is correct.

infn-ke commented 2 years ago

Thanks for confirming the shared memory size requirements!

The parameters RL_BUFFER_COUNT and RL_BUFFER_PAYLOAD_SIZE seems to have the requirement of the being a power of 2, like this;

#define RL_BUFFER_COUNT (1<<m)
#define RL_BUFFER_PAYLOAD_SIZE ((1<<n) - 16)

But what are the requirements on VRING_ALIGN? There is a comment in the code like this "Linux requires the ALIGN to 0x1000(4KB) instead of 0x80". Why is that requirement? Is it for interop with Linux kernel rpmsg, but not for book ended rpmsg-lite communication?

I'm having difficulties setting all these parameters correct to fully utilize my shared memory.

Hadatko commented 2 years ago

It is pitty that this function has ability to be macro (and it is not) so we wouldn't need define hardcoded VRING_SIZE macro. Anyway instead of hardcoded VRING_SIZE value we can create macro with same output as vring_size function. I think that is better than guess VRING_SIZE.

Hadatko commented 2 years ago

@MichalPrincNXP Do you think we can replace VRING_SIZE definition?

#define VRING_SIZE1 (RL_BUFFER_COUNT * sizeof(struct vring_desc))
#define VRING_SIZE2 (VRING_SIZE1 + sizeof(struct vring_avail) + (RL_BUFFER_COUNT * sizeof(uint16_t)) + sizeof(uint16_t))
#define VRING_SIZE3 ((VRING_SIZE2 + VRING_ALIGN - 1UL) & ~(VRING_ALIGN - 1UL))
#define VRING_SIZE4 (VRING_SIZE3 + sizeof(struct vring_used) + (RL_BUFFER_COUNT * sizeof(struct vring_used_elem)) + sizeof(uint16_t))
#define VRING_SIZE (((int32_t)VRING_SIZE4))
infn-ke commented 2 years ago

My problem is that it seems to me that I'm only able to use about half of the shared memory given the constraints on these parameters.

Assume I have 4 kB memory and apply the following settings;

#define RL_BUFFER_COUNT 4
#define RL_BUFFER_PAYLOAD_SIZE 496

Now I need (2 * 4 * (496+16)) + 2 * vring_size(), this exceeds 4 kB. I can then lower RL_BUFFER_COUNT from 4 to 2 or reduce RL_BUFFER_PAYLOAD_SIZE from 496 down to 240. But both options will end up in using just a little bit more than 2 kB out of 4 kB memory.

MichalPrincNXP commented 2 years ago

Thanks for confirming the shared memory size requirements!

The parameters RL_BUFFER_COUNT and RL_BUFFER_PAYLOAD_SIZE seems to have the requirement of the being a power of 2, like this;

#define RL_BUFFER_COUNT (1<<m)
#define RL_BUFFER_PAYLOAD_SIZE ((1<<n) - 16)

But what are the requirements on VRING_ALIGN? There is a comment in the code like this "Linux requires the ALIGN to 0x1000(4KB) instead of 0x80". Why is that requirement? Is it for interop with Linux kernel rpmsg, but not for book ended rpmsg-lite communication?

I'm having difficulties setting all these parameters correct to fully utilize my shared memory.

RPMsg_Lite is binary compatible with the Linux rpmsg driver, because the Linux implementation requires the vring alignment to 4KB (0x1000) the same alignment needs to be set on the opposite site running the rpmsg_lite implementation. In cases when there is rpmsg_lite master versus rpmsg_lite remote running, the vring alignment can be reduced to 0x10 (16B) - this is the typical value we used for CM0+/CM4/CM33 platforms.

MichalPrincNXP commented 2 years ago

@MichalPrincNXP Do you think we can replace VRING_SIZE definition?

#define VRING_SIZE1 (RL_BUFFER_COUNT * sizeof(struct vring_desc))
#define VRING_SIZE2 (VRING_SIZE1 + sizeof(struct vring_avail) + (RL_BUFFER_COUNT * sizeof(uint16_t)) + sizeof(uint16_t))
#define VRING_SIZE3 ((VRING_SIZE2 + VRING_ALIGN - 1UL) & ~(VRING_ALIGN - 1UL))
#define VRING_SIZE4 (VRING_SIZE3 + sizeof(struct vring_used) + (RL_BUFFER_COUNT * sizeof(struct vring_used_elem)) + sizeof(uint16_t))
#define VRING_SIZE (((int32_t)VRING_SIZE4))

Agree it would be better to made it more clear and define the VRING_SIZE based on RL_BUFFER_COUNT and VRING_ALIGN instead of hardcoded constat value, let me evaluate that. Thank you.

MichalPrincNXP commented 2 years ago

My problem is that it seems to me that I'm only able to use about half of the shared memory given the constraints on these parameters.

Assume I have 4 kB memory and apply the following settings;

#define RL_BUFFER_COUNT 4
#define RL_BUFFER_PAYLOAD_SIZE 496

Now I need (2 * 4 * (496+16)) + 2 * vring_size(), this exceeds 4 kB. I can then lower RL_BUFFER_COUNT from 4 to 2 or reduce RL_BUFFER_PAYLOAD_SIZE from 496 down to 240. But both options will end up in using just a little bit more than 2 kB out of 4 kB memory.

I am afraid we can't do anything with that, that's the rpmsg design. Sometimes it is not easy to find the optimal way of shared memory use, would it be possible in your case to allocate 6K instead of 4K?

infn-ke commented 2 years ago

I am afraid we can't do anything with that, that's the rpmsg design. Sometimes it is not easy to find the optimal way of shared memory use, would it be possible in your case to allocate 6K instead of 4K?

How did you achieve 6 kB? RL_BUFFER_COUNT and RL_BUFFER_PAYLOAD_SIZE increments by power of 2, so next step would give 8+ kB. Increasing VRING_SIZE uses up more shared mem but does not increase buffer count or payload size.

infn-ke commented 2 years ago

RPMsg_Lite is binary compatible with the Linux rpmsg driver, because the Linux implementation requires the vring alignment to 4KB (0x1000) the same alignment needs to be set on the opposite site running the rpmsg_lite implementation. In cases when there is rpmsg_lite master versus rpmsg_lite remote running, the vring alignment can be reduced to 0x10 (16B) - this is the typical value we used for CM0+/CM4/CM33 platforms.

Can you elaborate a little bit on the requirements for VRING_ALIGN? You say you have been running with 0x10, but could you go lower, e.g. 0x8 bytes? Sizing of this parameter is a bit of black magic to me.

MichalPrincNXP commented 2 years ago

Hello @infn-ke , I meant if you would be able to allocate 6kB, you would fit into this shared memory size with 4 buffers (RL_BUFFER_COUNT 2) of 496 bytes size, if this is your application requirement. Yes, still there is the vring-related overhead, that's the fact. As for VRING_ALIGN, this is the parameter of the virtqueue/vring generic sw component. This has been taken/reused as is. In Linux world 4K alignment is used (I do now know the background for this value), outside the Linux it is up to the user. I think this depends on platform/core type and the optimal way of memory access, 0x10 has been chosen in the past, I guess it is optimal for cortex M memory access.

infn-ke commented 2 years ago

I meant if you would be able to allocate 6kB, you would fit into this shared memory size with 4 buffers (RL_BUFFER_COUNT 2) of 496 bytes size, if this is your application requirement. Yes, still there is the vring-related overhead, that's the fact.

But the requirement for RL_BUFFER_COUNT and RL_BUFFER_PAYLOAD_SIZE to be a power of 2, is that an rpmsg protocol requirement or and implementation side effect for rpmsg-lite? This leads to wasting half of resource constrained shared memory. Physical memories come in power of 2, so if my physical memory is 64 kB, I will not be able to use more than 32 kB of it for rpmsg messages.

This has been taken/reused as is. In Linux world 4K alignment is used (I do now know the background for this value), outside the Linux it is up to the user. I think this depends on platform/core type and the optimal way of memory access, 0x10 has been chosen in the past, I guess it is optimal for cortex M memory access.

I would guess that the number of 0x1000 on Linux comes from page alignment that is typically 4 kB on x86 (check with sysconf(_SC_PAGESIZE)). We have been putting this number to 0x8, as we cannot waste 4 kB in our resource constrained environment.

MichalPrincNXP commented 2 years ago

Hi @infn-ke

But the requirement for RL_BUFFER_COUNT and RL_BUFFER_PAYLOAD_SIZE to be a power of 2, is that an rpmsg protocol requirement or and implementation side effect for rpmsg-lite? This leads to wasting half of resource constrained shared memory. Physical memories come in power of 2, so if my physical memory is 64 kB, I will not be able to use more than 32 kB of it for rpmsg messages.

as you can see from virtio_ring.h / line 99, number of buffer descriptors is expected to be power of 2.

infn-ke commented 2 years ago

What are the requirements on VRING_SIZE, does that have to be a power of 2? Or we can take the exact number from the vring_size?

E.g. if I do vring_size with RL_BUFFER_COUNT=8 and VRING_ALIGN=8 I get 230 bytes. Is that a valid setting?

MichalPrincNXP commented 2 years ago

Hi @infn-ke , it seems the VRING_SIZE does not need to be a power of 2, you could take 230 as calculated by vring_size() function.

infn-ke commented 2 years ago

Hi @infn-ke , it seems the VRING_SIZE does not need to be a power of 2, you could take 230 as calculated by vring_size() function.

Thanks for confirming! Then it might a good idea to re-define VRING_SIZE macro as suggested by @Hadatko, see https://github.com/NXPmicro/rpmsg-lite/issues/23#issuecomment-1015197537. There is no need to have user define VRING_SIZE as it can be derived from RL_BUFFER_COUNT and VRING_ALIGN.

MichalPrincNXP commented 2 years ago

yes, agree and will do that

Hadatko commented 2 years ago

It looks like this is related for master side not remote correct? So no change for me :D But i am happy to help others to save memory / remove hardcoded values.

MichalPrincNXP commented 1 year ago

Resolved in https://github.com/nxp-mcuxpresso/rpmsg-lite/releases/tag/v5.0.0