nasa / osal

The Core Flight System (cFS) Operating System Abstraction Layer (OSAL)
Apache License 2.0

Add user-space message queue library to the OSAL (GSFC DCR 22160) #73

Open skliper opened 4 years ago

skliper commented 4 years ago

The GSFC ATLAS project developed an alternate queue library to use with POSIX to overcome a performance limitation with the Linux POSIX message queues.

Incorporate this enhancement (or similar enhancement) into the OSAL for POSIX, RTEMS, and VxWorks.

skliper commented 4 years ago

Imported from trac issue 50. Created by sstrege on 2015-05-14T18:58:30, last modified: 2019-08-14T14:11:46

skliper commented 4 years ago

Trac comment by sstrege on 2015-05-14 19:08:17:

Solution is dependent on Trac Ticket #28. In the current OSAL model one would have to write a separate implementation for each of VxWorks, RTEMS, and POSIX, yet this feature is fully self-contained and not OS-dependent so all 3 would be identical. Once #28 is merged in this can be done once in a shared area - much cleaner.

CDKnightNASA commented 4 years ago

Looks like mqueue does have "priority" but we aren't really using it...and we have no control over its logic. Writing our own queues would allow us full control.
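As a rough illustration of what "writing our own queues" could look like, here is a minimal sketch of a fixed-depth user-space queue built on a pthread mutex, a condition variable, and a ring buffer. This is not the ATLAS library and not an OSAL API: the names `usq_t`, `usq_init`, `usq_put`, `usq_get`, and `usq_destroy` are hypothetical, and priority handling and timeouts are deliberately left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type and function names, for illustration only. */
typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    unsigned char  *storage;  /* depth * msg_size bytes */
    size_t          msg_size; /* fixed size of every message */
    size_t          depth;    /* maximum number of queued messages */
    size_t          head;     /* index of the oldest message */
    size_t          count;    /* messages currently queued */
} usq_t;

/* Returns 0 on success, -1 on failure. */
int usq_init(usq_t *q, size_t depth, size_t msg_size)
{
    assert(q != NULL && depth > 0 && msg_size > 0);
    q->storage = malloc(depth * msg_size);
    if (q->storage == NULL)
    {
        return -1;
    }
    q->msg_size = msg_size;
    q->depth    = depth;
    q->head     = 0;
    q->count    = 0;
    if (pthread_mutex_init(&q->lock, NULL) != 0)
    {
        free(q->storage);
        return -1;
    }
    if (pthread_cond_init(&q->not_empty, NULL) != 0)
    {
        pthread_mutex_destroy(&q->lock);
        free(q->storage);
        return -1;
    }
    return 0;
}

/* Non-blocking put: returns 0 on success, -1 if the queue is full. */
int usq_put(usq_t *q, const void *msg)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < q->depth)
    {
        size_t tail = (q->head + q->count) % q->depth;
        memcpy(q->storage + tail * q->msg_size, msg, q->msg_size);
        q->count++;
        pthread_cond_signal(&q->not_empty);
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Blocking get: waits until a message is available, then copies it out. */
void usq_get(usq_t *q, void *msg)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
    {
        pthread_cond_wait(&q->not_empty, &q->lock);
    }
    memcpy(msg, q->storage + q->head * q->msg_size, q->msg_size);
    q->head = (q->head + 1) % q->depth;
    q->count--;
    pthread_mutex_unlock(&q->lock);
}

void usq_destroy(usq_t *q)
{
    pthread_cond_destroy(&q->not_empty);
    pthread_mutex_destroy(&q->lock);
    free(q->storage);
}
```

Because the depth here is just a `malloc` size, nothing in this sketch depends on a kernel limit such as msg_max. A real implementation would also need timed and non-blocking receives to match the existing OSAL queue semantics.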

jphickey commented 4 years ago

Note that in addition to WSL this also helps with BSD variants and derivatives (Mac OS, FreeBSD, etc) which don't seem to offer POSIX mqueues.

This change would have a pretty high value in improving the cross-platform applicability of OSAL by removing the dependency on POSIX message queues.

CDKnightNASA commented 4 years ago

Note that FPrime has a userspace queue for OSX support; it's written in C++ (of course) but I'm wondering if we could unify our codebase for queues.

https://github.com/nasa/fprime/blob/master/Os/MacOs/IPCQueueStub.cpp

ivanperez-keera commented 1 year ago

If I understand this correctly, a reimplementation of message queues as proposed would also facilitate using cFS in docker containers without root on the host (which is a common restriction on NASA machines).

Right now, I can't run cFS inside docker because /proc/sys/fs/mqueue/msg_max is 10 in the container and I cannot increase it.

skliper commented 1 year ago

@ivanperez-keera - if you can set msg_max on the host higher than your maximum requested queue depth, you'll avoid the issue. If you can live with the limit of 10, there's the OSAL_CONFIG_DEBUG_PERMISSIVE_MODE option:

https://github.com/nasa/cFE/blob/6d96c6e856a654f7c96e66a87b003aa01ff96874/cmake/sample_defs/native_osconfig.cmake#L39

For development in Docker containers from a desktop/laptop host I typically just use permissive mode. For performance testing or similar, where it really matters, I either run it on a more representative system (or an emulator of one) or get an admin to increase the msg_max setting so that I can use deeper queues. Either way avoids the need for root on the host.

User space queues would avoid the issue though, which would be nice.

ivanperez-keera commented 9 months ago

> get an admin to increase the msg_max

On the host, msg_max is 4096. Inside docker, it's reduced to 10. I have yet to figure out why. I also opened a question on stackoverflow months ago but received no replies: https://stackoverflow.com/questions/75329421/docker-fs-mqueue-msg-max-set-to-10-in-spite-of-hosts-being-4096.

ivanperez-keera commented 9 months ago

@skliper That seems to be set to true by default when using native, which I am using. I'm very confused about why I'm still getting an error. Is there something else I need to do to set permissive mode during compilation?

When I grep for PERMISSIVE in my tree after building, my CMakeCache.txt files all indicate that OSAL_CONFIG_DEBUG_PERMISSIVE_MODE is false.

Here's my dockerfile: https://github.com/nasa/cFS/discussions/718#discussion-5898275

skliper commented 9 months ago

> get an admin to increase the msg_max
>
> On the host, msg_max is 4096. Inside docker, it's reduced to 10. I have yet to figure out why. I also opened a question on stackoverflow months ago but received no replies: https://stackoverflow.com/questions/75329421/docker-fs-mqueue-msg-max-set-to-10-in-spite-of-hosts-being-4096.

@ivanperez-keera - I increase msg_max in my Docker container by passing a parameter to docker run: --sysctl fs.mqueue.msg_max=10000. Try that, and if it doesn't work, could you post the error message?

ivanperez-keera commented 9 months ago

> Try that and if it doesn't work could you post the error message?

In some environments, I can't sudo. The message I get there is:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc
create failed: unable to start container process: error during container init: open
/proc/sys/fs/mqueue/msg_max: permission denied: unknown.

ivanperez-keera commented 9 months ago

@skliper Nevertheless, you said "if you can live with the limit of 10 there's the OSAL_CONFIG_DEBUG_PERMISSIVE_MODE".

However, when I tried that, I couldn't make it work. See: https://github.com/nasa/osal/issues/73#issuecomment-1836291852

How do I set PERMISSIVE? In the Dockerfile linked below, and without adjusting fs.mqueue.msg_max, what would I have to change to run the app without root privileges on the host while staying under the limit of 10?

https://github.com/nasa/cFS/discussions/718#discussion-5898275

skliper commented 9 months ago

Permissive is set here in the example setup when native:

https://github.com/nasa/cFE/blob/b72dd4e1f9f44c7dbb7a12895b5ac1635eb239b2/cmake/sample_defs/native_osconfig.cmake#L39

There's some file matching magic, but you could set it in the default config if you want it to apply to more than native.

If you can't set permissive mode, in theory you could override software bus queue depths to all be <= 10. I'm not sure they are configurable in every app though.

As a quick fix/test (if PERMISSIVE can't be fixed quickly), you could just force the limit from the OSAL API implementation for queues. I haven't done the analysis to figure out what might require more than a depth of 10 for nominal operations; maybe sbr and/or the various forms of to (to_lab, etc.), depending on how they are designed. See:

https://github.com/nasa/osal/blob/02916222e13c373f7d78c87ae756120d89bb4c87/src/bsp/generic-linux/src/bsp_start.c#L66-L78

https://github.com/nasa/osal/blob/02916222e13c373f7d78c87ae756120d89bb4c87/src/os/posix/src/os-impl-queues.c#L59-L69

skliper commented 9 months ago

Oh... I wonder if the method for getting MaxQueueDepth is broken for your setup. Might be worth falling back to 10 if the fopen/fgets fails.
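The fallback idea above, combined with forcing the limit on requested depths, can be sketched roughly as follows. This is not the code in bsp_start.c or os-impl-queues.c: `read_msg_max` and `clamp_queue_depth` are hypothetical helpers, and the fallback value of 10 is an assumption taken from the container limit reported earlier in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: 10 is the limit observed inside the container in this thread. */
#define FALLBACK_MSG_MAX 10UL

/*
 * Hypothetical helper: read a queue depth limit from `path`
 * (e.g. "/proc/sys/fs/mqueue/msg_max"). Returns `fallback` if the file
 * cannot be opened, cannot be read, or does not hold a positive number.
 */
unsigned long read_msg_max(const char *path, unsigned long fallback)
{
    char          buffer[32];
    unsigned long value = fallback;
    FILE         *fp;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp != NULL)
    {
        if (fgets(buffer, sizeof(buffer), fp) != NULL)
        {
            char         *end = NULL;
            unsigned long parsed;

            errno  = 0;
            parsed = strtoul(buffer, &end, 10);
            if (errno == 0 && end != buffer && parsed > 0)
            {
                value = parsed;
            }
        }
        fclose(fp);
    }
    return value;
}

/*
 * Hypothetical helper: truncate a requested depth to the known limit.
 * A max_depth of 0 is treated as "no limit known" and leaves the request as-is.
 */
unsigned long clamp_queue_depth(unsigned long requested, unsigned long max_depth)
{
    if (max_depth > 0 && requested > max_depth)
    {
        return max_depth;
    }
    return requested;
}
```

With these helpers, `clamp_queue_depth(requested, read_msg_max("/proc/sys/fs/mqueue/msg_max", FALLBACK_MSG_MAX))` would never ask for more than the system limit, and would fall back to a conservative depth when the limit cannot be read.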

ivanperez-keera commented 9 months ago

> I wonder if the method for getting MaxQueueDepth is broken for your setup.

I don't know why that would be. See my docker image: I'm using the standard cFS.

Does that image work for you at all without root?

skliper commented 9 months ago

It's because in your Docker container geteuid() == 0, so cFS thinks it has privileges and skips the use of the msg_max limit. If you comment out the geteuid() != 0 check, it works (it worked for me, and it IS using PERMISSIVE with your setup).

ivanperez-keera commented 9 months ago

That was it. Commenting the if line out (but leaving the following block in) makes cFS not crash. Thanks a bunch!

I wonder 1) whether root can normally go beyond msg_max (otherwise, why is there such a check?), and 2) if so, whether there should be a flag so that the condition is not based solely on the user ID.
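On the first question: on Linux, the mq_overview(7) man page describes the msg_max limit as ignored for processes holding CAP_SYS_RESOURCE, and root inside a container does not necessarily hold that capability. On the second, one possible shape for such a flag is sketched below. Everything here is hypothetical: neither `should_check_msg_max` nor the environment variable `EXAMPLE_FORCE_MSG_MAX_CHECK` exists in OSAL; they only illustrate a decision that is not based on the effective user ID alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical decision function: should the msg_max limit be read and
 * applied? Non-root users always check. Root skips the check unless the
 * (hypothetical) force flag is set to "1".
 */
bool should_check_msg_max(uid_t euid, const char *force_flag)
{
    if (force_flag != NULL && strcmp(force_flag, "1") == 0)
    {
        return true;
    }
    return euid != 0;
}

/* Hypothetical wrapper that reads the real effective UID and environment. */
bool should_check_msg_max_now(void)
{
    return should_check_msg_max(geteuid(), getenv("EXAMPLE_FORCE_MSG_MAX_CHECK"));
}

/* Usage example covering the cases discussed in this thread. */
void example_usage(void)
{
    assert(should_check_msg_max(1000, NULL)); /* ordinary user: check the limit */
    assert(!should_check_msg_max(0, NULL));   /* root, no flag: check is skipped */
    assert(should_check_msg_max(0, "1"));     /* root in a container, flag set */
}
```

Keeping the decision in a pure function of the UID and the flag makes it easy to test without actually running as root.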

ivanperez-keera commented 9 months ago

For completeness, in case someone has the same problem I did, this is my current Dockerfile:

FROM i386/debian:bullseye

# Apt should not ask questions during configuration
ENV DEBIAN_FRONTEND=noninteractive

# Update packages available
RUN apt-get update

# cFS dependencies
RUN apt-get install -y cmake build-essential gcc-multilib g++-multilib

# Generic dependencies needed
RUN apt-get install -y git

# Get copy of cFS
RUN git clone --recursive https://github.com/nasa/cFS
WORKDIR cFS
RUN git submodule init
RUN git submodule update

RUN cp cfe/cmake/Makefile.sample Makefile
RUN cp -r cfe/cmake/sample_defs sample_defs

# We have to either modify the following file to remove a check based on the
# user ID, or compile and install everything as a different user.
RUN sed -i -e '66s/\<if\>/\/\/ if/g' osal/src/bsp/generic-linux/src/bsp_start.c

RUN make SIMULATION=native prep
RUN make
RUN make install
WORKDIR build/exe/cpu1/
CMD ./core-cpu1