phoenix-rtos / phoenix-rtos-kernel

Phoenix-RTOS microkernel repository
http://phoenix-rtos.com
BSD 3-Clause "New" or "Revised" License

Spawn of multiple processes through `syspage` fails. #208

Open gerard5 opened 3 years ago

gerard5 commented 3 years ago

Description

Spawning multiple processes through syspage fails: the number of spawned instances is squared instead of matching the number requested (see the attached screenshots). I've checked the syspage content passed from plo to the kernel and it looks OK, so this appears to be a problem in the kernel's process spawning. I have verified this only on the armv7m7-imxrt106x target, so it may be NOMMU-specific and may also occur on armv7m7-imxrt117x.

This issue may be linked with the Jira tasks [RTOS-1] (multiple sysexec usage fails) and [NIL-20]. I have not verified the psh sysexec behavior, but it looks similar.

A simple program that can be used to reproduce the problem:

#include <stdio.h>
#include <unistd.h>
#include <sys/msg.h>

int main(int argc, char *argv[])
{
    oid_t oid;

    /* wait for dummyfs & imxrt-multi */
    while (lookup("/dev", NULL, &oid) < 0)
        sleep(1);

    printf("spawn test id %s\n", argc == 2 ? argv[1] : "(no id)");

    for (;;)
        sleep(1);

    return 0;
}

The aliases are the same for all of the cases below:

alias phoenix-armv7m7-imxrt106x.elf 0x11000 0xf600
alias dummyfs 0x20600 0x8c00   
alias spawn_test 0x29200 0x4000
alias imxrt-multi 0x2d200 0x7e00
alias psh 0x35000 0x19800      

One instance:

app flash1 -x dummyfs xip1 dtcm
app flash1 -x imxrt-multi xip1 dtcm
app flash1 -x psh xip1 ocram2
app flash1 -x spawn_test;1 xip1 ocram2

[screenshot: one]

Two instances:

app flash1 -x dummyfs xip1 dtcm
app flash1 -x imxrt-multi xip1 dtcm
app flash1 -x psh xip1 ocram2
app flash1 -x spawn_test;1 xip1 ocram2
app flash1 -x spawn_test;2 xip1 ocram2

[screenshot: two]

Three instances:

app flash1 -x dummyfs xip1 dtcm
app flash1 -x imxrt-multi xip1 dtcm
app flash1 -x psh xip1 ocram2
app flash1 -x spawn_test;1 xip1 ocram2
app flash1 -x spawn_test;2 xip1 ocram2
app flash1 -x spawn_test;3 xip1 ocram2

[screenshot: three]

Four instances:

app flash1 -x dummyfs xip1 dtcm
app flash1 -x imxrt-multi xip1 dtcm
app flash1 -x psh xip1 ocram2
app flash1 -x spawn_test;1 xip1 ocram2
app flash1 -x spawn_test;2 xip1 ocram2
app flash1 -x spawn_test;3 xip1 ocram2
app flash1 -x spawn_test;4 xip1 ocram2

[screenshot: four]

Five instances:

app flash1 -x dummyfs xip1 dtcm
app flash1 -x imxrt-multi xip1 dtcm
app flash1 -x psh xip1 ocram2
app flash1 -x spawn_test;1 xip1 ocram2
app flash1 -x spawn_test;2 xip1 ocram2
app flash1 -x spawn_test;3 xip1 ocram2
app flash1 -x spawn_test;4 xip1 ocram2
app flash1 -x spawn_test;5 xip1 ocram2

[screenshot: five]

With four or more spawned instances the system stopped responding: neither psh nor the imxrt-multi console input was working. I tried different combinations of maps, including (imap=ocram2, dmap=ocram2), and the result looked the same every time.

gerard5 commented 3 years ago

Before parsing, the cmdline is a single string (received directly from plo in the syspage):

"Xdummyfs Ximxrt-multi Xpsh Xspawn_test;1 Xspawn_test;2 Xspawn_test;3 Xspawn_test;4 Xspawn_test;5 "

After parsing, the memory content looks like this:

Xdummyfs\0Ximxrt-multi\0Xpsh\0Xspawn_test\0Xspawn_test\0Xspawn_test\0Xspawn_test\0Xspawn_test\0

But while it is being parsed, each cmdline item is scanned against the prog=syspage->progs list and compared with prog->cmdline. This is a serious problem if the cmdline contains multiple programs with the same name, whether with different arguments, the same arguments, or no arguments at all. As in the example in the problem description (see the screenshots above), the five commands with arguments spawn_test;1, spawn_test;2, spawn_test;3, spawn_test;4 and spawn_test;5 lead to 25 processes being spawned… what!?

Take a close look at the block with hal_strcmp() if-statement inside the loop: https://github.com/phoenix-rtos/phoenix-rtos-kernel/blob/54c5d2c61dd3f474aed1dd376967a5590d2de5a1/main.c#L97-L105

What it actually does: for each processed cmdline item it scans through prog=syspage->progs and, if cmdline+1 matches prog->cmdline, spawns a process (for now I'm leaving the performance of this scan loop aside). It's even worse when the same program name appears more than once (which is the topic of this issue): every such item matches every entry with that name, so N entries lead to N^2 spawned processes.

As a redesign of the syspage is not the subject of this issue, a temporary solution is to somehow mark each already spawned prog->cmdline entry so that it is skipped in subsequent scans of syspage->progs.

Not judging the solution itself, the temporary hack below seems to work:

u32 skips = 0; /* bitmask of already spawned progs, one bit per syspage->progs index */

In plo, MAX_PROGRAMS_NB is set to 32, so a 32-bit mask is sufficient here. Rootfs-less projects should not exceed that limit anyway; if they do, it is a sign that they need a rootfs, which I suppose we all agree on.

            for (prog = syspage->progs, i = 0; i < syspage->progssz; i++, prog++) {
                if (!(skips & (1u << i)) && !hal_strcmp(cmdline + 1, prog->cmdline)) {
                    skips |= 1u << i;
                    argv[0] = prog->cmdline;
                    res = proc_syspageSpawn(prog, vm_getSharedMap(prog), prog->cmdline, argv);
                    if (res < 0) {
                        lib_printf("main: failed to spawn %s (%d)\n", argv[0], res);
                    }
                    break;
                }
            }

The result is as expected:

[screenshot: sol1]

So, if you have better solutions, let's discuss them; or, if you agree with the presented approach, let me prepare a PR and commit.