No recovery from bash process exiting

PeterNerlich commented 2 years ago

Describe the bug When the bash process exits, for whatever reason, even reloading the page doesn't result in a fresh bash session. Instead, it displays Welcome to Senzing xterm! with a new line, but no $. After a while, the container might display something like this:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/app/app.py", line 131, in read_os_write_socketio
    output = os.read(app.config["file_descriptor"], max_read_bytes).decode()
OSError: [Errno 5] Input/output error

To Reproduce Fortunately, this is pretty trivial

Spin up a container
Open the page in browser
Type in exit and hit enter
See how the process doesn't respond anymore. There are immediate no log messages from the container.
Reload the web page. There is still no new bash session.

Expected behavior Senzing xterm should notice and in some way indicate that the process it was connected to exited. It should spawn and connect to a new one, at least when the page is reloaded.

Desktop:

OS: Arch Linux
Browser chromium
Version 98.0.4758.102 (Official Build) Arch Linux (64-bit)

docktermj commented 2 years ago

@PeterNerlich Are you using this in an orchestration environment like Kubernetes, docker-compose, OpenShift, etc?

PeterNerlich commented 2 years ago

No, I tried it locally with docker run -it --tty --rm -p 8080:5000 senzing/xterm. The container doesn't exit either, and the webpage can be reloaded. It just cannot relay any terminal I/O from/to the process

docktermj commented 2 years ago

@PeterNerlich I've been looking at the code and am coming to the conclusion that it is a flaw in the simplistic design of XTerm.

Particularly this line: https://github.com/Senzing/docker-xterm/blob/92a1c1e724af90e442eb6f2cae843918cadfebc1/rootfs/app/app.py#L187-L188

The net is that each browser connect to the same "pseudo-terminal". So once that pseudo-terminal has exited, there is nothing left to connect to. The fix would be that each browser access needs to start up their own pseudo-terminal.

At the moment, I don't know how complicated this change is. I think it is more than a simple fix. We'll keep the issue open, but I don't see us getting to it in the short term, but hopefully in a few months. I will chat it up among my colleagues to see how complicated the fix would be.

cc: @roncewind @arawinters

docktermj commented 2 years ago

@PeterNerlich And thank you for discovering this and reporting it!!!

PeterNerlich commented 2 years ago

Alternatively, I could imagine python monitoring that process and spawning a new one once it exits. This could be done either immediately, or once a new connection is established.

Given the lines of code you pointed out, the latter might be as easy as checking whether there is actually a running process with the saved pid: https://stackoverflow.com/a/568285 – using that example, making line 187 as simple as if app.config["child_pid"] and check_pid(app.config["child_pid"]):. This should constitute a fairly minimal code change, working exactly as usual as long as the process is still alive. If not, it instead behaves as if this was the very first connection, spawning and setting the pid of a new process.

I still like the thought of some sort of notification in the browser once the shell exits, but that might be slightly more complex.

PeterNerlich commented 2 years ago

Tried that out quickly, didn't work like that. The function calls happen differently than I thought I figured out after all.

docktermj commented 2 years ago

Hi @PeterNerlich!

Over the next couple of days, I'll review and test https://github.com/Senzing/docker-xterm/pull/53 Thank you so much for diving into this!!!

If everything checks out, which I hope it does, we can pull it into the master/main branch. For us to do that you'll have to give us permission to use your code. To do that, the process we have is at https://github.com/Senzing/docker-xterm/blob/master/CONTRIBUTING.md#contributing-code-or-documentation and https://github.com/Senzing/docker-xterm/blob/master/CONTRIBUTING.md#license-agreements

You can attach a signed ICLA or CCLA to this issue. If you want to do it privately, send it to support@senzing.com referencing this issue. (Using support@senzing.com we can discuss things privately, if you desire)

Once again, Thank You!

-- Mike

PeterNerlich commented 2 years ago

Sure, I'll do that soon.

I'm also experimenting further with spawning a new, separate shell process for every new connection and maybe even referencing a particular one to selectively look at and interact with the same process from different tabs. Should I figure it out, that would be a separate pull request, in case you would like to incorporate it as well.

docktermj commented 2 years ago

@PeterNerlich We've been dreaming of that functionality! Just haven't gotten to it. We would welcome that Pull Request!

-- Mike

PeterNerlich commented 2 years ago

I have a quick question regarding 7. of the ICLA:

I used the check_pid() function in my proposed changes, as in I took the (here very few) lines in the example on stackoverflow and pasted them verbatim into app.py, only adding a bit of documentation. That work is technically not my original creation, but published under Creative Commons by stackoverflow on behalf of the person answering, if I understood their model correctly.

What is the working policy here? Do I have to pay special attention here? If not, what are the criteria under which i do have to pay special attention? (e.g. at what size of stackoverflow content I might wish to copy over)

docktermj commented 2 years ago

@PeterNerlich Let me do some checking with some of our folks that know this area more than I do. (I'm just a techie. :-) )

PeterNerlich commented 2 years ago

Actually, maybe even more important: Do I have to disclose my postal address and telephone to the company? I'm still trying to discern how comfortable I am with that. Those might change in the future and it is highly improbably I will remember or care to update the company about it at that time.

And sorry for abusing the issue discussion for that.

docktermj commented 2 years ago

@PeterNerlich No problems with me. I'm chasing down your questions with our staff. May take a little time (a day or two) but we'll figure out that the "fact on the ground" are and go from there. I want you and Senzing to both be comfortable with the outcome.

Feel free to ask any questions of concern to you.

docktermj commented 2 years ago

@PeterNerlich (I'd probably ask the same questions. :smiley: )

docktermj commented 2 years ago

@PeterNerlich

Status

Getting a little push-back because of incompatibility between "Creative Commons" and "Apache 2.0" licensing. I will be talking with our legal team in the near future.

Often, samples seen on the internet show how to do something and then software developers use the insight when writing their own code. Might that be true of what you have done?

PeterNerlich commented 2 years ago

No, I 1:1 copied what are now lines 266–271 from the linked example, as well as line 259. 261 is still largely the same, as I just modified the multiline comment markers (""") to be on their own lines.

This is a longstanding dilemma to which I haven't heard a convincing solution yet; What constitutes original work. If you for example consider complexity or, for better measurability, size of some definition, to decide what counts as original work and what is so simple it is just part of people's lives and not trademarkable, you get companies claiming ownership over specific words that are just normal everyday use, and calling your product "water" and expecting everyone to shut up and only use that word if they mean the product is so obviously ridiculous to most people, but I think I heard claims like that do happen.

In the same way, it feels weird to do that to code, especially with stuff from stackoverflow, where the examples are usually pretty short, because they are supposed to just highlight how to do a certain thing. At this level of simplicity it is genuinely hard to just be inspired by it and not copying it verbatim, because there is no possibility to do it in a simpler way, and any attempt is sure to be an example of bad code because it had to look different.

import os

def check_pid(pid):        
    """ Check For the existence of a unix pid. """
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    else:
        return True

With this example, there is not much that could be changed. import os is just a keyword and the name of the module, this literally cannot change, as well as try:, except OSError: and else:. The name of the function and its parameter could be changed, the parameter could also receive type annotations, but those are not used in the rest of the file and thus I would count it towards going against convention and making the code slightly more confusing and thus "bad". os.kill(pid, 0) could receive some roundabout treatment by assigning 0 to a variable and using that instead, but that's again just bad code. Same with the return statements.

So really, there is no way to change this without it feeling like bad plagiarism instead of directly using the example. The best way I can think of and which would be my proposition for trying to circumvent this legal problem would be to not use a separate function at all and hope that the slightly changed program flow, which would make it look like a very different thing to a non-programmer, is enough to avoid the problems:

try:
    if app.config["child_pid"]:
        os.kill(app.config["child_pid"], 0)
        return
except OSError:
    pass

...but to anyone who knows what this is doing and would directly compare the two approaches, it is very clear that they are exactly equivalent and everything beyond is just unnecessary trickery. Because the adopted example does exactly what we wanted to do, and modifying what it is actually doing is not a solution to our problem.

I hope I don't seem too ramblingly, I really enjoyed this :D

docktermj commented 2 years ago

@PeterNerlich

I've written an email to our legal. I'm asking if your code falls under the scènes à faire doctrine.

docktermj commented 2 years ago

@PeterNerlich

In looking at your Pull Request, I think there's a way to avoid the "Copy/Paste" and improve your code. I say that, because there may be a bug in your code. By the way, I haven't tested it directly.

You'll notice these lines:

https://github.com/PeterNerlich/docker-xterm/blob/011ad5f3b8f9a3c6baef8a4008352d39007eb457/rootfs/app/app.py#L224-L227

Where you want to see if a process is already running. But when check_pid() is called, it kills the process. I don't think that's the behavior you want.

If you code truly wants to check for the existance of a pid, here's an example:

import psutil
pid = 12345
if psutil.pid_exists(pid):
    print("a process with pid %d exists" % pid)
else:
    print("a process with pid %d does not exist" % pid)

Does that help?

PeterNerlich commented 2 years ago

True, that is also one solution and was mentioned in another answer on stackoverflow as well. I didn't choose it because psutils wasn't already imported anyway and it didn't matter.

os.kill(pid, 0) does not kill the process, even if it looks like that and thus maybe is not the most pythonic way to do it. On the other hand, it does perform the equivalent system call, just that the second parameter indicates the signal to send to the process, and 0 is a special case that doesn't actually send a signal, but performs all other checks (most importantly, making sure there is actually a process with that pid).

I actually just trusted the example and didn't run into problems with it, but I went and confirmed my suspicion for this: https://unix.stackexchange.com/questions/169898/what-does-kill-0-do

docktermj commented 2 years ago

I'm amazed that os.kill() doesn't kill How funny!

Actually if I were to create a check_pid() function, it would look like this:

import os

def check_pid(pid):        
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    return True

as it shows more explicitly that there's always something always returned. Feel free to use it.

docktermj commented 2 years ago

@PeterNerlich What's your current thinking on the ICLA? I like, and want to encourage, what you are doing. What can I do to help?

PeterNerlich commented 2 years ago

I'm waiting on an answer to whether I need to put my exact address there. I plan on using this in a lecture experiment the next week and am quite busy for a week after that so I won't add much until then. In terms of help, I don't feel stuck anywhere, but rather like I more or less achieved the functionality I wanted. Testing #53 and #57 and maybe expressing ideas for further tweeks would be nice.

docktermj commented 2 years ago

@PeterNerlich My apologies, didn't realize that was the issue. I have an idea: Fill it out with what you feel comfortable with and send it our way. I'll see if our folks feel its sufficient. Sound reasonable?

PeterNerlich commented 2 years ago

Let's see whether this works out: senzing-individual-contributor-license-agreement-reduced-info.pdf

docktermj commented 2 years ago

@PeterNerlich I got the "go ahead" to accept your ICLA. Yay! I'm a little busy the next 5 workdays, but will probably be able to attend to this right after.

Thank you for your patience!

docktermj commented 2 years ago

@PeterNerlich

I'm testing your pull request (https://github.com/Senzing/docker-xterm/pull/53)

Here's the test case:

Build senzing/installer:2.8.8 using the instructions at https://github.com/Senzing/docker-installer

Run the following:

export SENZING_DOCKER_COMPOSE_YAML=postgresql/docker-compose-rabbitmq-postgresql.yaml

curl -X GET \
    --output /tmp/senzing-versions-latest.sh \
    https://raw.githubusercontent.com/Senzing/knowledge-base/master/lists/senzing-versions-latest.sh

source /tmp/senzing-versions-latest.sh
export SENZING_VOLUME=~/senzing-${SENZING_VERSION_SENZINGAPI}
mkdir -p ${SENZING_VOLUME}

sudo docker run \
    --rm \
    --user 0 \
    --volume ${SENZING_VOLUME}:/opt/senzing \
    senzing/installer:${SENZING_VERSION_SENZINGAPI}

curl -X GET \
    --output ${SENZING_VOLUME}/docker-compose.yaml \
    https://raw.githubusercontent.com/Senzing/docker-compose-demo/master/resources/${SENZING_DOCKER_COMPOSE_YAML}

export SENZING_DATA_DIR=${SENZING_VOLUME}/data
export SENZING_DATA_VERSION_DIR=${SENZING_DATA_DIR}
export SENZING_ETC_DIR=${SENZING_VOLUME}/etc
export SENZING_G2_DIR=${SENZING_VOLUME}/g2
export SENZING_VAR_DIR=${SENZING_VOLUME}/var

export PGADMIN_DIR=${SENZING_VAR_DIR}/pgadmin
export POSTGRES_DIR=${SENZING_VAR_DIR}/postgres
export RABBITMQ_DIR=${SENZING_VAR_DIR}/rabbitmq

sudo mkdir -p ${PGADMIN_DIR}
sudo mkdir -p ${POSTGRES_DIR}
sudo mkdir -p ${RABBITMQ_DIR}

sudo chown $(id -u):$(id -g) -R ${SENZING_VOLUME}
sudo chmod -R 775 ${SENZING_VOLUME}
sudo chmod -R 777 ${SENZING_VAR_DIR}

curl -X GET \
    --output ${SENZING_VOLUME}/docker-versions-latest.sh \
    https://raw.githubusercontent.com/Senzing/knowledge-base/master/lists/docker-versions-latest.sh

source ${SENZING_VOLUME}/docker-versions-latest.sh
export SENZING_DOCKER_IMAGE_VERSION_XTERM=latest

cd ${SENZING_VOLUME}
sudo --preserve-env docker-compose up

When I look at the XTerm log, I see:

2022-04-14 14:17:47,459 senzing-50240101I Senzing X-term version: 1.2.0 updated: 2021-08-30
2022-04-14 14:17:47,459 senzing-50240102I Senzing X-term serving on http://0.0.0.0:5000
2022-04-14 14:18:59,400 senzing-50240103I Connected. PID: None
2022-04-14 14:18:59,402 senzing-50240104I Started background task. PID: 20 Running command: 'bash'

When I look at the first XTerm, I see:

Welcome to Senzing xterm!

but no command prompt.

When I open a second XTerm, I see:

Welcome to Senzing xterm!
2022-04-14 14:18:59,402 senzing-50240103I Connected. PID: None
2022-04-14 14:18:59,403 senzing-50240104I Started background task. PID: 21 Running command: 'bash'

ls
2022-04-14 14:20:13,232 senzing-50240103I Connected. PID: 21
2022-04-14 14:20:15,333 senzing-50240103I Connected. PID: 21
2022-04-14 14:20:16,118 senzing-50240103I Connected. PID: 21

also no command prompt.

Do you get different results?

PeterNerlich commented 2 years ago

Sorry for the radio silence, I didn't get around to testing it with that setup yet. Until now, I've just been running the single container directly from the terminal, without anything else.

I might have to add that I'm not familiar with senzings overall vision at all, I just found this single repo as the closest thing to what I imagined for myself and started hacking away on it.

docktermj commented 2 years ago

@PeterNerlich I think I've fixed the original issue (no recover from bash process exiting) with this code:

https://github.com/Senzing/docker-xterm/blob/23378867054bd2c86b9d725c5ec16171a8e1c72b/rootfs/app/app.py#L227-L234

If this fixed the issue, would you close this issue?

I'll be working on the "multi-session" issue tomorrow. I looked into it today but not in depth.

docktermj commented 2 years ago

@PeterNerlich I'll be closing this issue next week, unless you let me know beforehand that it should be kept open.

docktermj commented 2 years ago

The fix was introduced via https://github.com/Senzing/docker-xterm/pull/74

senzing-garage / docker-xterm

No recovery from bash process exiting #51

Status