popcornmix / omxplayer

omxplayer
GNU General Public License v2.0

OMXPLAYER fails to exit properly and process gets stuck #437

Open horendus opened 8 years ago

horendus commented 8 years ago

I'm trying to make a looping audio player.

Basically my bash script checks if the OMXPLAYER process is running. If it's not, it assumes the track has finished playing, so it triggers my Python script, which randomly selects a track and starts OMXPLAYER with that audio file. The Python script then exits until called again.

The problem is that, at random, this system fails because the OMXPLAYER process stops closing down properly.

I know this is the problem because when this happens I can SSH in and run sudo killall -9 /usr/bin/omxplayer.bin

This kills OMXPLAYER and then the next track will immediately start playing because my BASH script will discover the OMXPLAYER process is not running, triggering the python script to run and start a new audio track.

Also note that after this happens ONCE OMXPLAYER will not exit again properly until I reboot the PI. After a reboot the system will work for between 30min - 2 days before falling over again.

This is the command I'm using to open OMXPLAYER and play the audio file:

omxplayer -o local --no-keys "filename"

This is the result of running omxplayer -v:

Build date: Sat, 06 Feb 2016 16:37:51 +0000
Version : cb91001 [master

I'm running the latest version of the JESSIE-LITE image on a PI2.

Can anyone either suggest a WORKAROUND or explain how and why OMXPLAYER doesn't shut down correctly sometimes?

I'm considering adding a manual kill of the OMXPLAYER process (sudo killall -9 /usr/bin/omxplayer.bin) to my bash script, triggered after the duration of the track, but I'm not sure how to get OMXPLAYER to RETURN the length of the current track, or to report whether it's currently playing a file!
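A minimal sketch of that kind of watchdog, written in Python 3 rather than bash. It assumes the caller already knows an upper bound for the track length; it does not ask omxplayer for the duration. The function name `play_with_deadline` and its parameters are hypothetical, and the omxplayer command line in the usage comment is the one quoted above.

```python
import os
import signal
import subprocess


def play_with_deadline(cmd, max_seconds):
    """Run cmd; return True if it exited by itself, False if it had to be killed."""
    # start_new_session=True gives the child its own process group, so a
    # wrapper script and the binary it launches can be signalled together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        proc.wait(timeout=max_seconds)
        return True
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # same signal as killall -9
        except ProcessLookupError:
            pass  # the group disappeared in the meantime
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            pass  # still not gone, e.g. blocked inside the kernel
        return False


# Hypothetical usage, with a deadline a little longer than the track:
# play_with_deadline(["omxplayer", "-o", "local", "--no-keys", "filename"], 300)
```

The second, bounded `wait` is deliberate: a process that is blocked inside the kernel will not react even to SIGKILL, so the watchdog should not wait for it indefinitely.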

Any help would be MUCH appreciated!

pcwalden commented 8 years ago

omxplayer has the same problem on all of my Pis: a B 512MB, Pi2 and Pi3. gpu_mem of the last 2 are definitely over 128. Again using the --no-osd option mitigates the problem.

My reason for wanting a solution is that I use a named pipe as standard input to the omxplayer that runs in a bash shell loop in a background process. I then can have foreground clients login and drop off to raise and lower the volume, or skip tracks by sending keyboard commands through the named pipe.
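For illustration, a client for such a named pipe might look like the following Python 3 sketch. It assumes the player was started with its standard input redirected from the FIFO and without --no-keys; the function name `send_key` and the path in the usage comment are hypothetical. The keys shown are omxplayer's default bindings (+ and - for volume, q to quit the current file).

```python
import os


def send_key(fifo_path, key):
    """Write a single key press to the FIFO the player reads as stdin."""
    # O_NONBLOCK makes open() raise OSError right away when no process has
    # the FIFO open for reading, instead of blocking the client forever.
    fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, key.encode("ascii"))
    finally:
        os.close(fd)


# Hypothetical usage:
# send_key("/tmp/omx_ctl", "+")   # volume up
# send_key("/tmp/omx_ctl", "q")   # quit, so the surrounding loop starts the next track
```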

scottmayo commented 7 years ago

Linux peacock 4.4.34+ #930 Wed Nov 23 15:12:30 GMT 2016 armv6l GNU/Linux (on a Pi2B)

Problem still exists on a fresh install done today. The workaround is to add both --no-keys and --no-osd. I'm playing a short (~2-3 second) .mp3 file through -o local, with "> /dev/null &" so I don't have to wait or be told to have a nice day. Without --no-keys it hangs at the start and doesn't proceed until I use fg, which makes some sense. Without --no-osd it hangs virtually every time at the end; it may have worked the first time.

wmodes commented 7 years ago

Is this still being worked on? Apparently the issue is still open.

Still happening a year later.

500+ videos and the player freezes, sometimes dramatically, leaving beautiful 8-bit rainbow snow glitch patterns. Though usually, less dramatically, with maybe a dozen or less omxplayers that failed to exit completely and are hanging around as active (not zombie) processes.

Any word on this?

popcornmix commented 7 years ago

If someone (perhaps @scottmayo ?) provides a sample file and script that provokes the failure in a reasonable length of time I'll try to debug it.

I've tried:

while : ; do omxplayer Silence01s.mp3 ; done

with a 1 second mp3 file (http://duramecho.com/Misc/SilentCd/Silence01s.mp3) and it doesn't seem to be hanging.

wmodes commented 7 years ago

Try the same thing with a video file?

I'm using subprocess.Popen() to launch it.

-- Wes Modes A Secret History of American River People http://peopleriverhistory.us

scottmayo commented 7 years ago

My use of omx is buried in another piece of software I wrote. I got it to stop hanging, and the command line that works for me is:

nohup /usr/bin/omxplayer --no-osd --no-keys -o local filename > /dev/null 2>/dev/null &

It is spawned by system(). The comment in my code says: //no-osd prevents a hang

The length of the sound didn't seem to affect the probability of hanging if I remember. Dim memory suggests that playing two sounds at once made things worse but I'm not sure.

I've noticed that /tmp ends up with a file containing a pid. That seems odd. Is it there in case someone has to write a script to kill omx? Because it's not useful for that: it's still there when the player exits.

wmodes commented 7 years ago

It was frozen this morning. There were exactly 250 omxplayer.bin defunct or zombie processes. Here's a sample:

root     32732  0.0  0.0      0     0 ?        D    04:23   0:00 [omxplayer.bin]
root     32757  0.0  0.0      0     0 ?        Zl   04:23   0:00 [omxplayer.bin] <defunct>
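The two sample lines show different states: D is uninterruptible sleep (the process is blocked inside the kernel and cannot be killed until that call returns), while Z is a zombie that has exited but not yet been reaped. A small Python 3 sketch for tallying the two from `ps aux` output follows; the function name `count_states` is hypothetical, and the column layout is assumed to match the sample above.

```python
from collections import Counter


def count_states(ps_lines, name="omxplayer.bin"):
    """Tally processes whose command contains `name` by main ps state letter."""
    states = Counter()
    for line in ps_lines:
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or name not in fields[10]:
            continue
        states[fields[7][0]] += 1  # first letter of STAT, e.g. "Zl" -> "Z"
    return states
```

Fed the two lines above, this gives one process in state D and one in state Z.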

Here are the messages in the system logs. Not sure which ones are relevant, but nothing else was running on the machine.

[92280.119068] [<80040c58>] (task_work_run) from [<80027af0>] (do_exit+0x348/0xab0)
[92280.119077] [<80027af0>] (do_exit) from [<800282f0>] (do_group_exit+0x4c/0xe4)
[92280.119089] [<800282f0>] (do_group_exit) from [<8003358c>] (get_signal+0x370/0x6dc)
[92280.119103] [<8003358c>] (get_signal) from [<8001360c>] (do_signal+0x278/0x3c0)
[92280.119115] [<8001360c>] (do_signal) from [<8001393c>] (do_work_pending+0xb8/0xd0)
[92280.119127] [<8001393c>] (do_work_pending) from [<8000fb88>] (slow_work_pending+0xc/0x20)
[92400.118782] INFO: task omxplayer.bin:19830 blocked for more than 120 seconds.
[92400.118796]       Not tainted 4.4.50-v7+ #970
[92400.118801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[92400.118807] omxplayer.bin   D 805b8364     0 19830      1 0x00000005
[92400.118836] [<805b8364>] (__schedule) from [<805b88dc>] (schedule+0x50/0xa8)
[92400.118849] [<805b88dc>] (schedule) from [<805bb7dc>] (schedule_timeout+0x1e8/0x270)
[92400.118860] [<805bb7dc>] (schedule_timeout) from [<805ba524>] (__down+0x88/0xc0)
[92400.118875] [<805ba524>] (__down) from [<80069418>] (down+0x54/0x68)
[92400.118890] [<80069418>] (down) from [<803d180c>] (vchiq_release+0x134/0x31c)
[92400.118902] [<803d180c>] (vchiq_release) from [<80159740>] (__fput+0x94/0x1e4)
[92400.118912] [<80159740>] (__fput) from [<80159900>] (____fput+0x18/0x1c)
[92400.118922] [<80159900>] (____fput) from [<80040c58>] (task_work_run+0xa0/0xd4)
wmodes commented 7 years ago

Here is the Python method I use to start omxplayer. Noteworthy here are:

Typical omxplayer command:

omxplayer --no-osd --no-keys --refresh --aspect-mode stretch --layer 2 --dbus_name org.mpris.MediaPlayer2.omxplayer2 --pos 0.0 media/sharp_shantyboat_moving_away_from_dock.mp4

And here's the method:

omx_cmd = ['omxplayer', '--no-osd', '--no-keys', '--refresh', '--aspect-mode stretch']
content_cmd = omx_cmd + ['--layer %i', '--dbus_name', 'org.mpris.MediaPlayer2.omxplayer%i']
loop_cmd = omx_cmd + ['--layer %i', '--loop', '--dbus_name', 'org.mpris.MediaPlayer2.omxplayer%i']
transition_cmd = omx_cmd + ['--layer %i', '--dbus_name', 'org.mpris.MediaPlayer2.omxplayer%i']

def _start_video(self, video):
        """Starts a video. Takes a video object """
        # add media_dir to filename
        global omx_layer_content, omx_layer_loop, omx_layer_transition, omx_player
        filename = self.media_dir + '/' + video['file']
        # check to make sure we've passed the right thing
        if not isinstance(video, dict):
            raise ValueError(self._example)
        # set video name if we have it
        if 'name' in video:
            name = video['name']
        else:
            name = video['file']
        # skip this video if disabled in db
        if 'disabled' in video and video['disabled']:
            self._debug("Not played:", name, "disabled")
            return
        #debug messages
        self._debug("Starting %s in %s" % (name, self.media_dir))
        self._debug("Video data:", video)
        # get length
        filelength = self._get_length(filename)
        if ('length' not in video or video['length'] == 0.0):
            length = filelength
        else:
            length = video['length']
        # get start
        if 'start' in video:
            start = video['start']
        else:
            start = 0.0
        # if start is too large, set it to 0
        if (start >= filelength):
            start = 0.0
        # if length is too large, scale it back, unless loop
        if (('loop' not in video['tags']) and (start + length >= filelength)):
            length = filelength - start
        # store this for later
        self._current_video = video
        # debugging output
        self._debug("name: %s (%s)" % (name, filename))
        self._debug("tags: %s" % video['tags'])
        self._debug("start: %.1fs, end: %.1fs, len: %.1fs" % \
                    (start, start+length, length))
        # each time we switch to a new video, we switch the layer
        # this will effectively toggle the 3 layer variables between (1,2), (3,4), and (5,6)
        omx_layer_content = 1 if (omx_layer_content != 1) else 2
        omx_layer_loop = 3 if (omx_layer_loop != 3) else 4
        omx_layer_transition = 5 if (omx_layer_transition != 5) else 6
        # we also toggle the virtual player
        omx_player = 1 if (omx_player != 1) else 2
        # build omxplayer command
        if ('loop' in video['tags']):
            my_cmd = " ".join(config.loop_cmd + [filename]) % \
                              (omx_layer_loop, omx_player)
        elif ('transition' in video['tags']):
            my_cmd = " ".join(config.transition_cmd + ['--pos', str(start), filename]) % \
                              (omx_layer_transition, omx_player)
        else: 
            my_cmd = " ".join(config.content_cmd + ['--pos', str(start), filename]) % \
                              (omx_layer_content, omx_player)
        self._debug("cmd:", my_cmd, l=2)
        # launch the player, saving the process handle
        # TODO: after debugging, replace 'if True' with 'try' and enable 'except'
        #if True:
        try:
            process = None
            process = subprocess.Popen(my_cmd, shell=True, preexec_fn=os.setsid, stdin=nullin, 
                                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
            # save this process group id
            pgid = os.getpgid(process.pid)
            self._player_pgid = pgid
            self._debug("Starting process: %i (%s)" % (pgid, name))
            # If we have a loop
            if ('loop' in video['tags']):
                self._debug("Looping %.2fs and setting kill timer for %s (pid %i)" %
                            (length - config.inter_video_delay, name, pgid))
            # otherwise
            else:
                self._debug("Waiting %.2fs and setting kill timer for %s (pid %i)" %
                            (length - config.inter_video_delay, name, pgid))
            # wait in a tight loop, checking if we've received stop event or time is over
            start_time = time()
            self._end_time = start_time + length
            # when we get close to the end, we release the thread to start new vid
            while (not self.stopped() and
                   (time() <= self._end_time)):
                pass
            # we kill the old vid
            if process.poll() is None:
                self._stop_video(pgid, name)
            # was starting omxplayer even successful?
            stdoutdata, stderrdata = process.communicate()
            returncode = process.returncode
            # if the process failed, let's log the output
            if (returncode != 0):
                self._debug("Error starting omxplayer for %s\n%s\n%s" % \
                            (name, str(stdoutdata), str(stderrdata)))
        except Exception as e:
             self._debug("Error starting omxplayer for %s\n%s" % (name, str(e)))
torarin commented 7 years ago

Looking at https://github.com/popcornmix/omxplayer/issues/437#issuecomment-219230464 by @jehutting it looks like it is vc_dispmanx_update_submit_sync that is hanging. Does that code look correct to you, @popcornmix? Any reason it could hang?

popcornmix commented 7 years ago

The log with vchiq_release in it suggests to me that the firmware has crashed. That could result in dispmanx or openmax hangs.

When this occurs is any subsequent video possible without rebooting? Do commands like vcgencmd version return correctly?
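One way to automate that check, as a Python 3 sketch: run the command with a timeout and treat a timeout or a non-zero exit as "not responding". The helper name `responds` and the 5 second default are assumptions; `vcgencmd version` is the command mentioned above.

```python
import subprocess


def responds(cmd, timeout=5.0):
    """Return True if cmd exits with status 0 within `timeout` seconds."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        return False  # command not found or not executable
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: if the child is blocked on the firmware it may
        # not die, so do not wait for it a second time.
        proc.kill()
        return False


# Hypothetical usage on the Pi:
# firmware_ok = responds(["vcgencmd", "version"])
```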

wmodes commented 7 years ago

When I get a video hang -- sometimes with dramatic 8-bit rainbow snow results, but most of the time with just a frozen picture -- I can't reboot at the CLI and need to literally power down.

OMID-313 commented 7 years ago

If you want to play more than one video file in loop (like a playlist), with features like next/previous/etc., I suggest you have a look at omxd (https://github.com/subogero/omxd), a daemon for omxplayer.

Although, the freeze/hang problem of omxplayer still exists, even when using omxd.

wmodes commented 7 years ago

@popcornmix, I know you are busy, but has any progress been made on this? In my case, I am trying to use the pi to present a programmatically generated/randomized playlist for an exhibit in a major museum and would hate to have the player freeze during playback.

As a workaround, I am already having the pi reboot a couple times during non-open hours, which works much of the time, though it still freezes occasionally. When that happens, the pi can't even reboot itself and needs to be power cycled.

Thoughts or suggestions?

exidyboy commented 7 years ago

Hi Wes, can you create an image of your SD card so other people can test? Are you unable to run vcgencmd version as per https://github.com/popcornmix/omxplayer/issues/437#issuecomment-296619223 because the terminal is unresponsive?

wmodes commented 7 years ago

When it dies, the pi is unresponsive. I can't ssh in, and if I'm already ssh'd in the session disconnects. I'll make an image.

mariusmarais commented 7 years ago

I'm also getting this problem when viewing RTSP streams. I'm manually running 2x omxplayer in screen with:

while [ true ]; do omxplayer 'rtsp://xxxx' --live --avdict rtsp_transport:tcp --win "0 0 960 540" --no-keys --no-osd --aidx -1; sleep 3; done

and

while [ true ]; do omxplayer 'rtsp://yyyy' --live --avdict rtsp_transport:tcp --win "960 0 1920 540" --no-keys  --no-osd --aidx -1; sleep 3; done

If one of the cameras drops out, the corresponding omxplayer usually exits and restarts, but sometimes it freezes. When it freezes, both streams freeze on-screen. It is unclear if the freeze is due to the camera dropping or just happens by itself. Then neither of the omxplayers can be kill -9ed, which I've never seen before, but I am able to SSH in and perform a reboot. The reboot does take a couple of minutes before the Pi's rainbow screen shows, even though the SSH session is closed immediately.

I've only found this thread today, so the --no-osd --aidx -1 options are new to me as of now. Will check back in here if there's improvement or not.

EDIT 2017-05-16: With the new options the player has been running without any hangs. Video on one camera still freezes occasionally, but omxplayer responds to the quit command and the other video has been running for 6 days non-stop. Looks like either --no-osd or muting audio did the trick, probably in combination with --no-keys.

wmodes commented 7 years ago

I'm wrong. I was able to ssh in after hang. Here's what I found in dmesg repeated seven times:

[20400.136025] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[20400.136039]       Not tainted 4.4.50-v7+ #970
[20400.136043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20400.136049] omxplayer.bin   D 805b8364     0 18131      1 0x00000005
[20400.136079] [<805b8364>] (__schedule) from [<805b88dc>] (schedule+0x50/0xa8)
[20400.136092] [<805b88dc>] (schedule) from [<805bb7dc>] (schedule_timeout+0x1e8/0x270)
[20400.136104] [<805bb7dc>] (schedule_timeout) from [<805ba524>] (__down+0x88/0xc0)
[20400.136117] [<805ba524>] (__down) from [<80069418>] (down+0x54/0x68)
[20400.136132] [<80069418>] (down) from [<803d180c>] (vchiq_release+0x134/0x31c)
[20400.136144] [<803d180c>] (vchiq_release) from [<80159740>] (__fput+0x94/0x1e4)
[20400.136154] [<80159740>] (__fput) from [<80159900>] (____fput+0x18/0x1c)
[20400.136164] [<80159900>] (____fput) from [<80040c58>] (task_work_run+0xa0/0xd4)
[20400.136175] [<80040c58>] (task_work_run) from [<80027af0>] (do_exit+0x348/0xab0)
[20400.136184] [<80027af0>] (do_exit) from [<800282f0>] (do_group_exit+0x4c/0xe4)
[20400.136196] [<800282f0>] (do_group_exit) from [<8003358c>] (get_signal+0x370/0x6dc)
[20400.136209] [<8003358c>] (get_signal) from [<8001360c>] (do_signal+0x278/0x3c0)
[20400.136221] [<8001360c>] (do_signal) from [<8001393c>] (do_work_pending+0xb8/0xd0)
[20400.136232] [<8001393c>] (do_work_pending) from [<8000fb88>] (slow_work_pending+0xc/0x20)

Followed by

[20520.135692] INFO: task kworker/1:2:132 blocked for more than 120 seconds.
[20520.135705]       Not tainted 4.4.50-v7+ #970
[20520.135710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20520.135715] kworker/1:2     D 805b8364     0   132      2 0x00000000
[20520.135737] Workqueue: events dbs_timer
[20520.135757] [<805b8364>] (__schedule) from [<805b88dc>] (schedule+0x50/0xa8)
[20520.135769] [<805b88dc>] (schedule) from [<805bb7dc>] (schedule_timeout+0x1e8/0x270)
[20520.135781] [<805bb7dc>] (schedule_timeout) from [<805b941c>] (wait_for_common+0xbc/0x17c)
[20520.135792] [<805b941c>] (wait_for_common) from [<805b94fc>] (wait_for_completion+0x20/0x24)
[20520.135805] [<805b94fc>] (wait_for_completion) from [<8048fbf8>] (rpi_firmware_transaction+0x68/0xac)
[20520.135817] [<8048fbf8>] (rpi_firmware_transaction) from [<8048fd34>] (rpi_firmware_property_list+0xf8/0x220)
[20520.135828] [<8048fd34>] (rpi_firmware_property_list) from [<8048fec0>] (rpi_firmware_property+0x64/0x84)
[20520.135841] [<8048fec0>] (rpi_firmware_property) from [<8046d118>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[20520.135853] [<8046d118>] (bcm2835_cpufreq_clock_property.constprop.1) from [<8046d178>] (bcm2835_cpufreq_driver_target_index+0x4c/0xc0)
[20520.135867] [<8046d178>] (bcm2835_cpufreq_driver_target_index) from [<80467744>] (__cpufreq_driver_target+0x1a4/0x2e8)
[20520.135878] [<80467744>] (__cpufreq_driver_target) from [<8046a944>] (od_check_cpu+0xcc/0xd0)
[20520.135889] [<8046a944>] (od_check_cpu) from [<8046c62c>] (dbs_check_cpu+0x1b4/0x1f8)
[20520.135898] [<8046c62c>] (dbs_check_cpu) from [<8046ab84>] (od_dbs_timer+0x68/0xf8)
[20520.135908] [<8046ab84>] (od_dbs_timer) from [<8046c9e8>] (dbs_timer+0x1ac/0x1d0)
[20520.135921] [<8046c9e8>] (dbs_timer) from [<8003c930>] (process_one_work+0x154/0x458)
[20520.135933] [<8003c930>] (process_one_work) from [<8003cc88>] (worker_thread+0x54/0x500)
[20520.135943] [<8003cc88>] (worker_thread) from [<80042954>] (kthread+0xec/0x104)
[20520.135955] [<80042954>] (kthread) from [<8000fbe8>] (ret_from_fork+0x14/0x2c)

Or looking at the whole log:

pi@exhibit:~ $ dmesg | grep "INFO: task"
[19680.137474] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[19800.137344] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[19920.137058] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[20040.136808] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[20160.136538] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[20280.136283] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[20400.136025] INFO: task omxplayer.bin:18131 blocked for more than 120 seconds.
[20520.135692] INFO: task kworker/1:2:132 blocked for more than 120 seconds.
[20520.136020] INFO: task kworker/3:0:8138 blocked for more than 120 seconds.
[20520.136150] INFO: task kworker/2:0:15672 blocked for more than 120 seconds.
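The same summary can be produced programmatically. This is a minimal Python 3 sketch that counts the kernel's hung-task warnings per task and pid; the function name `hung_tasks` is hypothetical, and the message format is assumed to be the one shown in the lines above.

```python
import re
from collections import Counter

# Matches e.g. "INFO: task kworker/1:2:132 blocked for more than 120 seconds."
# The task name may itself contain colons, so the pid is the last ":<digits>".
HUNG = re.compile(r"INFO: task (.+):(\d+) blocked for more than \d+ seconds")


def hung_tasks(dmesg_text):
    """Return a Counter mapping (task name, pid) to number of warnings."""
    counts = Counter()
    for match in HUNG.finditer(dmesg_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Run over the grep output above, it reports seven warnings for omxplayer.bin pid 18131 and one each for the three kworker tasks, which makes it easy to see that the player blocked first and the kernel workers followed.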
wmodes commented 7 years ago

Okay, the problem is still happening. Even with an every 6 hour reboot. So the problem continues.

Here's a zip of the 32GB sd card image. Anything else you need @popcornmix?

https://drive.google.com/file/d/0B1PMX6FKdplmOXRndm9UeUpheE0/view?usp=sharing

I made it with

sudo dd if=/dev/rdisk2 bs=1m | gzip > 2017-05-19-exhibit_pi.img.gz

You should be able to extract it with

gzip -dc 2017-05-19-exhibit_pi.img.gz | sudo dd bs=1m of=/dev/rdisk2

(or whatever disk you are dd'ing it to)

The login is pi:raspberry. I usually ssh in since the console is playing omxplayer in fullscreen mode.

It should start the python script automatically that threads omxplayer. The code is in ~pi/exhibitvideo and on github at https://github.com/wmodes/exhibitvideo

exidyboy commented 7 years ago

Went to test tonight, but unfortunately all my cards are official Raspberry Pi Foundation 8GB cards and I get a kernel panic trying to boot your 32GB image. Sorry, I don't have the ninja Linux skills to resize your .img to fit any of my cards without actually writing it to a card of the same size. It looks like you're using n00bs though. I would steer clear of it for future projects such as this, since it makes too many weird assumptions, although it is unlikely to have anything to do with your current problems. Happy to try again to duplicate your issue if you can be bothered making an 8GB or smaller image.

wmodes commented 7 years ago

Normally, I go with the Raspbian image, but someone gave me this card and it had n00bs and I went with it. I'll see if it can resize to 8GB. Thanks for trying it.

d3vgru commented 7 years ago

I am seeing the same issue running Raspbian 8.0. My omxplayer version is dfea8c9

jehutting commented 7 years ago

@d3vgru What does your command line look like? Are you using python/bash to call it? Able to share file/code?

jehutting commented 7 years ago

@wmodes First of all, thanks for sharing the image and your project!

Last weekend I bought a 32 GB SD card. Unluckily it has a 31.2 GB capacity whereas your image needs 31.4 GB... I did give dd a try and ignored the "too small" message, but I also got a kernel panic. I decided to recover the image to Raspbian and got a clean 32GB image. With sudo losetup -o 1279262720 /dev/loop1 2017-05-19-exhibit_pi.img I could grab your code and files. Not exactly the same image as yours, but a bootable and executable exhibitvideo test rig (Raspberry 3B).

[Screenshot: screenshot_2017-05-30_07-25-44]

Sad thing is, it is now already up-and-running for 72 hours without an issue :-(

As you can see from the screenshot, the processor is running at an almost constant 100% load. You could replace the pass statement in your while loop (the one that spins until the end time is reached) with a sleep(0.1) to bring the load down.
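A sketch of such a wait in Python 3, under the assumption that the thread's stopped() check is backed by a threading.Event (not confirmed by the code shown in this thread). The helper name `wait_until` and the 0.1 second interval are illustrative.

```python
import threading
import time


def wait_until(end_time, stop_event, poll_interval=0.1):
    """Block until end_time (a time.time() value) or until stop_event is set.

    Returns True if the stop event ended the wait, False if the deadline did.
    """
    while not stop_event.is_set():
        remaining = end_time - time.time()
        if remaining <= 0:
            return False
        # Event.wait() sleeps instead of spinning, and wakes up early as
        # soon as another thread sets the event.
        stop_event.wait(min(poll_interval, remaining))
    return True
```

Compared with a plain sleep(0.1), waiting on the event has the side benefit that a stop request is noticed immediately rather than at the next poll.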

From the code I guess that you only need the end timer for the loop sequence step. If that is true, and the end timer were used only for that purpose, the non-looping files would be played in full (maybe that is what your playfull tag is for?). As of now these files are simply cut off, which leads to the "Error starting omxplayer for xxx" log messages.

Multiple hanging omxplayer.bin processes: that would be possible if you sent a kill only to the omxplayer script and not to omxplayer.bin. Normally I would kill omxplayer.bin, which would terminate the script. You use killpg; I had to look that up, and as far as I understand the command you are good. Maybe in wait_for_end the end time is detected earlier than the check in the start_video function, and therefore there is a race between actually killing omxplayer and cleaning up (garbage collecting?) the content_thread in main? I don't know.

Your /var/log/messages logging shows multiple entries like

2017-05-15 13:18:31,451 - exhibitvideo.py - INFO - debug: main: Encountered exception: can't start new thread

Maybe that is caused by the defunct omxplayer.bin processes, running out of memory?

The log file shows also other strange things

2017-05-15 22:17:07,090 - exhibitvideo.py - INFO -   debug: _start_video: start: 0.0s, end: 74.8s, len: 74.8s
2017-05-15 22:17:07,090 - exhibitvideo.py - INFO -   debug: _start_video: cmd: omxplayer --no-osd --no-keys --refresh --aspect-mode stretch --layer 2 --dbus_name org.mpris.MediaPlayer2.omxplayer2 --pos 0.0 media/secret_history_dorris_turner_what_does_the_river_mean_to_you.mp4
2017-05-15 22:17:07,189 - exhibitvideo.py - INFO -   debug: _start_video: Starting process: 641 (secret_history_dorris_turner_what_does_the_river_mean_to_you.mp4)
2017-05-15 22:17:07,190 - exhibitvideo.py - INFO -   debug: _start_video: Waiting 74.52s and setting kill timer for secret_history_dorris_turner_what_does_the_river_mean_to_you.mp4 (pid 641)
2017-05-18 19:00:47,027 - exhibitvideo.py - INFO - debug: _stop_video: Sending SIGTERM to process 641 (secret_history_dorris_turner_what_does_the_river_mean_to_you.mp4)
2017-05-18 19:00:47,029 - exhibitvideo.py - INFO -     debug: main: Next recipe (2): loop, duration 60.00s, 8 choices
2017-05-18 19:00:47,030 - exhibitvideo.py - INFO -     debug: main: Selected film: {'length': 60, 'file': 'gentle_waves_and_birdsong_loop.mp4', 'tags': ['loop']}
2017-05-18 19:00:47,031 - exhibitvideo.py - INFO - debug: _start_video: Starting gentle_waves_and_birdsong_loop.mp4 in media
2017-05-18 19:00:47,031 - exhibitvideo.py - INFO -   debug: _start_video: Video data: {'length': 60, 'file': 'gentle_waves_and_birdsong_loop.mp4', 'tags': ['loop']}

and

2017-05-15 22:17:07,036 - exhibitvideo.py - INFO -   debug: _start_video: tags: ['interview']
2017-05-15 22:17:07,039 - exhibitvideo.py - INFO -   debug: _start_video: start: 0.0s, end: 40.6s, len: 40.6s
2017-05-15 22:17:07,040 - exhibitvideo.py - INFO -   debug: _start_video: cmd: omxplayer --no-osd --no-keys --refresh --aspect-mode stretch --layer 2 --dbus_name org.mpris.MediaPlayer2.omxplayer2 --pos 0.0 media/secret_history_ken_lubinski_respect_the_river.mp4
2017-05-15 22:17:07,439 - exhibitvideo.py - INFO -   debug: _start_video: Starting process: 643 (secret_history_ken_lubinski_respect_the_river.mp4)
2017-05-15 22:17:07,444 - exhibitvideo.py - INFO -   debug: _start_video: Waiting 40.33s and setting kill timer for secret_history_ken_lubinski_respect_the_river.mp4 (pid 643)
2017-05-18 21:04:57,088 - exhibitvideo.py - INFO -     debug: main: Next recipe (2): loop, duration 60.00s, 8 choices
2017-05-18 21:04:57,089 - exhibitvideo.py - INFO -     debug: main: Selected film: {'length': 60, 'file': 'walking_on_the_frozen_river_loop.mp4', 'tags': ['loop']}
2017-05-18 21:04:57,089 - exhibitvideo.py - INFO - debug: _stop_video: Sending SIGTERM to process 643 (secret_history_ken_lubinski_respect_the_river.mp4)
2017-05-18 21:04:57,090 - exhibitvideo.py - INFO - debug: _start_video: Starting walking_on_the_frozen_river_loop.mp4 in media

Don't know what to think about these lines. Doesn't make any sense to me.

I made some changes to videothread.py.zip and running it on another Raspberry 3B.

I really like the sequence idea of exhibitvideo (and also the idea behind your project peoplesriverhistory.us). What I don't like is the bacon_frying_loop video; that one makes me hungry :-)

d3vgru commented 7 years ago

@jehutting It's a simple bash script:

#!/bin/bash

omxplayer -o local testfile.wav

This is called by a systemd service:

[Unit]
Description=Audio Player

[Service]
Type=simple
User=pi
WorkingDirectory=/home/pi
ExecStart=/home/pi/play.sh
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=play
Restart=always

[Install]
WantedBy=multi-user.target

I originally had play.sh call omxplayer in an infinite loop. I decided to remove the loop, at which point I could see that the service would restart properly at the end of playback for a while (as short as 90 minutes, as long as 6 hours in one case). However, at some point omxplayer would not exit at the end of playback. Thus systemd would not restart the service, since it appeared to still be running.

For comparison, I replaced omxplayer with aplay and it ran for over 12 hours before I stopped the test. My audio file is 4m02s long. I can't upload the file because it is for a copyrighted project.

jehutting commented 7 years ago

@d3vgru Thanks. I will stop exhibitvideo (after 90 hours up and running) and give your method a try. Are you able to run the script with --no-osd (and maybe also --no-keys) to verify that these options still work around this issue?

d3vgru commented 7 years ago

@jehutting I would be glad to. I will let you know sometime tomorrow if --no-osd helps, and will try again with --no-keys if --no-osd doesn't seem to matter.

d3vgru commented 7 years ago

@jehutting Adding --no-osd to the command line seems to have worked

jehutting commented 7 years ago

Thanks @d3vgru for the check.

I ran your method - without the --no-osd - for almost 40 hours before I stopped it. Currently running my infinity looper script; as of now it made 2879 runs. No issue with my 'wmodes' image so far.

wmodes commented 7 years ago

Hey all. Thanks for looking at the problem I was experiencing. I hope the image/project will help others solve this issue. I am just now getting to examining the many posts you've made since I posted the image.

TL;DR: It feels like a memory issue, because of the glitchy way it is failing. Even before it freezes, I get strange colored artifacts across the video. A memory leak either in my code or in omxplayer.

@jehutting, thanks for heroic efforts to make an approximate dupe version. The idea of the exhibit player is that instead of merely playing a sequence, it has a recipe, choosing randomly from certain tags, interspersing different types of video (interviews, transitions, scenic, etc) in a loose looped recipe.

I didn't get around to resizing my image, did you eventually get it to work from my SD image? If you aren't having the problem, does that imply that it is either my version of the OS or my hardware? Personally, I suspect my code.

Looking at the end timer: The loop in videothread._start_video() is a legacy of an earlier version that could handle a playlist and would thus start the next video a moment before the previous one ended to prevent the video start lag (thus the need for threaded version). Now, the main loop of exhibitvideo uses the videothread.wait_for_end() method instead. Perhaps I should remove the loop in _start_video() altogether. But as I understand it, I can't determine whether omxplayer was successful until after I'm sure it ended using process.communicate().

For your version that didn't freeze, were you running a modified one with a small delay added to the tight timing script? If so, how did you modify it that it didn't freeze?

Sounds like you were saying that I'm cutting off the videos? I have the kill to prevent omxplayers from turning into zombies. But perhaps you have a better suggestion for this code? The loopers clearly need to be killed, but will the non-looping omxplayers die gracefully on their own?

            # wait in a tight loop, checking if we've received stop event or time is over
            start_time = time()
            self._end_time = start_time + length
            # when we get close to the end, we release the thread to start new vid
            while (not self.stopped() and
                   (time() <= self._end_time)):
                pass
            # we kill the old vid
            if process.poll() is None:
                self._stop_video(pgid, name)  

I created the omxplayer with subprocess.Popen(..., shell=True, ...) to give me a process group so I could successfully kill the entire process group as needed.

            process = subprocess.Popen(my_cmd, shell=True, preexec_fn=os.setsid, stdin=nullin, 
                                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
            # save this process group id
            pgid = os.getpgid(process.pid)
            self._player_pgid = pgid

Did you have another suggestion for killing omxplayer?

    def _stop_video(self, pgid, name):
        self._debug("Sending SIGTERM to process %i (%s)" % (pgid, name))
        try:
            os.killpg(pgid, signal.SIGTERM)
            self._player_pgid = None
            self._current_video = None
        except OSError as e:
            self._debug("Couldn't terminate %s (pid %i)\n%s" % (name, pgid, str(e)))

Note that videothread.wait_for_end() doesn't kill omxplayer, so they shouldn't have a race. wait_for_end() is just a courteous method to provide a way to run video synchronously.

Lastly, the logs you included look normal to me, but I know what they are saying. Basically, I set up the debug version of the logs to tell what it is doing/thinking at each pass to another part of the program.
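On the question of killing omxplayer: a minimal Python 3 sketch of one common pattern is to send SIGTERM to the process group, give it a short grace period, escalate to SIGKILL if it is still alive, and then reap the child so it cannot linger as a zombie. The name stop_process_group and the grace parameter are hypothetical, and a sleeping Python child stands in for omxplayer. This is not the exhibitvideo code, only an illustration of the idea.

```python
import os
import signal
import subprocess
import sys


def stop_process_group(process, grace=2.0):
    """Send SIGTERM to the child's process group, escalate to SIGKILL
    after `grace` seconds, and reap the child. Returns the exit status."""
    if process.poll() is not None:
        return process.returncode            # already exited and reaped
    pgid = os.getpgid(process.pid)
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        pass                                 # group vanished in the meantime
    try:
        return process.wait(timeout=grace)   # wait() also reaps the child
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)  # SIGTERM was ignored or the child hung
        except ProcessLookupError:
            pass
        return process.wait()


if __name__ == "__main__":
    # Stand-in for omxplayer: a child that would otherwise run for 30 s.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,              # same effect as preexec_fn=os.setsid
    )
    print(stop_process_group(child))
```

One caveat: a process stuck in an uninterruptible state will not respond even to SIGKILL, in which case the final wait() would block, so a real script may want a timeout there as well.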

jehutting commented 7 years ago

@wmodes I gparted my 32GB disk in the same way as your NOOBS image, except that I extended the root partition to the maximum of my 31.2 GB card. Then I dd'ed your image onto the card. As this led to kernel panics and was therefore unusable, I let Raspbian recover the image, so I got a clean image. The next step was to copy your (unmodified) exhibitvideo code and files, and rc.local for the autorun. With that I ran your exhibitvideo without any issue.

There is nothing wrong (as far as I understood your code, and apart from what I already described in my previous comment) with the thread opening and killing omxplayer. With process.poll() you know whether omxplayer has fully played the video and terminated. Only in the case of looping do you need to check the end time and, once it has elapsed, actually kill omxplayer. That's what I tried with the code snippet below:

            while not self.stopped():
                if (looping and 
                    time() > self._end_time):
                    break
                # wait for completion
                if process.poll() is not None:
                    break
                sleep(0.1) # a little nap; prevents processor running at 100%

The videothread.wait_for_end() runs in the main thread, and therefore its end time check can end the waiting before the videothread has detected the end time. It could then be that the content_thread is disposed of (abruptly stopped) before it has a chance to kill omxplayer, leaving you with the defunct omxplayer.bins. I know it is a wild guess. If your _start_video ends with a self.stop() and you let wait_for_end() be

    def wait_for_end(self):
        """Wait for end of video in tight loop"""
        while not self.stopped():
            sleep(0.1)

this should exclude the possibility of the race condition.
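To illustrate the idea of letting the worker thread announce its own end, here is a small Python 3 sketch that uses threading.Event instead of comparing clocks in two threads. The class StoppableWorker and its attributes are hypothetical stand-ins for videothread, and the sketch adds a second "done" event (a variation on the snippet above) so that wait_for_end() only returns after the cleanup has run, even when stop() is called from outside.

```python
import threading
import time


class StoppableWorker(threading.Thread):
    """Hypothetical stand-in for videothread: runs for `length` seconds
    unless stop() is called first, then cleans up and signals completion."""

    def __init__(self, length):
        super().__init__(daemon=True)
        self._length = length
        self._stop_event = threading.Event()   # "please stop" request
        self._done_event = threading.Event()   # "cleanup finished" signal
        self.cleaned_up = False

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

    def run(self):
        # Event.wait() blocks until the flag is set or the timeout expires,
        # so no polling loop is needed while the video plays.
        self._stop_event.wait(timeout=self._length)
        # A real player would be killed or reaped here, before signalling.
        self.cleaned_up = True
        self._done_event.set()                 # last step: release the waiters

    def wait_for_end(self):
        """Block until run() has finished its cleanup."""
        self._done_event.wait()


if __name__ == "__main__":
    worker = StoppableWorker(length=0.5)
    worker.start()
    worker.wait_for_end()
    print(worker.cleaned_up)
```

Because only the worker sets the "done" event, the main thread cannot conclude that the video has ended before the worker has had its chance to clean up.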

The logging

2017-05-15 22:17:07,190 - exhibitvideo.py - INFO -   debug: _start_video: Waiting 74.52s and setting kill timer for secret_history_dorris_turner_what_does_the_river_mean_to_you.mp4 (pid 641)
2017-05-18 19:00:47,027 - exhibitvideo.py - INFO - debug: _stop_video: Sending SIGTERM to process 641 (secret_history_dorris_turner_what_does_the_river_mean_to_you.mp4)

is not normal; look at the date and time. But it could be that the file is corrupt.

Colored artifacts? Wouldn't be surprised if the abrupt killing of omxplayer is causing that. Give my modified videothread.py a try. Let's see how that behaves on your image.

I guess you should start with a new clean image. Another thing, try to set the gpu_mem to 128 MB. It is now 64 MB.

wmodes commented 7 years ago

Thanks for all that hard work. I will try all of that this Sunday.

Any other suggested improvements to the code are much appreciated. I hope other artists and exhibit designers have the opportunity to use the Pi in combination with the exhibit code.

wmodes commented 7 years ago

@jehutting, looking over my code, I implemented some of your suggestions. I also added comments to better clarify my intention and understanding. I'd appreciate corrections to any misconceptions I have. I have only implemented a few threaded programs.

I know this seems a far cry from the omxplayer issue I originally reported on, but if we can determine it is my script and not omxplayer, we can put to bed my report of an omxplayer bug.

First the relevant part of the _start_video method:

            # generate expected _end_time (now + length)
            start_time = time()
            self._end_time = start_time + length
            # wait until the end times (but also allowing that the stop flag is triggered)
            while ((time() <= self._end_time) and
                    not self.stopped()):
                sleep(0.1)
            # emerging from this loop, either
            #   1) player ended gracefully
            #   2) the video is looped, and so needs to be stopped
            #   3) the thread received a stop order, and so video needs to be killed
            #   4) something else, like player hung
            # In any case, we kill the process group of the video, if we can
            if process.poll() is None:
                self._stop_video(pgid, name)    
            # Now that we have ended one way or the other, we should be able to get stdout/stderr
            # Report if we had any problems starting omxplayer
            stdoutdata, stderrdata = process.communicate()
            returncode = process.returncode
            # if the process failed, let's log the output
            if (returncode != 0):
                self._debug("Error starting omxplayer for %s\n%s\n%s" % \
                            (name, str(stdoutdata), str(stderrdata)))

Now, the modified wait_for_end() method:

    def wait_for_end(self):
        """Wait for end of video in tight loop. This provides a synchronous mechanism to wait
        for the end of a video."""
        #
        # The run() method is called asynchronously, so it is possible to call this 
        # method before run() has recorded the expected _end_time. 
        # thus we wait until the end times (but also allowing that the stop flag is triggered)
        while (not self._end_time and
                not self.stopped()):
            sleep(0.1)
        #self._debug("Waiting for end of video")
        # now wait until time expires (or we are interrupted because stop flag is triggered)
        while ((time() <= self._end_time) and
                not self.stopped()):
            sleep(0.1)

Adding the sleep(0.1) brings the load in top from 150% to 10%. I have the program running a series of videos, and will see how it does over the next several hours.
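The effect of the sleep can be checked in isolation. The following Python 3 sketch (the function name cpu_seconds_while_waiting is hypothetical) waits for the same wall-clock duration twice, once with an empty loop body and once with a short nap, and reports how much CPU time the process consumed in each case. The exact figures depend on the machine, so none are quoted here.

```python
import time


def cpu_seconds_while_waiting(duration, nap=None):
    """Wait `duration` seconds of wall-clock time and return the CPU time
    this process consumed meanwhile. nap=None busy-waits like `pass`."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + duration
    while time.monotonic() <= deadline:
        if nap is not None:
            time.sleep(nap)          # yields the CPU to other processes
    return time.process_time() - cpu_start


if __name__ == "__main__":
    print("busy wait :", cpu_seconds_while_waiting(0.5))
    print("sleep(0.1):", cpu_seconds_while_waiting(0.5, nap=0.1))
```

The busy wait keeps one core occupied for the whole duration, which is CPU time the player does not get.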

wmodes commented 7 years ago

It has been running continuously for 48 hours. So far no glitches or hangs or zombie omxplayer processes. The only serious change was to add a sleep(0.1) to my tight loop. Apparently, when the Pi runs at 100% for steady periods, omxplayer is susceptible to glitches and hangs. I will keep burning in the project and see what happens.

popcornmix commented 7 years ago

@wmodes Are you overclocking? Is this Pi1/Pi2/Pi3? How is the power supply?

wmodes commented 7 years ago

Not overclocking Pi3. 2.5A supply.

wmodes commented 7 years ago

Four days: No glitches. No hangs.

exidyboy commented 7 years ago

Great stuff Wes ! Thanks for keeping us in the loop ;-)

slappymcphee commented 7 years ago

Reading through all of this I have to wonder (since I am so new to omxplayer and not too great with Linux just yet) if this ties into an issue that I am seeing with my Pi Zero W when using video previews in EmulationStation. I notice that no matter how I encode my mp4 files (res, fps/bitrate, audio embedded/pass-thru, etc.), when a video plays my CPU maxes out during that time, and then when I exit back to, say, the system carousel screen, the CPU still stays pegged at 100%. The only way that it drops is if either A) the screensaver kicks in (set to dim), B) I enter into an emulator, or C) I reboot the system. Also, if I just restart ES the CPU stays pegged. @wmodes is it possible that your modifications may assist me? Do I need to start a new issue of my own? I am also trying to ascertain how to get that CPU utilization to lower so that temps don't steadily climb. Thanks in advance for your input!

wmodes commented 7 years ago

Unfortunately, my mods were in my own code, not omxplayer. Omxplayer may be vulnerable to high CPU and low mem availability, but it is probably not the cause of it.

slappymcphee commented 7 years ago

Thanks for the input.

mikeism commented 7 years ago

Same issue here.... Extremely simple use of omxplayer used to "make a sound" for 5 seconds.

Here omxplayer is being used as the sound end of a remote "door bell". The omxplayer machine is running jessie lite headless. The system is using Apache2 (this avoids most firewalls and was real fast to get up and running).

In a nutshell: To "ring" the bell, the initiating machine uses wget to poke at the apache2 server on the omxplayer machine. This prompts apache to call a php script. In turn php calls a bash shell script. Here is the meat of the shell script:

#!/bin/bash

nohup omxplayer -o local ring.wav <&- >&- 2>&- & disown

The disowned background process should make everything return and clean up very quickly. Leaving only the omxplayer to finish its sound and fury and die. This all works as planned except for that pesky part of omxplayer dying when done.

After a reboot the setup always works once. Sometimes more than once. But not too long after, it stops working. Using "ps -ef" shows dozens of omxplayers still hanging around, all owned by init (the ownership is as expected), and I suspect hogging a resource. But these processes should have died long ago. Some may have died, most have not. And they won't even die with kill -9, which I find strange.

Granted my setup is a bit of a Rube Goldberg, but each part is very small, self contained, easy to test. So it is very clear that the offending issue is that the omxplayer refuses to die when it should.

This system is very under used. In the steady state the load average is about 0.00, and uses only about 200M of ram, leaving 800M of ram free.

Changing the above line to the following does appear to fix the problem: nohup omxplayer --no-keys --no-osd -o local ring.wav <&- >&- 2>&- & disown

I thank this thread for the workaround, but such a workaround should not be needed.
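For anyone who wants a safety net on top of the flags, here is a hedged Python 3 sketch of a launcher with a deadline. The function play_with_deadline is hypothetical. It relies on subprocess.run(), which kills and reaps the direct child when the timeout expires, and it redirects the standard streams to /dev/null rather than closing them as the shell line does. A short Python child stands in for omxplayer so the sketch runs anywhere.

```python
import subprocess
import sys


def play_with_deadline(cmd, deadline):
    """Run `cmd` with stdin/stdout/stderr detached. Return True if it exited
    on its own within `deadline` seconds, False if it had to be killed."""
    try:
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline,
        )
        return True
    except subprocess.TimeoutExpired:
        return False                 # run() has already killed and reaped the child


if __name__ == "__main__":
    # On the Pi the command would be something like:
    #   ["omxplayer", "--no-keys", "--no-osd", "-o", "local", "ring.wav"]
    # A short Python child stands in for it here.
    stand_in = [sys.executable, "-c", "import time; time.sleep(0.1)"]
    print(play_with_deadline(stand_in, deadline=5))
```

Two caveats: the timeout only kills the direct child, so if omxplayer.bin is started by a wrapper it may need a process-group kill as discussed earlier in the thread, and a process that does not die with kill -9 will not be helped by this either.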