ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Could there be any problems when executing youtube-dl code in other than main thread? #5694

Open peetonn opened 9 years ago

peetonn commented 9 years ago

I'm using the youtube-dl script in a Python environment where I can't run the download on the main thread, because it would block the UI. So I create a thread whenever I need to fetch a video URL (I only need the YouTube video URL extracted from the YouTube video page URL). What I observe right now: occasionally (about 1 in 200-300 requests) everything just freezes, as can happen when threads block each other. I tried to debug it, and the hangs seem to happen randomly. It may also be caused by the environment in which I'm running this code. So I'm wondering: is it safe to execute the following code on a thread other than the main thread?

import youtube_dl
import time
import threading
import queue
import sys

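serviceUrlQ = queue.Queue()
videoUrlQ = queue.Queue()
# Last debug message seen by MyLogger; initialised so getVideoUrl can read it
# even if the logger was never called.
outputMsg = None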

class MyLogger(object):
    def debug(self, msg):
        global outputMsg
        print("dbg "+msg)
        outputMsg = msg

    def warning(self, msg):
        print("warn "+msg)

    def error(self, msg):
        print(msg)

def getVideoUrl(serviceUrl):  
    print('loading service URL...')
    ydl_opts = {'forceurl':True, 'quiet':True, 'skip_download':True, 'logger': MyLogger(), 'noplaylist':True, 'youtube_include_dash_manifest':False}
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        start = time.time()
        ydl.download([serviceUrl])
        end = time.time()
        print("processing time "+str(end-start)+" sec")
        if outputMsg and outputMsg != "":
            videoUrl = outputMsg
            print("retrieved video URL: "+videoUrl)
            return videoUrl
        else:
            raise Exception("can't retrieve video URL")

def workFunc(inQ, outQ):
    # Non-blocking get: raises queue.Empty if no entry is on the queue
    try:
        print('******************* inQ.get')
        serviceUrl = inQ.get(False)
        videoUrl = getVideoUrl(serviceUrl)
        print('****************** outQ.put')
        outQ.put(videoUrl, False)
    except Exception as e:
        print("workFunc(): got exception: %s" % e)
        print('****************** outQ.put ERROR')
        outQ.put("error", False)

def getVideoUrlAsync(serviceUrl):
    print("adding service URL to queue...")
    try:
        serviceUrlQ.put(serviceUrl, False)
        workerThread = threading.Thread(target=workFunc, args=(serviceUrlQ, videoUrlQ))
        workerThread.start()
    except Exception as e:
        print('getVideoUrlAsync(): got exception: %s' % e)

if __name__ == '__main__':
    url = sys.argv[1]
    getVideoUrlAsync(url)

Brief description: getVideoUrlAsync is called from the main thread and places the YouTube video page URL into serviceUrlQ. A new thread is created; it calls the youtube-dl code to fetch the video URL and places the result into outQ (that is, videoUrlQ).
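The handoff described above can be sketched without youtube-dl at all. In this minimal sketch, `fetch` is a hypothetical stand-in for the extraction call, and `fetch_async` and `fake_fetch` are illustrative names that do not appear in the original code. It assumes one result queue per request, so a result from an earlier request cannot be picked up by mistake, and it reads the result with a timeout so the caller never waits forever.

```python
import queue
import threading


def fetch_async(fetch, service_url):
    """Run fetch(service_url) on a worker thread; return the queue that will hold the outcome."""
    result_q = queue.Queue(maxsize=1)

    def work():
        try:
            result_q.put(("ok", fetch(service_url)))
        except Exception as exc:
            # Report the failure to the caller instead of losing it inside the thread.
            result_q.put(("error", exc))

    # daemon=True: a worker that hangs will not keep the process alive at exit.
    threading.Thread(target=work, daemon=True).start()
    return result_q


def fake_fetch(service_url):
    # Hypothetical stand-in for the real extraction call.
    return "direct:" + service_url


result_q = fetch_async(fake_fetch, "http://example.invalid/watch")
try:
    status, value = result_q.get(timeout=5)
except queue.Empty:
    status, value = "timeout", None
print(status, value)
```

In a UI, the blocking `get(timeout=5)` would more likely be replaced by a periodic `get_nowait()` from the UI loop; the timeout here only keeps the sketch short.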

jaimeMF commented 9 years ago

I'm not too experienced with threading in Python, so I don't know how much I can help.

Could you post the output when it freezes using {'verbose':True, 'skip_download':True, 'logger': MyLogger(), 'noplaylist':True, 'youtube_include_dash_manifest':False}? It would be interesting to know whether it always gets stuck in the same place.

You are using a global variable (outputMsg) to save the output, which isn't a good way to handle it. Instead, you should use ydl.extract_info(url, download=False), which returns a dictionary with a url key (with that, some of the parameters you use are no longer needed).
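A minimal sketch of what that suggestion might look like, replacing the global variable with the return value of `extract_info`. The names `get_video_url`, `ydl_factory` and `StubYDL` are hypothetical and exist only so the sketch can run without network access; the option keys are taken from the original code. It assumes the returned dictionary carries the media address under `url` for a single video, as described above.

```python
def get_video_url(page_url, ydl_factory=None):
    """Return the media URL for page_url without downloading anything."""
    if ydl_factory is None:
        # Imported lazily so the sketch can be exercised with a stub instead.
        import youtube_dl
        ydl_factory = youtube_dl.YoutubeDL

    opts = {"quiet": True, "noplaylist": True, "youtube_include_dash_manifest": False}
    with ydl_factory(opts) as ydl:
        info = ydl.extract_info(page_url, download=False)

    # The result comes back as a value, so no shared global state is needed.
    video_url = info.get("url") if info else None
    if not video_url:
        raise RuntimeError("can't retrieve video URL")
    return video_url


class StubYDL:
    """Hypothetical stand-in that mimics only the calls used above."""

    def __init__(self, opts):
        self.opts = opts

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    def extract_info(self, url, download=True):
        return {"url": "direct:" + url}


print(get_video_url("http://example.invalid/watch", ydl_factory=StubYDL))
```

Calling `get_video_url(page_url)` with no factory would use the real `youtube_dl.YoutubeDL`. Because nothing is shared between calls, each worker thread gets its own result.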

peetonn commented 9 years ago

The weirdest thing is that it gets stuck in different places: sometimes here, or here, or in other places. I tend to blame the environment, but I can't figure out what exactly the problem could be. It could also be that youtube-dl makes non-thread-safe calls into other libraries that somehow interfere with the environment.

Thanks for the hint about the extract_info by the way!

dstftw commented 9 years ago

Run it under a debugger and pause it once it's stuck, then look at the threads' stack frames to see which one is actually stuck and where.
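Where attaching a debugger is awkward, a rough in-process alternative is to print every thread's stack from inside the program. This is a different technique from the debugger approach above, not a replacement for it. The sketch assumes CPython, where `sys._current_frames()` is available; `format_thread_stacks` and `stuck_worker` are illustrative names, and the worker parked on an event only stands in for a genuinely stuck thread.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text snapshot of the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):" % (names.get(ident, "?"), ident))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


# Demonstration: a worker parked on an Event stands in for a stuck thread.
entered = threading.Event()
release = threading.Event()


def stuck_worker():
    entered.set()   # signal that the worker is running
    release.wait()  # park here until released


worker = threading.Thread(target=stuck_worker, name="stuck-worker", daemon=True)
worker.start()
entered.wait()

snapshot = format_thread_stacks()

release.set()
worker.join()
print(snapshot)
```

In a real program the snapshot would be taken from a watchdog thread or a timer once a request has been pending for too long. The standard library's `faulthandler.dump_traceback(all_threads=True)` writes similar information directly to a file such as stderr.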