default "timeout" not working

kucingkembar commented 7 months ago

Hi, sorry for my bad English. I have this problem: the program I run is stuck at "Speed: 0.00 MB/s, ETA: 99:59:59" for hours. Then I tried this: "timeout: (number or tuple, Optional) A number, or a tuple, indicating how many seconds to wait for the client to make a connection and/or send a response. Default is 20 seconds." but somehow it does not work in a Python script, only in the API. I don't understand how to use this API. Can you give me a tutorial on how to use the "timeout" feature?

mjishnu commented 7 months ago

hey,

you can pass timeout to downloader object

dl = Downloader(timeout=10)
dl.start(url)

Regarding why the default timeout is not working: it is probably because of a bug. When you pass any keyword argument like header, proxies, or others to the Downloader object, the default values get overwritten. This is a known bug; I forgot to push the patch for it. I will fix it soon, sorry for the inconvenience.
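The kind of bug described above can be sketched roughly like this. It is only an illustration of the pattern, not pypdl's actual code: `DEFAULTS`, `build_kwargs_buggy`, and `build_kwargs_fixed` are hypothetical names, and the only default assumed is the 20-second timeout quoted from the docs.

```python
# Hypothetical names throughout; this is NOT pypdl's real implementation.
DEFAULTS = {"timeout": 20}


def build_kwargs_buggy(**kwargs):
    # Bug pattern: as soon as the user passes anything,
    # the defaults are replaced wholesale and the timeout is lost.
    return kwargs if kwargs else dict(DEFAULTS)


def build_kwargs_fixed(**kwargs):
    # Fix pattern: start from the defaults, then overlay the user's values.
    return {**DEFAULTS, **kwargs}


buggy = build_kwargs_buggy(proxies={"https": "http://127.0.0.1:8080"})
fixed = build_kwargs_fixed(proxies={"https": "http://127.0.0.1:8080"})

print("timeout" in buggy)  # the default timeout has disappeared
print(fixed["timeout"])    # the default timeout survives
```

Passing `timeout` explicitly, as in the snippet above, sidesteps the problem either way, because the user's value is then always present.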

kucingkembar commented 7 months ago

Thanks for the reply. I did what you typed:

dl = Downloader(timeout=10)
dl.start(URL)

After some hours of testing, it still gets stuck, but the ratio is reduced.

mjishnu commented 7 months ago

Is it printing any error to the console, or is it just stuck? If it is just stuck, then most probably your internet connection or the server that is serving you data has an issue. If you don't want to be stuck, try reducing the timeout to 5 or so. You can read more about timeout from here

kucingkembar commented 7 months ago

I understand timeout in requests: it is based on how many seconds are spent. But your "timeout" seems to work differently; it is based on a cycle between printing the progress bar and the speed + ETA.

about "internet connection has issue or the server which is serving you data has some issue". yes, I have a problem with it and hopefully your "software" that is capable of resuming interrupted downloads will fix it

mjishnu commented 7 months ago

timeout works the same way as it does in requests. Under the hood, the timeout parameter is passed into the requests module, and if requests raises a timeout error it is bubbled back up to the main function. The speed and ETA have nothing to do with it; they are just calculated values and don't influence the timeout.

As I suspected, if the server or your internet is not stable, this can cause it to get stuck. The solution is to reduce the value of timeout until you get a satisfactory result.

kucingkembar commented 7 months ago

i set the timeout 30, but the program still stuck "Speed: 0.00 MB/s, ETA: 99:59:59" for hours, can you just add your own timeout?

mjishnu commented 7 months ago

Reduce the timeout, don't increase it. If it's 30, that doesn't mean exactly 30 sec; it can sometimes be more than 30 sec. You can read more about timeout from here

kucingkembar commented 7 months ago

Mate, sorry for the trouble. I think this conversation is not going anywhere, so I will mark it as complete.

mjishnu commented 7 months ago

Sorry if I wasn't of much help. Slow speed is a locality-related issue, so I can't do much. Though if your speed is generally fast and it's slow only on a particular site/URL, try setting a user agent, or if it's IP based, using a proxy or VPN might fix the slow speed.

dl = Downloader(timeout=3, header={"User-Agent":useragent}, proxies=proxies)
dl.start(url)

kucingkembar commented 7 months ago

somehow i mod your code to "fix" it, main.py :

def _display(self, multithread, interval):
        download_mode = "Multi-Threaded" if multithread else "Single-Threaded"
        with output(initial_len=2, interval=interval) as dynamic_print:
            #start here
            trying = 0
            maxtry = 200
            #200 = 30 seconds / 0.15 second
            while True:
                if int(self.speed) == 0:
                    trying = trying + 1
                    if trying > maxtry:
                        import subprocess
                        subprocess.call('taskkill /IM python.exe /F')
                        subprocess.call('taskkill /IM python.exe /F')
                else:
                    trying = 0
            #end here                    
                if self.size != inf:
                    progress_bar = f"[{'█' * self.progress}{'·' * (100 - self.progress)}] {self.progress}%"
                    progress_stats = f"Total: {to_mb(self.size):.2f} MB, Download Mode: {download_mode}, Speed: {self.speed:.2f} MB/s, ETA: {self.eta}"
                    dynamic_print[0] = progress_bar
                    dynamic_print[1] = progress_stats
                else:
                    download_stats = f"Downloaded: {to_mb(self.downloaded):.2f} MB, Download Mode: {download_mode}, Speed: {self.speed:.2f} MB/s"
                    dynamic_print[0] = "Downloading..."
                    dynamic_print[1] = download_stats
                if self._stop.is_set() or self._error.is_set() or self.completed:
                    break
                time.sleep(interval)
        print(f"Time elapsed: {timestring(self.time_spent)}")

Yeah, I know the code isn't right, so I run it from a CMD batch file like this:

CaptureAndDownload.py https://example.com/video/Url1/
CaptureAndDownload.py https://example.com/video/Url2/

Although you cannot run any other Python program during this period (taskkill ends every python.exe process), this works perfectly when I am away.

mjishnu commented 7 months ago

After digging some more, I found what might be the reason your download gets stuck. The default timeout only raises a timeout error when the server doesn't respond within the timeout period. If your connection is slow (as when I tested at 2G speeds), the server does respond, but the bytes are transferred very slowly. For the speed, remaining time, and other values to be updated, at least 1 MB needs to be downloaded. On a slow connection this can take a very long time, and if the file is large it may make no noticeable change to values like speed, ETA, and progress, so they stay constant as if the download were stuck.

It wouldn't make sense for me to add a mechanism to pypdl that stops the download when the values haven't changed for some period, say 20 sec, since there are people who want to download files at low speed and are willing to run their machine for a long time. A better approach is for the user to set these limits.
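To illustrate why a timeout does not fire on a slow but responsive server, here is a minimal sketch using only the standard library's socket module (not requests or pypdl). The timeout applies to each wait for bytes, not to the whole transfer, which matches how the requests documentation describes its timeout. The trickle server, the 0.2 s delay, and the 0.5 s timeout are made-up values for the demo.

```python
import socket
import threading
import time


def trickle_server(server_sock, chunks=6, delay=0.2):
    # Send one byte every `delay` seconds, then close the connection.
    conn, _ = server_sock.accept()
    with conn:
        for _ in range(chunks):
            conn.sendall(b"x")
            time.sleep(delay)


server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
threading.Thread(target=trickle_server, args=(server,), daemon=True).start()

# The 0.5 s timeout is shorter than the total transfer time,
# but longer than the gap between two consecutive bytes.
client = socket.create_connection(server.getsockname(), timeout=0.5)
start = time.monotonic()
received = b""
timed_out = False
try:
    while True:
        chunk = client.recv(1024)
        if not chunk:  # the server closed the connection
            break
        received += chunk
except socket.timeout:
    timed_out = True
finally:
    client.close()
    server.close()
elapsed = time.monotonic() - start

print(f"received={len(received)} bytes, elapsed={elapsed:.1f}s, timed_out={timed_out}")
```

The transfer takes longer than the timeout, yet no timeout is raised, because each individual byte arrives in time. A very slow download behaves the same way: it keeps the timeout from ever triggering while the displayed values barely move.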

Your code looks fine and should achieve the desired effect. If you want to avoid the drawbacks of not being able to run Python during that period and of potential data corruption, you could try this. It is example code; you may need to modify it according to your needs.

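from pypdl import Downloader
import time

# URL is a placeholder for the file you want to download
dl = Downloader(timeout=5)
dl.start(URL, block=False)  # returns immediately; the download keeps running

timer = time.time()
prev_size = 0

while not dl.completed:
    # progress was made: remember the new size and reset the stall timer
    if dl.downloaded > prev_size:
        prev_size = dl.downloaded
        timer = time.time()

    # no progress for more than 20 seconds: stop the download
    elif time.time() - timer > 20:
        dl.stop()
        break

    time.sleep(0.5)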

This stops the download if the downloaded size hasn't changed in the last 20 sec, but the actual time taken to stop may be longer.

As you can see, the stop was executed after 20 sec, evident from "Time elapsed" showing up. The time elapsed shows 23 sec rather than 20; this is normal, as I intentionally added a delay inside pypdl once stop is executed to ensure the shutdown operations are initialized properly. After that you can see it is still waiting: it is waiting for the threads to exit. This can take some time, because a thread needs to get 1 MB of data from the server before it checks whether stop was triggered. Overall it should take less than 60 sec in most cases.

As you can see, mine took around 50 sec in total to exit at 2G speed. If your connection or server is serving data more slowly than this, it can take longer, but generally it shouldn't get stuck and will eventually exit, even if it takes 5 min on a really slow connection.

The advantage of this approach is that you are not killing the threads while they are working, which can lead to data corruption if, for instance, a thread was writing data to disk. It ensures that the download is shut down and resources are freed properly. The drawback is that it can take more time depending on the connection, since it needs to wait for the threads to exit.

This is based on my assumption that it's a connection issue and not a deadlock. If this is not working and it's still getting stuck, please do tell, as it could be due to a deadlock and would require further investigation.

Also, as a side note, the code I shared is valid for pypdl v1.2.1. I am in the process of refactoring the code, so some attribute names may change in the future. If you are using a later version, please check the readme file on GitHub or the docs on PyPI.
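If you want to reuse the stall check for several downloads, it could be wrapped in a small helper along these lines. This is only a sketch: `stop_if_stalled`, `FakeDownloader`, and `run_demo` are hypothetical names, and the fake object merely imitates the `completed`, `downloaded`, and `stop()` members used in the example above so the logic can be tried without a network.

```python
import time


def stop_if_stalled(dl, stall_seconds=20.0, poll=0.5, sleep=time.sleep):
    """Return True if dl completed, False if it was stopped for stalling."""
    prev_size = dl.downloaded
    last_progress = time.monotonic()
    while not dl.completed:
        if dl.downloaded > prev_size:
            # progress was made: reset the stall timer
            prev_size = dl.downloaded
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > stall_seconds:
            dl.stop()
            return False
        sleep(poll)
    return True


class FakeDownloader:
    """Hypothetical stand-in: gains one unit per poll, then stalls."""

    def __init__(self, progress_steps, total):
        self.downloaded = 0
        self.total = total
        self.steps_left = progress_steps
        self.stopped = False

    @property
    def completed(self):
        return self.downloaded >= self.total

    def stop(self):
        self.stopped = True

    def advance(self):
        if self.steps_left > 0:
            self.steps_left -= 1
            self.downloaded += 1


def run_demo(fake, **kwargs):
    # Advance the fake download a little on every poll.
    def sleep_and_advance(seconds):
        time.sleep(seconds)
        fake.advance()

    return stop_if_stalled(fake, sleep=sleep_and_advance, **kwargs)


healthy = FakeDownloader(progress_steps=5, total=5)
stalled = FakeDownloader(progress_steps=2, total=5)
healthy_result = run_demo(healthy, stall_seconds=0.05, poll=0.01)
stalled_result = run_demo(stalled, stall_seconds=0.05, poll=0.01)
print(healthy_result, stalled_result, stalled.stopped)
```

With a real download you would pass the Downloader object after calling start with block=False, and keep the default 20 sec and 0.5 sec values or tune them to your connection.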

kucingkembar commented 7 months ago

Sorry for my newbie programming. As far as I know, code is processed line by line: after line 1 is done, line 2 runs, then line 3, etc.

How can while not dl.completed: start running when dl.start(URL, block=False) is still in progress and not finished yet?

mjishnu commented 7 months ago

Sorry for my newbie programming. As far as I know, code is processed line by line: after line 1 is done, line 2 runs, then line 3, etc.

How can while not dl.completed: start running when dl.start(URL, block=False) is still in progress and not finished yet?

It's alright mate, no one is perfect. Are you asking how the while loop can execute while dl.start is still running? This is because when you set block=False, pypdl doesn't block the execution of the rest of the code. This is achieved using multi-threading: dl.start is executed in a child thread rather than the main thread, so you can do other tasks while it's downloading. But be sure that the main thread doesn't end. If it does, this can lead to interpreter shutdown and the download crashing.
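The idea can be sketched with Python's threading module. This is a simplified illustration and not how pypdl is actually written: `fake_download` and the `progress` dictionary are stand-ins invented for the example.

```python
import threading
import time

# Shared state that the worker updates and the main thread reads.
progress = {"downloaded": 0, "completed": False}


def fake_download(chunks=5, delay=0.05):
    # Stand-in for the real download work running in a child thread.
    for _ in range(chunks):
        time.sleep(delay)
        progress["downloaded"] += 1
    progress["completed"] = True


worker = threading.Thread(target=fake_download)
worker.start()  # returns immediately, similar in spirit to block=False

polls = 0
while not progress["completed"]:
    polls += 1  # the main thread is free to do other work here
    time.sleep(0.01)

worker.join()  # keep the main thread alive until the worker has finished
print(f"downloaded={progress['downloaded']}, main-thread polls={polls}")
```

The while loop runs many times before the worker finishes, which shows that the line after start() does not wait for the download to complete.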

This is assuming you are asking about this:

Are you asking how the while loop can execute while dl.start is still running?

If this is not what you meant, please do clarify.

kucingkembar commented 7 months ago

Thank you, thank you, the code you provided is flawless. And one more thing: sorry if I was rude in my previous reply.

mjishnu commented 7 months ago

It's alright man, glad to hear that it was helpful. Closing this as completed.

mjishnu / pypdl

default "timeout" not working #10