learningequality / ka-lite

KA Lite: lightweight web server for serving core Khan Academy content (videos and exercises) without needing internet connectivity
https://learningequality.org/ka-lite/

Video downloads break when connection is unstable #5528

Closed jiangjianshan closed 6 years ago

jiangjianshan commented 6 years ago

Hello,

My system's information is as below:

[image: screenshot of system information]

Videos can't be downloaded anymore.

mrpau-eugene commented 6 years ago

Hi @jiangjianshan can you post the logs here? You can do this by clicking the Show KA-Lite Logs from the system tray.

jiangjianshan commented 6 years ago

Hello @mrpau-eugene , I have found the log files inside the folder $HOME\.kalite\logs. In order to upload the log files here, I changed the file extension from .log to .txt.

django-2017-10-20.txt django-2017-10-19.txt django-2017-10-18.txt django-2017-10-18.txt

mrpau-eugene commented 6 years ago

@jiangjianshan Great! Thanks a lot for the logs. However, I wasn't able to reproduce this issue..

May I also know what kind of connection you are using? Also, do you have another PC/laptop on which you can test it again?

benjaoming commented 6 years ago

There are a lot of unhandled connection errors it seems:

Traceback (most recent call last):
  File "D:\Python27\lib\site-packages\kalite\packages\bundled\fle_utils\videos.py", line 48, in download_video
    response = download_file(thumb_url, thumb_filepath, callback_percent_proxy(callback, start_percent=95, end_percent=100))
  File "D:\Python27\lib\site-packages\kalite\packages\bundled\fle_utils\internet\download.py", line 64, in download_file
    headers={"user-agent": user_agent()}
  File "D:\Python27\lib\site-packages\kalite\packages\dist\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "D:\Python27\lib\site-packages\kalite\packages\dist\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "D:\Python27\lib\site-packages\kalite\packages\dist\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\Python27\lib\site-packages\kalite\packages\dist\requests\sessions.py", line 617, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "D:\Python27\lib\site-packages\kalite\packages\dist\requests\sessions.py", line 177, in resolve_redirects
    **adapter_kwargs
  File "D:\Python27\lib\site-packages\kalite\packages\dist\requests\sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "D:\Python27\lib\site-packages\kalite\packages\dist\requests\adapters.py", line 487, in send
    raise ConnectionError(e, request=request)

jiangjianshan commented 6 years ago

@mrpau-eugene , my laptop was using wireless or LAN at home, with a bandwidth of 20 Mbit/s. Also, I have to change my hosts file in order to access Google in China. hosts.zip

@benjaoming , I first met this issue when I installed and upgraded to 0.17.2, and after later updating to 0.17.3 I still have the same issue. So do I have to wait for a bug fix in the 0.17.4 release?

mrpau-eugene commented 6 years ago

@jiangjianshan I don't think it's a bug but rather a problem with China's ISP blocking some websites. Are you able to watch YouTube videos without any problems using the hosts file?

I'm not quite sure if we can do anything about this. Maybe @benjaoming has some ideas?

jiangjianshan commented 6 years ago

@mrpau-eugene , I can't watch YouTube videos even with the changed hosts file. But I remember that with version 0.17.1, even though I couldn't watch YouTube videos, I could still download videos with ka-lite.

mrpau-eugene commented 6 years ago

@jiangjianshan I can't seem to find any significant changes that affect downloading videos in 0.17.1, based on the Release Notes.

But just to make sure, can you try installing 0.17.1 again to see if you still get the same errors?

You can download the 0.17.1 installer here.

benjaoming commented 6 years ago

I looked through 2 of the log files and all errors are related to download_file(thumb_url, ...), so they are about downloading thumbnails. What is weird is:

  1. Thumbnail downloads happen after downloading videos, but in the same function. So if the video download failed, it would not proceed to thumbnails. This indicates that video downloads work, but it's the thumbnail part that's broken.
  2. Both videos and thumbnails are hosted at http://s3.amazonaws.com/KA-youtube-converted/

This supports the description by @jiangjianshan :

Recently my laptop can't download videos via ka-lite. Every time, the progress bar gets stuck at a position very close to the end, and then the progress bar disappears.
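
For illustration, here is a minimal Python sketch of why a thumbnail failure would only show up when the progress bar is almost full. It assumes, based only on the start_percent=95, end_percent=100 arguments visible in the traceback, that the thumbnail is mapped onto the last 5% of the bar. The names percent_proxy, download_video_sketch and the fetch argument are hypothetical stand-ins, not the actual fle_utils code.

```python
def percent_proxy(callback, start_percent=0, end_percent=100):
    """Rescale 0-100 progress of one step into [start_percent, end_percent].

    Hypothetical stand-in for what callback_percent_proxy appears to do.
    """
    span = end_percent - start_percent

    def inner(step_percent):
        callback(start_percent + span * step_percent / 100.0)

    return inner


def download_video_sketch(youtube_id, fetch, callback):
    """Fetch the video first, then the thumbnail, in one function.

    `fetch(filename, progress_callback)` is an assumed downloader that
    reports 0-100 progress and raises IOError on connection problems.
    """
    # Assumption: the video occupies the first 95% of the progress bar.
    fetch(youtube_id + ".mp4", percent_proxy(callback, 0, 95))
    # The thumbnail occupies the last 5%, so an exception raised here
    # is only seen once the bar is very close to the end.
    fetch(youtube_id + ".png", percent_proxy(callback, 95, 100))
```

With a fetch function that succeeds for the .mp4 file and raises for the .png file, the overall progress reaches 95% before the error surfaces, which would match the reported behaviour.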

@mrpau-eugene could you supply us with a URL and a video that @jiangjianshan can try out?

jiangjianshan commented 6 years ago

@mrpau-eugene , maybe you can send the URL of PXlvKtpvUEk.png, which is the last one I successfully downloaded with KA Lite; I can't find its URL by searching on www.baidu.com. Currently I can't use Google because the hosts file no longer works for it.

mrpau-eugene commented 6 years ago

@jiangjianshan Here is the link for it http://s3.amazonaws.com/KA-youtube-converted/PXlvKtpvUEk.mp4/PXlvKtpvUEk.mp4

Can you try it out and see if you can watch the video?

jiangjianshan commented 6 years ago

@mrpau-eugene , after I copied the link into the address bar in Chrome, the video played. But I think the issue I have is that KA Lite can't download the .png file, not the .mp4 file. Could you send me the link to PXlvKtpvUEk.png instead of PXlvKtpvUEk.mp4?

mrpau-eugene commented 6 years ago

@jiangjianshan sorry about that.. here is the link to the thumbnail http://s3.amazonaws.com/KA-youtube-converted/PXlvKtpvUEk.mp4/PXlvKtpvUEk.png

It seems like some of the videos you downloaded, such as 1tSrRYU6LKM and PZM6acXvsoQ, do not have a video thumbnail on s3.

And it seems like it is trying to download the thumbnail from YouTube instead. I think that's the reason why it's displaying errors? Am I right @benjaoming ?

Max retries exceeded with url: /vi/PZM6acXvsoQ/mqdefault.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x000000000B06BDD8>: Failed to establish a new connection: [Errno 10060] ',))

Try to access https://img.youtube.com/vi/PZM6acXvsoQ/mqdefault.jpg and I think you will have problems while accessing it..
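
To make the two thumbnail locations discussed above concrete, here is a small Python sketch that builds both URLs for a given YouTube ID. The URL patterns are taken from the links in this thread; the function name thumbnail_candidates and the idea that S3 is tried first with YouTube as a fallback are assumptions based on the hypothesis above, not confirmed KA Lite behaviour.

```python
S3_BASE = "http://s3.amazonaws.com/KA-youtube-converted"
YOUTUBE_THUMB_BASE = "https://img.youtube.com/vi"


def thumbnail_candidates(youtube_id):
    """Return possible thumbnail URLs for a video, S3 first (assumed order)."""
    return [
        # Pattern of the S3 thumbnail link posted in this thread.
        "{base}/{id}.mp4/{id}.png".format(base=S3_BASE, id=youtube_id),
        # Pattern of the YouTube URL seen in the "Max retries exceeded" error.
        "{base}/{id}/mqdefault.jpg".format(base=YOUTUBE_THUMB_BASE, id=youtube_id),
    ]
```

Opening each candidate in a browser shows which host is reachable from a given network.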

jiangjianshan commented 6 years ago

@mrpau-eugene , I can download PXlvKtpvUEk.png from the link you gave. image

But the one at https://img.youtube.com/vi/PZM6acXvsoQ/mqdefault.jpg, as you said, I can't download.

mrpau-eugene commented 6 years ago

@jiangjianshan Thanks for the info! I would just like to confirm if you are able to download these 2 videos:

[image: screen shot 2017-10-26 at 2 11 52 pm]

They are both available on Amazon S3. If you are able to download them, then I think the case has been resolved (?) and someone just needs to update the server to include the missing videos.

benjaoming commented 6 years ago

Received another report by email..

Am writing a patch now.

jiangjianshan commented 6 years ago

@mrpau-eugene , I have just tried your suggestion, but those two videos failed to download. What I did was first delete these two videos in KA Lite, then log out and log in again. After that I selected them again for download and found they can't be downloaded. But I actually did download them several months ago. image

The log file from $HOME\.kalite\logs is attached here.

django-2017-10-26.txt

Also, the hosts file on my laptop has been changed back to the original one.

hosts.zip

jiangjianshan commented 6 years ago

@benjaoming , thanks for your hard work on the patch.

benjaoming commented 6 years ago

@jiangjianshan thanks for your further investigation -- at the moment, we have a pretty good idea that it's because thumbnail URLs have changed. Would you be able to try out a pre-release and confirm if it is fixed?

jiangjianshan commented 6 years ago

@benjaoming , I'm uninstalling the current version 0.17.3 and going to install version 0.17.1. It may take several hours to install the new version and run the command "kalite manage setup" after each installation. I will report the result to you as soon as I finish.

benjaoming commented 6 years ago

Sorry to hear that it takes so long to install on Windows for you. I'm not sure if this is a general issue.

Anyways, I will get it reproduced and fixed -- then I will let you know when a Windows installer with the final fix is released, so you don't have to spend several hours installing to test it :)

Because it will normally take two or three hours to complete the install and do the command 'kalite manage setup'.

@mrpau-richard this is a separate issue, do you have some guidance that you can refer to? Or maybe open a new installers issue that we can refer to with these sporadic reports?

jiangjianshan commented 6 years ago

@benjaoming , it doesn't matter, I ran the command "kalite manage setup" and then went to sleep. This morning I tried version 0.17.1, but got the same issue. Here is the log file. django-log-2017-10-27.txt

benjaoming commented 6 years ago

Sorry about the delay -- we have a slight complexity: The video download routes through LE servers, so perhaps we can fix this without a new release, but simply by patching the central server so it works for any release.

Take this example:

http://kalite.learningequality.org/download/videos/9GQdh2eGP-Y.mp4/9GQdh2eGP-Y.mp4

Redirects to:

http://s3.amazonaws.com/KA-youtube-converted/9GQdh2eGP-Y.mp4/9GQdh2eGP-Y.mp4

jiangjianshan commented 6 years ago

@benjaoming , attached is the current log file of the ka-lite installed on my laptop. django-2017-10-31.txt So I have to wait until you have applied the patch on the central server, and then I can start ka-lite to download videos?

benjaoming commented 6 years ago

Sorry about the delay in solving this, am now back to the initial plan: We need to create a new release of KA Lite, because we currently don't have a proxy (the assumption in my previous comment doesn't hold).

Example:

~ kalite manage shell
>>> from kalite.topic_tools import content_models
>>> items = content_models.get_content_items()
>>> from fle_utils import videos
>>> for i in items:
...     if not i['youtube_id']:
...         continue
...     videos.get_outside_video_urls(i['youtube_id'])
...     break
...     
... 
(u'http://s3.amazonaws.com/KA-youtube-converted/y2-uaPiyoxc.mp4/y2-uaPiyoxc.mp4', u'http://s3.amazonaws.com/KA-youtube-converted/y2-uaPiyoxc.mp4/y2-uaPiyoxc.png')

jiangjianshan commented 6 years ago

@benjaoming , thanks for your reply, so I have to wait for the new release 0.17.4 and then the issue should be solved.

benjaoming commented 6 years ago

@jiangjianshan yes, seems so, but I will get back directly as soon as we know when it happens!

benjaoming commented 6 years ago

@jiangjianshan interesting - I ran a script that downloads all thumbnails in the English database, and I had the same connection errors happen 197 times.

We should fix this issue by:

  1. Allowing connection retries https://stackoverflow.com/questions/15431044/can-i-set-max-retries-for-requests-request
  2. Failing gracefully and logging on CRITICAL level if thumbnails are missing during the download

I'll try to fix 1) and check if I can download all the thumbnails in one go. If successful, I will release 0.17.4.
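
As a rough illustration of the two points above, here is a minimal Python sketch. It is not the patch that went into KA Lite: the linked Stack Overflow question is about configuring retries inside requests itself, whereas this sketch swaps in a plain retry loop around an injected fetch function so that it stays self-contained. The names fetch_with_retries and download_thumbnail_gracefully, and all parameters, are hypothetical. It catches IOError, which is a base class of the requests ConnectionError seen in the logs.

```python
import logging
import time

logger = logging.getLogger(__name__)


def fetch_with_retries(fetch, url, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on IOError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except IOError as error:
            last_error = error
            logger.warning("Attempt %d/%d failed for %s: %s",
                           attempt, attempts, url, error)
            if attempt < attempts:
                sleep(delay_seconds)  # fixed pause between attempts
    raise last_error


def download_thumbnail_gracefully(fetch, url, **retry_kwargs):
    """Return the thumbnail, or None if it is still unreachable after retries.

    The failure is logged at CRITICAL level instead of aborting the
    surrounding video download.
    """
    try:
        return fetch_with_retries(fetch, url, **retry_kwargs)
    except IOError as error:
        logger.critical("Thumbnail missing, continuing without it: %s (%s)",
                        url, error)
        return None
```

The sleep parameter is injectable only so the loop can be exercised without real waiting; a real caller would pass a function that performs the HTTP request as fetch.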

jiangjianshan commented 6 years ago

@benjaoming , thanks for your hard work.

benjaoming commented 6 years ago

This is still not complete! There are more issues in the video download that weren't resolved in #5536 :/

benjaoming commented 6 years ago

Further robustness added in #5545 -- closing this, but in case there are further problems, we can target those in more specific issue reports.

Thanks for reporting @jiangjianshan - this problem is specifically addressed in 0.17.4, which is released on PyPI, and installers will be ready in a few days.