Open benjaoming opened 6 years ago
A quick verification...
I have 9855 videos on my drive from the previous content pack.
Videos that are marked as available by videoscan
:
SELECT COUNT(*) FROM (SELECT COUNT(youtube_id) from item WHERE available=1 GROUP by youtube_id)
> 8879
Videos that are marked as unavailable by videoscan
:
SELECT COUNT(*) FROM (SELECT COUNT(youtube_id) from item WHERE available=1 GROUP by youtube_id)
> 450
This suggests that neither the function that summarizes missing videos, nor the one that summarizes how many have been downloaded take uniqueness of the youtube_id
into account.
Hi @mrpau-eugene -- would you be able to look at this? @rtibbles will help out with any questions you might have :)
@benjaoming sure! will take a look into this one and would find a fix for it 👍
@benjaoming seems like it doesn't account the uniqueness of each youtube_id
https://github.com/learningequality/ka-lite/blob/master/kalite/updates/static/js/updates/bundle_modules/update_videos.js#L160
Should this logic be changed given that we now have duplicated youtube_ids?
There is also a problem with staging.learningequality.org coz I can't download videos. I'm always getting a 404.
There is also a problem with staging.learningequality.org coz I can't download videos. I'm always getting a 404.
This server is used because of using kalite.project.settings.dev
, you can reconfigure it to use the real server by changing the values locally, put this in ~/.kalite/settings.py
:
from kalite.project.settings.dev import *
SECURESYNC_PROTOCOL = "http"
CENTRAL_SERVER_HOST = "kalite.%s" % CENTRAL_SERVER_DOMAIN
CENTRAL_SERVER_URL = "%s://%s" % (SECURESYNC_PROTOCOL, CENTRAL_SERVER_HOST)
...and then invoke kalite
without setting --settings
.
@benjaoming I just found out that this doesn't seem to be the issue with the new content packs since I also experience a glitch using the old content packs.
What I did was:
But if you uncheck at least one of the videos, it will have a different value
It seems to be a problem on how the remote_size
of the Math topic is computed in the database. I have a guess that it has something to do when building the content packs.
In the screenshot above, the remote size is just around 2.5GB which is equivalent to the size when we just select the Math topic checkbox
@mrpau-eugene yes, the sizes are wrong, too :) And they have been so for a while, I'm afraid that it has gone under the radar -- I can't find an issue for it.
What worries me here is that the file count is wrong, too -- which AFAIK is a new issue. And very problematic, too, because a user can select to delete videos in Topic A and then they're deleted in Topic B as well.
@benjaoming I think I seem to have found a fix regarding the sizes of the video. The only problem so far is the file count.
If we have this kind of scenario:
Math (12 total_files)
Early Math
... 5 Videos (3 unique videos)
Algebra I
... 3 Videos (1 unique video)
Algebra II
... 4 Videos (2 unique videos)
Should the 12 total_files stay as is, or should it be 6 total files because we only have 6 unique videos?
Regarding video deletion, I have a suggestion..
path
instead of its youtube_id
available = False
This might work, however when a user performs another videoscan
I think videos using those youtube_ids will be marked as available = True
again so I dont think this fixes the problem at all (?)
Do you have other suggestions?
Should the 12 total_files stay as is, or should it be 6 total files because we only have 6 unique videos?
It should be 6, otherwise the reported number of videos to download/delete will be wrong. This is because the figure is used to show information about video download counts -- not subjects or topic names.
Regarding video deletion, I have a suggestion..
I think your suggestion is a good one!
This might work, however when a user performs another videoscan I think videos using those youtube_ids will be marked as available = True again so I dont think this fixes the problem at all (?)
I can see that it might cause some unwanted scenario: User wants to delete content A (containing video A) such that it becomes hidden in the topic tree. I don't think users are likely to sort things manually like this because: 1) You cannot delete exercises anyways, so there's lots of "read-only" content in the topic-tree and 2) There are so many videos everywhere, users are unlikely to use the Delete function for a very careful manual video curation.
I think we can also see it like this:
User navigates to Video Management in order to Delete videos. The most likely reason for this, is that a specific video is unwanted, either because of disk space or because students aren't supposed to view it. In both cases, the video should be entirely removed, so all occurrences of youtube_id
should have available=False
and the file removed.
This would make things easier, right? :)
It should be 6, otherwise the reported number of videos to download/delete will be completely wrong
I'm still wondering how can this be achieved because we are only using the data from the API which has the total_files for each topic tree. I cant think of any way to fix this as of now.
I think we can also see it like this:
User navigates to Video Management in order to Delete videos. The most likely reason for this, is that a specific video is unwanted, either because of disk space or because students aren't supposed to view it. In both cases, the video should be entirely removed, so all occurrences of youtube_id should have available=False and the file removed.
This would make things easier, right? :)
I think that makes things easier. So are we gonna leave the deletion the way it works right now?
So are we gonna leave the deletion the way it works right now?
I think right now it only marks the contents marked for deletion as available=False
so it needs an additional lookup based on the youtube_id
to mark other occurrences unavailable..?
@benjaoming when I tested awhile ago, after deleting a topic and refreshed the page, other occurences of that youtube_id were also deleted. I think I saw it also in the codebas where deleting a video is based on its youtube_id
@benjaoming
I don't think using a raw SQL fixes the wrong file count..
SELECT COUNT(*) FROM (SELECT COUNT(youtube_id) from item WHERE available=1 GROUP by youtube_id)
Let's say we have this kind of scenario:
We won't have any problems if a user just ticks the Math Total: 50GB (assumption)
This seems to be easy since we just have to do a query like the one you posted.
But when a user only excludes one topic like this:
I don't know what would be the total, but won't we also have a weird file count? I also think this might have a complicated query too.
Though we can, let's say, query the total_files first and then just subtract the unchecked afterwards. But this has some drawbacks since we won't know whether those unchecked topics contains those duplicate youtube videos.
Summary
It seems like the duplication of video nodes is somehow causing a glitch. I can hardly believe that 11,406 videos have been removed and only 394 added in the video set.
System information
Traceback or relevant snippet from server.log
n/a