openzim / youtube

Create a ZIM file from a Youtube channel/username/playlist
GNU General Public License v3.0
52 stars 29 forks source link

Md5 sha256 #56

Closed kelson42 closed 4 years ago

kelson42 commented 4 years ago

Unfortunately I was not really able to test this as there is no documentation (anymore) to launch youtube2zim locally. Do you have maybe a "standard" CONTRIBUTING.md with all the explanation (venv, etc...) to put here (something similar to https://github.com/openzim/ted/blob/master/README.md#building-the-project)?

rgaudin commented 4 years ago

The documentation you are looking for is in the README Screen Shot 2019-11-27 at 07 24 36

Regarding the PR, it works but there is a reason we are using md5 and not sha256. This hash is here to represent the list of playlists ids (which can be hundreds of chars long) in both the ZIM Name and filename.

md5 hashes are 32 chars long while sha256 ones are 64 chars long. Here's the difference:

md5: youtube-cb79f5688be3d953bd8494f3b42adb16_fr_all_2019-11.zim
sha: youtube-69f6581ae72f42e5eaecfa72c2bdb020d127ee483c950efe65e30c2f8b32f36b_fr_all_2019-11.zim

So the question is: do you prefer to have longer, less practical ids just to please an automated tool that assumes md5 is always used to hash passwords? 😉

kelson42 commented 4 years ago

@rgaudin I don't get it, the zim file name should come from the channel/user/playlist name, not be a hash... and I never have seen this kind of "hashy" filename!

rgaudin commented 4 years ago

@rgaudin I don't get it, the zim file name should come from the channel/user/playlist name, not be a hash... and I never have seen this kind of "hashy" filename!

That's the case for channel/user/playlist… but not for playlists (several).

I'm not sure we have this kind of input for our youtube ZIMs at the moment (audiobooks would be but have been split into different ZIMs I believe) and of course, this is for generated names. On the Zimfarm, we supply custom names

kelson42 commented 4 years ago

@rgaudin Does my last commit makes sense?

rgaudin commented 4 years ago

@rgaudin Does my last commit makes sense?

The point of this hash is to ensure we generate unique Name meta for the ZIM. Using only the first 7 chars of a 64 chars hash seems collision prone, no?

kelson42 commented 4 years ago

The goal is to get the filename has shorten. Not sure I put the truncation at the right place. For the rest the collision problem seems more theoritical than pratical.

rgaudin commented 4 years ago

What is it exactly that we are trying to solve here?

I'd prefer we make the --name parameter mandatory instead. Then length and uniqueness' responsibility is on user's side. Auto-generated name was a fallback and that's why it's not pretty.

If we consider the fallback is not wanted, then let's require a --name. what do you think?

stale[bot] commented 4 years ago

This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.