ungoogled-software / ungoogled-chromium

Google Chromium, sans integration with Google
BSD 3-Clause "New" or "Revised" License
20.8k stars 844 forks source link

Improvements in hash checking in downloads.py #3045

Closed lifesminder closed 1 month ago

lifesminder commented 1 month ago

Previous iteration checked whole file and caused memory issues, making programs non-responding. This one improves it, by reading chunks by bytes (default is 8 megabytes).

Tested in Fedora 40.

Ahrotahn commented 1 month ago

This is a good idea considering how large the archives are getting, but there's a couple changes that would need to be made:

lifesminder commented 1 month ago

This is a good idea considering how large the archives are getting, but there's a couple changes that would need to be made:

  • There is trailing whitespace on the three empty lines you've added. That is why one of the tests has failed and that would need to be removed.
  • The buffer size should be increased. The common knowledge of using 4k/8k chunks is somewhat misapplied in situations where the file is always going to be large. Python added the helper function file_digest which isn't available to us with the version we use, but they have chosen a size of 218 (262144) which I think we should use instead and will make that take less time to process: https://github.com/python/cpython/blob/1f4a49ea53516e7ff177beedc09a1e4439b3be1f/Lib/hashlib.py#L195

cool, for these purposes I have added an extra chunk_bytes argument, in order to make it scalable and set required size in specific scenarios.

about whitespace I can't say much, as I don't have an experience on python 2.x versions programming :)

lifesminder commented 1 month ago

@Ahrotahn I can't fix the code_check testing because whatever I do it doesn't change file but send an empty commit instead

what should I do?

Ahrotahn commented 1 month ago

There's a different problem this time: There needs to be two line breaks before unpack_downloads. @rany2's suggestion should fix that.

The problem you had previously wasn't with the empty lines, it was that those lines weren't empty. They contained space characters.

Since your commits are signed by Github it seems you're using their editor, and unfortunately that doesn't have the option to show whitespace characters. If you happen to use a userscript extension there is a script that can be used to add that functionality: https://github.com/Mottie/GitHub-userscripts/wiki/GitHub-code-show-whitespace

rany2 commented 1 month ago

Minor thing: you keep saying the default is 8 megabytes in the code and PR, but it's actually 8192 bytes (8 KiB). Anyway as stated in https://github.com/ungoogled-software/ungoogled-chromium/pull/3045#issuecomment-2385931844, the default is too low and should be increased.

lifesminder commented 1 month ago

Minor thing: you keep saying the default is 8 megabytes in the code and PR, but it's actually 8192 bytes (8 KiB). Anyway as stated in #3045 (comment), the default is too low and should be increased.

oh okay, which size should it be?

lifesminder commented 1 month ago

hi, is code reviewed yet?

rany2 commented 1 month ago

You marked my suggestions as resolved without making any changes

lifesminder commented 1 month ago

You marked my suggestions as resolved without making any changes

so how to put them into commit? It's my first time to make a pull request, sorry for too much questions :3

Ahrotahn commented 1 month ago

There should be a button to add the suggestion as a commit. There has also been a change we recently merged that causes a conflict with these changes, so you would also need to use the resolve conflicts button to handle that.

Alternatively I could take some time later to rebase the changes and add those suggestions for you so long as your branch allows us to push to it.

lifesminder commented 1 month ago

well, made @rany2 's improvements and re-added chunk_bytes along with components. I guess it is the latest change of that function, so decided to sync

PF4Public commented 1 month ago

I think we'll have to squash commits before merging to tidy it up a bit.

Ahrotahn commented 1 month ago

@PF4Public The 'squash and merge' option should handle that for us

PF4Public commented 1 month ago

@PF4Public The 'squash and merge' option should handle that for us

yep, I meant just that

lifesminder commented 1 month ago

Thanks @lifesminder!

no problem :Ճ