sedenion / OpenModMan

Open Mod Manager - Open source and generic Mod ("Modifications") manager.
GNU General Public License v3.0
84 stars 5 forks source link

Downloading several mods at a time will trigger a 'checksum failed' despite the downloaded package being correct. #63

Closed gcask closed 6 months ago

gcask commented 6 months ago

When I'm downloading several packages in a row, each being several hundreds megabytes, the checksum check will fail for some despite the .dl_part having the expected one: image

Check with xxhsum:

> .\xxhsum.exe -H2 '.\476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.11.14.ozp.dl_part'
\7adf60c5558ebfcc4f4d09931c11b959  .\\476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.11.14.ozp.dl_part

mod entry in my repository OMX:

<mod ident="476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.11.14" file="476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.11.14.ozp" bytes="33787268" xxhsum="4f4d09931c11b959" category="Generic">
      <description bytes="74">data:application/octet-stream;base64,eJzLKCkpKLbS1y8vL9czMTcrS8tMzyhJLUovyi8t0EvOz9VPyS/Py8lPTCnWK8gosE/Jt4UJqMEYmSm2JhZGDAARMxs2</description>
      <url>...snip...</url>
    </mod>

Here the dl_part has the expected checksum, yet OMM thinks it doesn't...

sedenion commented 6 months ago

please give repository adress/coords so I can test and debug on my side

gcask commented 6 months ago

Sure, I'll DM you the details.

gcask commented 6 months ago

Is there an email or other I can reach you at to share the repo?

sedenion commented 6 months ago

I checked again this part of code, and once checksum fail, the downloaded dl_part is deleted before the error dialog is displayed. So the dl_part file you check xxhsum with does not correspond to the file that have checksum failure, it is another file.

As you can see in your screenshot, the error dialog box show the following package identity:

476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.1

while, according repository reference and dl_part file name, it should be this:

476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.11.14

Something goese wrong somewhere in your packages file names... please check again and tell me what you find.

gcask commented 6 months ago

Here's a quick video of the problem (sorry for the quality, hopefully it's still legible enough):

https://github.com/sedenion/OpenModMan/assets/53709079/5001f68a-98f3-46e1-af32-905f6f586d44

the dl_part is still there. If I click ok, it will stay there with the red cross.

Second video is made after the rest of the mods downloaded, I didn't touch anything in between (as shown by the dates), and it does fix itself.

https://github.com/sedenion/OpenModMan/assets/53709079/b0444ed7-878a-4305-8839-5f44242ba9ee

sedenion commented 6 months ago

The file should have been deleted before the error dialog appear, this mean the file delete also failed, which finally mean that there is a read/write access problem and may explain why the checksum fail (due to read error).

Is the Library directory in a specific folder with special access right ? Network folder or something like this ?

gcask commented 6 months ago

Regular folder, as you can see.

It's under Saved Games, has no special access to it, and is neither synced nor backed up.

sedenion commented 6 months ago

Ok, I guess this is a concurrency problem with another thread that occur in your case because you have very performant CPU (or very slow hard drive) that greatlely outperform file IO bus... it seem that it need to wait some miliseconds until some operations ends. I will add mechanism that wait for proper read/write access.

gcask commented 6 months ago

In that case, download limits were set to max concurrent thread = 1, max rate = 32767KB/s. I get the issue with no limits as well.

My CPU is a AMD Ryzen 7 4800H (8 cores/16 threads @ 2.9GHz base clock), and running off of an NVMe SSD.

gcask commented 6 months ago

I have checked a local build with your fix, and it seems to have done the trick.

Between this and the download limits, I'm finally able to download my entire mod collection without any hiccups ✌️

Looking forward to checking that in an upcoming release, merci beaucoup!

gcask commented 6 months ago

Addendum to my previous message: as soon as I have 2 or more concurrent downloads, the issue comes right back.

gcask commented 6 months ago

After some investigations, it looks like the OmDirNotify interferes with the finalizeDownload. Running debug, I'm seeing both OmNetPack::finalizeDownload() and OmDirNotify::_available_run_fn() trying to get RW access on the file at the same time.

I believe there is also a second issue with the __read_buf being a global static (and therefore shared): when two threads run finalizeDownload at the same time, they will fight for the buffer and interfere with each other.

Should probably be thread_local if needed to be kept global (I can confirm doing so does fix the issue).

sedenion commented 6 months ago

Concurency with DirNotify thread was what I suspected, but since it close the file handle immediatly it should not block the file for long. Try to change the finalizeDownload wait routine Sleep() value from 50 to 25 or 75 to see if this solve the issue.

The global read buffer is indeed a problem, I will replace it by dynamically allocated buffer.

gcask commented 6 months ago

The interference from the DirNotify is a problem for all the other checks (isZip, getSize, etc) which may spuriously fail to get a file handle if they do so at the same time as the DirNotify.

If the handle that is grabbed at the beginning of finalizeDownload was used throughout, it would indeed no longer be an issue.

sedenion commented 6 months ago

True, this imply to write some new functions that take file handle instead of file path...

sedenion commented 6 months ago

You can try with the last commit, I think now this should properly work

gcask commented 6 months ago

Well it certainly moved the needle... Errors I get across various runs:

If you need a reliable, proven, lockfree queue, I would suggest looking into cameron314/concurrentqueue.

sedenion commented 6 months ago

At this stage you have too many errors, it appear some download threads are launched twices...

sedenion commented 6 months ago

well, please retry with last commits. I securred queued download starting and rewritten URL escape function to be thread-safe. It appear that on your system everything happen in the same time... even download ending !

gcask commented 6 months ago

Looking good! No problems across multiple attempts, and now they actually all download 😱

Thank you again for the quick turnaround!

sedenion commented 6 months ago

Ok, I think the problem is solved, code is now robust. Thanks for your help.