Closed gcask closed 6 months ago
please give repository adress/coords so I can test and debug on my side
Sure, I'll DM you the details.
Is there an email or other I can reach you at to share the repo?
I checked again this part of code, and once checksum fail, the downloaded dl_part is deleted before the error dialog is displayed. So the dl_part file you check xxhsum with does not correspond to the file that have checksum failure, it is another file.
As you can see in your screenshot, the error dialog box show the following package identity:
476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.1
while, according repository reference and dl_part file name, it should be this:
476_vFG_-_Air_Weapons_Range_Objects_(14NOV2019)_v2019.11.14
Something goese wrong somewhere in your packages file names... please check again and tell me what you find.
Here's a quick video of the problem (sorry for the quality, hopefully it's still legible enough):
https://github.com/sedenion/OpenModMan/assets/53709079/5001f68a-98f3-46e1-af32-905f6f586d44
the dl_part is still there. If I click ok, it will stay there with the red cross.
Second video is made after the rest of the mods downloaded, I didn't touch anything in between (as shown by the dates), and it does fix itself.
https://github.com/sedenion/OpenModMan/assets/53709079/b0444ed7-878a-4305-8839-5f44242ba9ee
The file should have been deleted before the error dialog appear, this mean the file delete also failed, which finally mean that there is a read/write access problem and may explain why the checksum fail (due to read error).
Is the Library directory in a specific folder with special access right ? Network folder or something like this ?
Regular folder, as you can see.
It's under Saved Games, has no special access to it, and is neither synced nor backed up.
Ok, I guess this is a concurrency problem with another thread that occur in your case because you have very performant CPU (or very slow hard drive) that greatlely outperform file IO bus... it seem that it need to wait some miliseconds until some operations ends. I will add mechanism that wait for proper read/write access.
In that case, download limits were set to max concurrent thread = 1, max rate = 32767KB/s. I get the issue with no limits as well.
My CPU is a AMD Ryzen 7 4800H (8 cores/16 threads @ 2.9GHz base clock), and running off of an NVMe SSD.
I have checked a local build with your fix, and it seems to have done the trick.
Between this and the download limits, I'm finally able to download my entire mod collection without any hiccups ✌️
Looking forward to checking that in an upcoming release, merci beaucoup!
Addendum to my previous message: as soon as I have 2 or more concurrent downloads, the issue comes right back.
After some investigations, it looks like the OmDirNotify
interferes with the finalizeDownload
.
Running debug, I'm seeing both OmNetPack::finalizeDownload()
and OmDirNotify::_available_run_fn()
trying to get RW access on the file at the same time.
I believe there is also a second issue with the __read_buf
being a global static (and therefore shared): when two threads run finalizeDownload
at the same time, they will fight for the buffer and interfere with each other.
Should probably be thread_local
if needed to be kept global (I can confirm doing so does fix the issue).
Concurency with DirNotify thread was what I suspected, but since it close the file handle immediatly it should not block the file for long. Try to change the finalizeDownload wait routine Sleep() value from 50 to 25 or 75 to see if this solve the issue.
The global read buffer is indeed a problem, I will replace it by dynamically allocated buffer.
The interference from the DirNotify is a problem for all the other checks (isZip, getSize, etc) which may spuriously fail to get a file handle if they do so at the same time as the DirNotify.
If the handle that is grabbed at the beginning of finalizeDownload was used throughout, it would indeed no longer be an issue.
True, this imply to write some new functions that take file handle instead of file path...
You can try with the last commit, I think now this should properly work
Well it certainly moved the needle... Errors I get across various runs:
OmModChan::_download_start_fn
: I have two threads hitting it with the same self
ModChan, and things are not happy.OmUiManMainNet::_download_begin_fn
- the NetPack
is a goner.If you need a reliable, proven, lockfree queue, I would suggest looking into cameron314/concurrentqueue.
At this stage you have too many errors, it appear some download threads are launched twices...
well, please retry with last commits. I securred queued download starting and rewritten URL escape function to be thread-safe. It appear that on your system everything happen in the same time... even download ending !
Looking good! No problems across multiple attempts, and now they actually all download 😱
Thank you again for the quick turnaround!
Ok, I think the problem is solved, code is now robust. Thanks for your help.
When I'm downloading several packages in a row, each being several hundreds megabytes, the checksum check will fail for some despite the .dl_part having the expected one:
Check with xxhsum:
mod entry in my repository OMX:
Here the dl_part has the expected checksum, yet OMM thinks it doesn't...