iquercorb / OpenModMan

Open Mod Manager - Open source and generic Mod ("Modifications") manager.
GNU General Public License v3.0
88 stars 5 forks source link

Inform user abour checksum mismatch after download from repository #18

Closed retroluxfilm closed 2 years ago

retroluxfilm commented 2 years ago

I had the wrong checksum set for one download and it silently not added it to to the library folder. As if it was not downloaded at all. Checking the debug log has shown it was a valid checksum mismatch.

It would be great to show the enduser that the download was completed (keep download somewhere?) but the checksum did not match the original file of the repository. So its more clear what the issue is and the user can request help from the community that manages the repository.

Working with OMM so far is really good. Still some UX improvements could be done here and there but overall it feels already really solid. Would it be possible to generate the repo xml file out from php with the png data embedded and the checksum that matches the data OMM generates?

iquercorb commented 2 years ago

I added the missing dialogs for package download error. In case of checksum missmatch error the downloaded data is not deleted anymore, but still as temporary file in the library folder.

Creating dynamically PHP generated xml repository is theoretically possible, but the most obvious problem I see here is the checksum algorithm, XXHash, which is, as far as I know, not implemented in standard PHP right now...

Anyway, here is the current repository XML "Specifications", beyond the obvious:

iquercorb commented 2 years ago

I will see for an MD5 implementation in order to provide an optionnal md5 checksum, which would be way more easy to do in PHP...

retroluxfilm commented 2 years ago

The specifications really helps to get a php implementation coded to create the repo xmls. Just checked that xxhash algorithms are supported on the newest php 8.1 https://php.watch/versions/8.1/xxHash. MD5 might be more compatible for older php versions to use. Not sure if you should change your current implementation just for that.

What could work is to fetch the mods from a directory on the webserver and search for zip files that also have a txt and jpg file of the same name like:

modname_v1.0.zip
modname_v1.0.txt
modname_v1.0.jpg

This is all the data the xml generator would need to add a mod to the repository xml. I certainly would make it a bit easier to maintain a repo without the need of having all the mods locally to generate the repo xml locally with OMM.

Thanks sedenion for really bringing OMM forward. Hope that more and more communities & moders will start using OMM to prepare their mods.

iquercorb commented 2 years ago

Well, I finally managed to find an implementation of MD5 to adapt and embed in my program, this was the biggest part and not so hard. So I added support for MD5 algorithm for repository files checksum (this is now the default setting).

To use MD5 instead of the XXHash algorithm you must specify the "md5sum" attribute instead of the "checksum" attribute. The md5sum string must be the hexadecimal representation (in ASCII) of the 128 bits (16 bytes) hash value of the file binary data using the MD5 algorithm.

At this stage, I am not sure what is the standard for byte order. I implemented output and reading of hexadecimal representations the same way the Get-FileHash command (PowerShell) output it. I guess this is the standard way, but I don't know.

retroluxfilm commented 2 years ago

Thats great and makes it easier to implement it in php by using the MD5 route instead of the xxhash which would require PHP 8.1. Will the OMM check first the MD5 is present and if not the checksum? This would keep the old repos in working state.

The PHP implementation I am working on will go through all files in a directory and interating though all zip files. It will check if the zip has a package.omp in the root to be a valid OMM mod package. My first idea would cause unneeded redundant files that would make it even more error prone. And with this approach I can also read out the dependencies that are needed in the repo xml. Through the MD5 hash I can also detect if the file needs to be updated or skipped from updating the repo entry.

Would it be possible to have a per remote entry to allow files in subdirectories? (But I could also use the customURL link you have added already.)

If you like I can submit the php helper as pull request into your repo when its done to help other communties use it for their web repos?

iquercorb commented 2 years ago

For the next release the repository parser will support both old and new method, it will search for "checksum" or "md5sum" attribute and performs check using the proper algorithm. when released I will advise people that old "checksum" attribute is deprecaded and who maintain repositories should recreate them with the new version. Later I will suppress the support for XXHash checksum.

I don't understand your "subdirectories" request/question... But indeed, you can use the Custom URL mechanism.

retroluxfilm commented 2 years ago

Sounds like a good approach to handle it like that.

About the subdirectory question. For example we have a directory structure like this where the mods folder is the global downpath.

mods
mods\tracks
mods\tracks\historic
mods\cars
mods\misc
mods\championships

Now having one repo xml for the community would not make it work with directorys set up like that. Putting all the zips into the mods dir would work but make the structure less organized. So it would be helpfull to override the downpath tag within the remote tag. Example:

    <remote ident="CARS_-_Porsche_911_RSR_1974_v1.12" file="CARS_-_Porsche_911_RSR_1974_v1.12.zip" bytes="91378287" checksum="55fec7715fe477cc">
      <picture></description>
      <downpath>mods/cars</downpath>
    </remote>

It also might be helpfull to be able to add category tags to a mod package to be able to filter them in OMM by specific tags. Sorry for the many feature suggestions. Let me know if you want such suggestions in a seperate issue.

iquercorb commented 2 years ago

I will think about a package category and custom path per repository, both are possible requiere some work and GUI adaptation...

retroluxfilm commented 2 years ago

Released first version of the OMM PHP library in a seperate repo as it got bigger than planned. https://github.com/retroluxfilm/OMM-PHP-Library

There it still work to do as i need to include the recent changes you worked on (categorys, subfolders etc.) to get it compatible for the next release. But so far it works creating a valid repository xml like the OMM does locally.

iquercorb commented 2 years ago

I added category entry for packages and a new enhanced "custom URL" mechanism for repositories download. The "custom download path" is now smarter and build the final URL with completion rules. User can now define either

retroluxfilm commented 2 years ago

Great news! This will make it more flexible and easier to use and still allow urls to work as well. I will implement it for the subdirectory search of the folder repository generation on PHP level. Guess this issue can be closed.

retroluxfilm commented 2 years ago
  • The file checksum must be the hexadecimal representation (in ASCII) of the 64 bits (8 bytes) hash value of the file binary data using the XXhash3 algorithm.

I added the xxh3 checksum to the php library as well due to the speed peformance it can have over md5. On PHP < 8.1 it will fallback to MD5 hash.

But I noticed that the checksum does not match with the one OMM is creating. Did not find a solution yet for them to match. Checking the sourcecode of OMM it looks like both are using the same algoritm (xxh3: 64-bit hash output) but there must me something along the line that is causing the mismatch. (https://php.watch/versions/8.1/xxHash) If you have some a hint what I can try, please let me know as I am going in circles testing out serveral different approaches...

iquercorb commented 2 years ago

Did you check the bytes order ? It is possible that my code output bytes in reverse order compared to what PHP output (I always wonder in which order to ouput bytes)... For example, the PHP output "f406cee14d73b176" may be represented "76b1734de1ce06f4" in OMM.

retroluxfilm commented 2 years ago

Sadly the hashes are completly different of the same file so it seems not to be the byte order. OMM => "a7ad2bb966d7bd0e" PHP => "545572d6aab3fb76"

I also compared the default secret with the one from the OMM hash library and the results in PHP are the same so that is not the reason that they differ.

I hashed the string "Open Mod Manager" for a quick test and that are the results from PHP: PHP default '01eb2b90a9c978b7' PHP OMM secret '01eb2b90a9c978b7' secret reversed '293c6583da5ab5e0'

Guess I need to get OMM configured properly in CLion to test & debug it better.

iquercorb commented 2 years ago

I tried on my side using current implementation and I properly get '01eb2b90a9c978b7' for the 16 bytes length "Open Mod Manager" char string.

I think the problem is how file data and size is handled. Obviously function like file_get_contents() should not be used for this kind of operation. The fopen(), fseek(), ftell() and fread() functions, which are probably wrapers of their C counterpart, should be used. And especially, the fopen() function should be called using "rb" (read binary) mode.

$fp = fopen("my_file", "rb");   // read binary
fseek($fp, 0, SEEK_END);        // go to EOF
$size = ftell($fp);             // get pointer current position
fseek($fp, 0, SEEK_SET);        // go back to start of file
$data = fread($fp, $size);      // read data
fclose($fp);
retroluxfilm commented 2 years ago

Sadly your example did not work out and returned the same hash. I tested a few different approaches to create the hash but the results where all the same in the end. Still dont have a clue why it is not identical to OMM. Created a very small testpackage to quick testing.

TestPackage_v1.zip

OMM reference    '63187c3e64de3484'
PHP default      'f0b37ece98dc597b'
PHP OMM secret   'f0b37ece98dc597b'

I let you know if i have found the issue.

iquercorb commented 2 years ago

I did various test on my side. I checked XXHash PHP implementation source code and refactor my code to fit the PHP way to do, and I always get '63187c3e64de3484' for the given file. I even tryed to find the hash you get by providing some truncated data, either at begining or at end (in the idea that PHP implementation don't read full data, for any reason), but again, not any data portion give the value you got.

All this is very intrigating. At this stage, I realy suspect that either the data you provide to XXHash2 in PHP are corrupted, or the PHP implementation is doing something wrong.

Since it is hard to check binary data integrity (especially in high-level context like PHP) I created a simple text file to be used as control data:

data.txt

The file is exactly 64 bytes length, with only ASCII and its XXHash3 digest is '907a595593412e1b'

retroluxfilm commented 2 years ago

The data.txt gets the same hash on PHP. To narrow it down I tried to test to create the hash with the TestPackage_v1.zip again but this time with the official commandline tool from here https://github.com/Cyan4973/xxHash/releases

xxhsum.exe -H3 TestPackage_v1.zip
XXH3 (TestPackage_v1.zip) = f0b37ece98dc597b

It resulted in the same hash as php is generating. So sadly there might be an issue with the OMM implementation. Possibly the thirdParty library you are using is outdated or the FileRead is truncated somehow. ReadFile can return false when there was an error reading the file.

iquercorb commented 2 years ago

Well I embedded the last XXHash source files and I now get the proper hash, surprisingly that was simple as this. The embedded version until now was the 0.7.4. The two versions probably have different way to treat medium-large data.

retroluxfilm commented 2 years ago

Great that it was easy to sort out finally. Looking forward to the next OMM release as the changelogs looks already very great.