Closed by Agagamand 5 years ago
Hi @Agagamand, great question. There is currently no easy way to do this other than subclassing FileStream to report progress. An example would be:
public class XXHashFileStream : FileStream
{
    private IProgress<double> progress;

    public XXHashFileStream(string filename, IProgress<double> progress)
        : base(filename, FileMode.Open, FileAccess.Read)
    {
        this.progress = progress;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int result = base.Read(buffer, offset, count);
        progress?.Report(result); // reports the number of bytes read in this call
        return result;
    }
}

private ulong GetXXHash(string filename, IProgress<double> progress = null)
{
    ulong hash;
    using (var entryStream = new XXHashFileStream(filename, progress))
    {
        hash = XXHash.Hash64(entryStream);
    }
    return hash;
}
Note that this will use XXHash's own buffer size for progress reporting, which is far less than a megabyte, so it may not be ideal for your use case.
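One possible way to coarsen that granularity, as a sketch: insert a BufferedStream between the hasher and the progress-reporting stream, so the underlying stream only sees large reads and the progress callback fires once per buffer fill rather than once per tiny hasher read. The `CountingStream`/`CountInnerReads` names below are hypothetical; the demo uses a MemoryStream stand-in to show the effect without HashDepot.

```csharp
using System;
using System.IO;

// Records how many Read calls reach the underlying stream, mimicking a
// progress-reporting stream like XXHashFileStream.
class CountingStream : MemoryStream
{
    public int ReadCalls { get; private set; }

    public CountingStream(byte[] data) : base(data) { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        ReadCalls++;
        return base.Read(buffer, offset, count);
    }
}

class Demo
{
    // Reads all data through a BufferedStream in tiny chunks (mimicking a
    // hasher's small internal buffer) and returns how many Read calls the
    // wrapped stream actually saw.
    public static int CountInnerReads(int dataSize, int bufferSize, int chunkSize)
    {
        var inner = new CountingStream(new byte[dataSize]);
        using (var buffered = new BufferedStream(inner, bufferSize))
        {
            var chunk = new byte[chunkSize];
            while (buffered.Read(chunk, 0, chunk.Length) > 0) { }
        }
        return inner.ReadCalls;
    }

    static void Main()
    {
        // 4 MB of data, 1 MB buffer, 32-byte consumer reads: the inner
        // stream is read only a handful of times, not 131,072 times.
        Console.WriteLine($"Reads against inner stream: " +
            CountInnerReads(4 * 1024 * 1024, 1024 * 1024, 32));
    }
}
```

So wrapping the custom stream, e.g. `XXHash.Hash64(new BufferedStream(entryStream, 1048576))`, would make progress callbacks fire per megabyte rather than per internal hasher read.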
I haven't considered update-based APIs because I specifically targeted performant in-memory operations, but that might be an important use case. Let me think about it.
And please let me know if the workaround is good enough for you.
I need XXHash for very large files. Your code doesn't work for me: I get an out-of-memory error.
I started using this: https://github.com/differentrain/YYProject.XXHash/blob/master/XXHash/YYProject.XXHash/XXHash.cs
// Requires: using System.IO; using System.Text;
private string GetYYP_XXHash(string filename, IProgress<double> progress = null)
{
    byte[] buffer;
    byte[] oldBuffer;
    int bytesRead;
    int oldBytesRead;
    long size;
    long totalBytesRead = 0;

    using (Stream stream = File.OpenRead(filename))
    using (YYProject.XXHash.XXHash64 hashAlgorithm = YYProject.XXHash.XXHash64.Create())
    {
        size = stream.Length;
        buffer = new byte[1048576]; // 1MB buffer
        bytesRead = stream.Read(buffer, 0, buffer.Length);
        totalBytesRead += bytesRead;
        do
        {
            // Keep the previous chunk around: we only know it was the last
            // one after the next Read returns 0, and the last chunk must go
            // through TransformFinalBlock instead of TransformBlock.
            oldBytesRead = bytesRead;
            oldBuffer = buffer;
            buffer = new byte[1048576];
            bytesRead = stream.Read(buffer, 0, buffer.Length);
            totalBytesRead += bytesRead;
            if (bytesRead == 0)
            {
                hashAlgorithm.TransformFinalBlock(oldBuffer, 0, oldBytesRead);
            }
            else
            {
                hashAlgorithm.TransformBlock(oldBuffer, 0, oldBytesRead, oldBuffer, 0);
            }
            progress?.Report(totalBytesRead); // reports raw bytes read so far
        } while (bytesRead != 0);

        // Format the hash bytes as a lowercase hex string.
        StringBuilder sBuilder = new StringBuilder();
        for (int i = 0; i < hashAlgorithm.Hash.Length; i++)
        {
            sBuilder.Append(hashAlgorithm.Hash[i].ToString("x2"));
        }
        return sBuilder.ToString();
    }
}
Progress isn't reported perfectly, but it works for very large files.
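As a side note, the loop above reports raw byte counts to `IProgress<double>`, not percentages. A caller driving a progress bar could wrap the reporter in a small adapter that divides by the file size. `PercentProgress` is a hypothetical name, a minimal sketch:

```csharp
using System;

// Hypothetical adapter: converts the raw byte counts that a hashing loop
// reports into 0-100 percentages before forwarding them to the real reporter.
class PercentProgress : IProgress<double>
{
    private readonly IProgress<double> inner;
    private readonly long totalBytes;

    public PercentProgress(IProgress<double> inner, long totalBytes)
    {
        this.inner = inner;
        this.totalBytes = totalBytes;
    }

    // Guard against zero-length files to avoid division by zero.
    public static double ToPercent(double bytesRead, long totalBytes) =>
        totalBytes > 0 ? bytesRead * 100.0 / totalBytes : 0;

    public void Report(double bytesRead) =>
        inner?.Report(ToPercent(bytesRead, totalBytes));
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(PercentProgress.ToPercent(524288, 1048576)); // half of 1 MB
    }
}
```

Usage would look like `GetYYP_XXHash(filename, new PercentProgress(myReporter, new FileInfo(filename).Length))`, leaving the hashing loop unchanged.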
But I am not happy with the speed of YYProject.XXHash. I'll probably have to fall back to a native console application.
Did you try my workaround with a custom stream?
I copied and pasted your code and tried to hash a 1 GB file. Result: System.OutOfMemoryException at progress?.Report(result);
Ok let me try to reproduce this locally tonight. I'm opening the issue to keep track of this. Thanks!
I wrote a simple console app using the workaround I mentioned. It works fine without any errors:
using System;
using System.IO;
using HashDepot;

namespace xxhasher
{
    class XXHashFileStream : FileStream
    {
        private IProgress<double> progress;

        public XXHashFileStream(string fileName, IProgress<double> progress)
            : base(fileName, FileMode.Open, FileAccess.Read)
        {
            this.progress = progress;
        }

        public override int Read(byte[] array, int offset, int count)
        {
            int readBytes = base.Read(array, offset, count);
            progress?.Report(this.Position); // report absolute position, not chunk size
            return readBytes;
        }
    }

    class MyProgressReporter : IProgress<double>
    {
        public void Report(double value)
        {
            // Throttle output: only print when the position lands on an
            // exact multiple of 1,000,000 bytes.
            if (value % 1000000 == 0)
            {
                Console.Write($"Progress: {value}\r");
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: xxhasher <filename>");
                Environment.Exit(1);
            }
            string fileName = args[0];
            using (var stream = new XXHashFileStream(fileName, new MyProgressReporter()))
            {
                ulong hash = XXHash.Hash64(stream);
                Console.WriteLine();
                Console.WriteLine($"Hash value: {hash:X}");
            }
        }
    }
}
Here is the output for a 2 GB file created with fsutil file createnew dummy 2000000000:
Progress: 2000000000
Hash value: 350FCE3512A2E8A9
And process memory usage stays steady at 9MB. Nowhere close to an out of memory situation.
This code works. BUT your implementation is slower than YYProject.XXHash: hashing 1 GB takes about a second longer.
Ok, I'm closing this as there is apparently no bug causing the "out of memory" error. I'll also consider an update-based API to handle these scenarios with high performance if there is more demand for it.
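For readers wondering what an update-based (streaming) API looks like in .NET: it usually follows the append/finalize pattern of System.Security.Cryptography.IncrementalHash. A minimal sketch, using SHA-256 as a stand-in since HashDepot doesn't expose such an API in this thread; the `HashWithProgress` name is hypothetical:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Prints progress synchronously; Progress<T> would marshal reports through a
// synchronization context, which a console app doesn't have.
class ConsoleProgress : IProgress<double>
{
    public void Report(double percent) => Console.Write($"\rProgress: {percent:F0}%");
}

class Demo
{
    // Hashes a stream in fixed-size chunks, reporting percentage progress
    // after each chunk. IncrementalHash with SHA-256 stands in for xxHash;
    // an update-based xxHash API would follow the same append/finalize shape.
    public static byte[] HashWithProgress(Stream stream, long length,
                                          IProgress<double> progress,
                                          int chunkSize = 1048576)
    {
        using (var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            var buffer = new byte[chunkSize];
            long totalRead = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                hasher.AppendData(buffer, 0, read);
                totalRead += read;
                progress?.Report(length > 0 ? totalRead * 100.0 / length : 0);
            }
            return hasher.GetHashAndReset();
        }
    }

    static void Main()
    {
        var data = new byte[3 * 1048576 + 123];
        new Random(42).NextBytes(data);

        byte[] chunked;
        using (var ms = new MemoryStream(data))
            chunked = HashWithProgress(ms, data.Length, new ConsoleProgress());
        Console.WriteLine();

        // Sanity check: chunked hashing matches the one-shot result.
        using (var sha = SHA256.Create())
            Console.WriteLine(chunked.SequenceEqual(sha.ComputeHash(data))
                ? "match" : "MISMATCH");
    }
}
```

This keeps memory bounded by the chunk size and gives the caller full control over progress granularity, which is what the thread's large-file scenario needs.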
I need to calculate checksums of large files with a progress bar. Is the following code correct for this purpose?