google / open-vcdiff

An encoder/decoder for the VCDIFF (RFC3284) format
Apache License 2.0

"Value too large for defined data type" when dictionary file is > 4GB #16

Open Steelskin opened 9 years ago

Steelskin commented 9 years ago

Original issue 16 created by jeff.gustafson on 2008-10-09T22:13:01.000Z:

What steps will reproduce the problem?

  1. File larger than 4GB

What is the expected output? What do you see instead?

Normal operation.

What version of the product are you using? On what operating system?

0.2

Please provide any additional information below.

Steelskin commented 9 years ago

Old comments:

open-vcdiff currently uses 32-bit integers to represent addresses and offsets. That representation causes the file size limitation you encountered. The solution will be to convert to using 64-bit integers uniformly. I will look into doing so in a future version of open-vcdiff.
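The limit described above can be illustrated with a small sketch. It assumes the offsets are signed 32-bit values (the comment does not say whether they are signed or unsigned), and the helper names `FitsInInt32` and `FitsInInt64` are hypothetical, not part of open-vcdiff:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; not open-vcdiff's API.

// One gibibyte, used to build the example sizes.
constexpr std::uint64_t kOneGiB = 1ULL << 30;

// True if every byte position in a file of `size` bytes can be expressed
// as a signed 32-bit offset (assumption: offsets are signed).
constexpr bool FitsInInt32(std::uint64_t size) {
  return size <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// The same check for a signed 64-bit offset, as the proposed fix would use.
constexpr bool FitsInInt64(std::uint64_t size) {
  return size <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
}
```

Under that assumption, `FitsInInt32(4 * kOneGiB)` is false while `FitsInInt64(4 * kOneGiB)` is true, which is why a uniform move to 64-bit integers would lift this particular limit.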

Reply:

To expand upon my earlier comment, open-vcdiff was not originally designed with very large input sets in mind, but rather as a tool for implementing the SDCH protocol for moderately-sized HTTP responses. In that context, 2GB was seen as an ample limit.

I would like to make open-vcdiff as useful as possible for as many people as possible. Generalizing it to handle very large input and output files (so that it can be applied, for example, to revision control of huge text files) will be a good step towards that goal.

Side note: the use of std::string objects for internal storage restricts data sizes to std::string::max_size(), independently of whether 32-bit or 64-bit integers are used for addresses and offsets.
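As a rough sketch of the side note, the check below compares a requested size against `std::string::max_size()`. The helper name `FitsInStdString` is hypothetical, and the actual value of `max_size()` depends on the platform and standard library:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only; not open-vcdiff's API.
// True if a buffer of `size` bytes does not exceed what a std::string
// reports it can hold on this platform. This says nothing about whether
// that much memory can actually be allocated.
inline bool FitsInStdString(std::uint64_t size) {
  return size <= std::string().max_size();
}
```

On a typical 32-bit build `max_size()` is below 4GB, so this check can fail for a large dictionary even if all addresses and offsets are 64-bit.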

Later comments repeated the request for this feature and asked for its priority to be raised, noting that it would be handy.