kikito / md5.lua

MD5 sum in pure Lua, with no C and no external dependencies
MIT License
326 stars 151 forks source link

Incorrect md5 value for strings greater than 823 bytes? #4

Closed kikito closed 8 years ago

kikito commented 9 years ago

@lxss has reported here that md5 is returning the wrong md5 value for long enough strings.

require "md5"
local md5 = require 'md5'

local str = "";
for i=1,823 do
  str = str .. "1";
end
print(md5.sumhexa(str)); --823 correct
str = str .. "1";
print(md5.sumhexa(str)); --824 error

@lxss: How are you calculating the "correct" value? What values are you getting with md5.lua?

lxss commented 9 years ago

823: lua = 72bc510bad1c67c3185deb3861c9ab1d php = 72bc510bad1c67c3185deb3861c9ab1d 824: lua = 50937e9b88d5c64d3fc519c2717d3cd0 php = a126fd3611ab8d9b7e8a3384e2fa78a0

lxss commented 9 years ago

"md5.lua 1.0.0"

kikito commented 9 years ago

Hi lxss, thanks for reporting. I have tested this with both LuaJIT and Lua. I got different results!

$ luajit test.lua
72bc510bad1c67c3185deb3861c9ab1d
a126fd3611ab8d9b7e8a3384e2fa78a0
$ lua test.lua                                                                                                                              
72bc510bad1c67c3185deb3861c9ab1d
50937e9b88d5c64d3fc519c2717d3cd0

It seems that there is an error somewhere in the "pure lua" implementation of binary functions. I will have to give this a look, but I don't have time now.

In the meantime, if you can, try using LuaJIT instead of pure Lua.

lxss commented 9 years ago

I wrote lua project is based on the third-party platform, so I can only use lua libraries I went to try to search the bit lua

lxss commented 9 years ago

I failed, I use bit32 lua libraries 824 calculation is correct, the test result error 2048, a lot of 0. AM 3:00 I went to bed. Thank you for your lua I get up in the continue to study

kikito commented 9 years ago

I have made some advancements; the bit libraries used in lua 5.2 and luajit throw different results.

At some point in the calculation 817 << 22 needs to be computed. This throws different results in Lua and LuaJIT:

$ lua -v
Lua 5.2.3  Copyright (C) 1994-2013 Lua.org, PUC-Rio
$ lua -e "print(require('bit32').lshift(817, 22))"
3426746368
$ luajit -v
LuaJIT 2.0.3 -- Copyright (C) 2005-2014 Mike Pall. http://luajit.org/
$ luajit -e "print(require('bit').lshift(817, 22))"
-868220928

It seems that internally Lua 5.2 represents numbers as unsigned ints, while LuaJIT uses signed ints. Hence the different results. I have no idea about how to make them return the same. I will investigate.

lxss commented 9 years ago

Ok, I will also go to try to solve. I use you write sha1.lua temporary solution file checksum.

lxss commented 9 years ago

Lua 5.2 default is 64-bit If it is a 32-bit, the result is correct. No search how to convert 32-bit

pablomayobre commented 9 years ago

Is the problem in bit32 or bit?

If it is in bit this fixes it (returns the same value as Lua bit32):

Replace this:

local ok, bit = pcall(require, 'bit')
if not ok then ok, bit = pcall(require, 'bit32') end

With this:

local ok, bit = pcall(require, 'bit')
if ok then
  local oldbit, bit = bit, {}
  local number = function(n)
    return tonumber('0x'..oldbit.tohex(n),16)
  end

  bit.bor = function(...) return number(oldbit.bor(...)) end
  bit.band = function(...) return number(oldbit.band(...)) end
  bit.bnot = function(...) return number(oldbit.bnot(...)) end
  bit.bxor = function(...) return number(oldbit.bxor(...)) end
  bit.rshift = function(...) return number(oldbit.rshift(...)) end
  bit.lshift = function(...) return number(oldbit.lshift(...)) end
else
  ok, bit = pcall(require, 'bit32')
end

If the problem is in the other two (bit32 and pure Lua) Then you'll need this function:

normalize = bit.tobit or function (n)
  if n > 0x7fffffff then
    return - (bit_not(n) + 1)
  else
    return n
end

You should normalize all the bitops outputs. So in order to use this function you should check that the library loaded is not LuaJIT BitOps but bit32 or pure Lua

lxss commented 9 years ago

OK,I go to try

kikito commented 9 years ago

Hey @lxss,

I have included (a version of) the change proposed by @Positive07 and the 1111... test seems to pass just fine now in 5.1, 5.2 and LuaJIT. Apparently it was only failing on Lua 5.2, because of its non-standard representation of bits using unsigned items.

The fix is on master, and I have also released a new version to luarocks.

Let me know if the new version fixes your problem, so I can close this issue.

pablomayobre commented 9 years ago

I'm glad that hack worked haha!

pablomayobre commented 9 years ago

@Kikito You are using normalize for bit32 but also for bit (that doesn't need it) which might come as an slow down which may be negligible but should be taken into account.

Doesnt the pure Lua implementation have the same problem as the bit32?

kikito commented 9 years ago

@Positive07 Nice catch!

The only Lua which needed "fixing" was Lua 5.2; LuaJIT and Lua 5.1 both work with unsigned (if you give a look at the 5.1 code it contains things similar to your fix - it does if n < 0 then <stuff>.

I have included an extra if now. LuaJIT simply assigns the functions, while Lua 5.2 normalizes them.

pablomayobre commented 9 years ago

Oh I see now! Nice!

jayceefun commented 9 years ago

Hello, i got a wrong value too. when i do md5 to a large string (string length: 188731). It's all zero value.

kikito commented 9 years ago

Hi @jayceefun, can you paste the value somewhere (like in a gist)? If I can't reproduce it, I can't fix it.

jayceefun commented 9 years ago

I can paste it here. It's a lua file actually, and i just wanna encrypt the whole file to a md5 value. But i got this value: "00000000000000000000000000000000"

kikito commented 9 years ago

@jayceefun I have removed your file because it was too long (made scrolling through the issue problematic) and it is possible that github issues add/remove spaces and the like (so the md5 will be different). Please upload it to a gist or some other place which does not reformat text and put the link here.

Also, It would help a lot to know which version of Lua you are using and how exactly you are calculating the md5 of the file in question (the fact that you get all 0s suggests that there might be some other problem somewhere).

jayceefun commented 9 years ago

@kikito The lua version is 5.1 and I'll check my code again. Thanks for your help :)

jayceefun commented 9 years ago

@kikito local f = io.input(filePath, "r") local fileData = io.read("*all") local md5Str = md5.sumhexa(fileData)

This is my simple code.. and when the file is not too large it shows correct value of MD5.

jayceefun commented 9 years ago

I've tried, if the string length more than 63543 , return "00000000000000000000000000000000"

kikito commented 9 years ago

@jayceefun This is probably not it, but just in case: try opening the files in binary mode:

local f = io.input(filePath, "rb") -- notice the "rb" instead of just "r"
jayceefun commented 9 years ago

@kikito it seems that lua stack has a limit size, i'm not sure. Maybe I should cut the file in pieces to calculate, i tried "rb", still got all 0.

pablomayobre commented 9 years ago

@kikito I'm thinking the issue is with:

  1. A string being too long
  2. A number becoming too big and being turned into math.huge or Inf
  3. A table getting too many elements and Lua not handling them right

So in any case an overflow

stejacob commented 8 years ago

Hi,

Are you planning to add an update() function in the md5 library instead? That would alleviate the issue with very long strings. We could just do a for loop and pass in the chunks of the string.

Regards.

kikito commented 8 years ago

Hi, can you elaborate? I don't understand what you mean On Nov 19, 2015 2:00 PM, "stejacob" notifications@github.com wrote:

Hi,

Are you planning to add an update() function in the md5 library instead? That would alleviate the issue with very long strings. We could just to a for loop and pass in the chunks of the string.

Regards.

— Reply to this email directly or view it on GitHub https://github.com/kikito/md5.lua/issues/4#issuecomment-158051202.

stejacob commented 8 years ago

Basically, when we need to generate an MD5 from a very long string, we would split the long string into smaller chunks and pass each chunk to the MD5 in an iterative approach. Therefore, at each iteration, we would update the Hash value by looking at the current checksum state.

For example (something like that): //========================= For each block of string Do md5.update(block of string) End local md5Hex = md5.generateHex() //=========================

Source: http://stackoverflow.com/questions/2214259/combining-md5-hash-values

Hope that explains better what I mean. By the way, very good job on the MD5 library. Thank you very much.

tst2005 commented 8 years ago

FYI, I found alternative implementation https://github.com/tst2005/lua-lockbox/blob/master/lockbox/digest/md5.lua

shixiaomao commented 8 years ago

Someone solved this problem? I want know how to solve it.

ghost commented 8 years ago

This patch fixes it for me. Not sure how it will work in a 64 bit machine.

--- md5org.lua  2016-07-01 09:03:17.000000000 +0200
+++ md5.lua 2016-07-01 08:56:52.000000000 +0200
@@ -346,7 +346,7 @@
   c=z(i,c,d,a,b,X[ 2],15,t[63])
   b=z(i,b,c,d,a,X[ 9],21,t[64])

-  return A+a,B+b,C+c,D+d
+  return bit_and(A+a,-1),bit_and(B+b,-1),bit_and(C+c,-1),bit_and(D+d,-1)
 end

 ----------------------------------------------------------------

The problem appears to be that the results of the additions are often out of the 32-bit range. I haven't checked how the out-of-range results propagate to produce erroneous results. It appears that at least two several turns are necessary.

ghost commented 8 years ago

After a bit more research, I've found that the cause is that the values grow and grow until the precision of the mantissa is not enough to hold the complete integer values. Printing a,b,c,d on each iteration sufficed to show this. The code breaks when reaching 253, because past that number, there are integers that are not representable in double precision.

Based on that information, a more robust patch would be:

--- md5org.lua  2016-07-01 09:03:17.000000000 +0200
+++ md5.lua 2016-07-01 17:05:30.000000000 +0200
@@ -346,7 +346,7 @@
   c=z(i,c,d,a,b,X[ 2],15,t[63])
   b=z(i,b,c,d,a,X[ 9],21,t[64])

-  return A+a,B+b,C+c,D+d
+  return (A+a)%0x100000000,(B+b)%0x100000000,(C+c)%0x100000000,(D+d)%0x100000000
 end
kikito commented 8 years ago

@pgimeno thanks for taking the time to investigate this. Would you mind sending a pull request? That way you will appear on the contributors list

Also, if you can, please include a test in the PR which fails with the current implementation but works with yours; probably doing something similar to this line.