libyal / libewf-legacy

Legacy version of libewf
GNU Lesser General Public License v3.0

libewf_handle_get_utf8_hash_value returns 1 on Windows but -1 on Linux #13

Closed tedsmith closed 3 years ago

tedsmith commented 3 years ago

Dear @joachimmetz

I am making great progress with the library, and I am sorry to keep coming back with questions.

Last night I was up into the early hours trying to debug a mysterious issue.

My implementation of the library computes the E01 hash, and that is working perfectly. And on Windows, it also successfully looks up the existing embedded MD5 or SHA1 hash. However, on Linux, the exact same routine returns -1.

Forgive me as I know you're a C developer. I use a modern object orientated version of Pascal called FreePascal (please, no jokes about how old Pascal is). But I hope you will still be able to see what is happening.

Tlibewfhandlegetutf8hashvalue     = function(handle : PLIBEWFHDL;identifier:pansichar;identifier_length:TSIZE;utf8_string:pansichar;utf8_string_length:TSIZE; error:pointer) : integer; cdecl;
 TLibEWF = class(TObject)
  private              
   flibewfhandlegetutf8hashvalue              : Tlibewfhandlegetutf8hashvalue;  

 TLibEWF.Create
//etc etc
 @flibewfhandlegetutf8hashvalue     :=GetProcAddress(fLibHandle,'libewf_handle_get_utf8_hash_value'); 

// Calls libewfhandlegetutf8hashvalue
function TLibEWF.libewf_GetHashValue(identifier:ansistring;var value:ansistring) : integer;
var
  err:pointer;
  p:pansichar;
  l:tsize;
begin
  err:=nil;
  Result:=-1;
  if fLibHandle<>0 then
  begin
  getmem(p,255);
  if LIBEWF_VERSION='V2' then

  Result:=flibewfhandlegetutf8hashvalue(fCurEWFHandle,
                                          pansichar(identifier),
                                          length(identifier),
                                          p,
                                          l,
                                          @err);
  if result=1 then value:=strpas(p);
  FreeMemory(p);
  end;
end;        

// Call libewf_GetHashValue
var
  strCurrentMD5HashVal : string = Default(string); // declaring as ansistring specifically makes no difference here. Same debug data. (https://wiki.freepascal.org/Character_and_string_types#String) because program is compiled with {$H} switch

begin
// Other stuff goes on first, and then towards the end I look up the existing hash
CurrMD5HashValResult  := fLibEWFVerificationInstance.libewf_GetHashValue('MD5', strCurrentMD5HashVal);
             if CurrMD5HashValResult = -1 then result.ExistingHash := 'Unable to retrieve value from image'; // Failed to get existing hash
             if CurrMD5HashValResult =  0 then result.ExistingHash := 'Not available';                                 // Hash not available to get
             if CurrMD5HashValResult =  1 then                                                               // Hash found
             begin
               result.ExistingHash := strCurrentMD5HashVal;
// More stuff
end;

So, for an image that I know contains an MD5 hash, on Windows, the above works flawlessly, returning the existing hash. But on Linux, it returns -1. And I cannot for the life of me see why. I compiled the exact same source code on both Windows and Linux as per some of our other talks.

Note, that on Windows, when going through with a debugger, the following values are reported :

fCurEWFHandle : address to handle
pansichar(identifier) : MD5
length(identifier) : returns the length of the MD5 hash value
p : the hash string itself
l : = 1
err : = nil

On Linux, all looks good and largely as expected, except :

p = pchar($00007FFFEDA33C20) #152'7'#163#237#255#127, (p)^ = 152 #152 and l = 3902020624

So therein seems to be the issue. Do you have any ideas, or do you see something obvious?

joachimmetz commented 3 years ago

please, no jokes about how old Pascal is

No worries, I have used Delphi and Pascal in the past as well.

But on Linux, it returns -1. And I cannot for the life of me see why.

What does the error message tell you?

tedsmith commented 3 years ago

Good thinking! :-)

"(err)^ = Attempt to dereference a generic pointer." In this context, I'm not entirely clear on what that means given no such problem occurs on the Windows end. Forgive me as it is looking like this is more of a language\syntax issue across platforms than your library. Nevertheless, your genius input would 100% be welcomed!

joachimmetz commented 3 years ago

This does not sound like a libewf internal error message, so I expect there is some issue in the interfacing between Pascal and the library. Which libewf API function do you call? Are there any parameters of that call that might have a different bit size in Pascal than the library expects?
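
One way to follow up on the bit-size question is to compare the width of each C type in the call with the width of the Pascal type declared for the same parameter. The following is a minimal sketch, not taken from libewf: `widths_match` and `demo_widths` are names made up for this example, and the Pascal widths passed in are placeholders that would in practice come from `SizeOf()` on the Pascal side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of libewf: compares the width of a C type
 * with the width of the Pascal type declared for the same parameter.
 * Returns 1 when they match and 0 when they differ. */
static int widths_match(const char *name, size_t c_width, size_t pascal_width)
{
    if (c_width != pascal_width) {
        printf("%s: C uses %zu bytes, Pascal declares %zu bytes\n",
               name, c_width, pascal_width);
        return 0;
    }
    return 1;
}

/* Example use. The second argument is what the C compiler reports; the
 * third is a placeholder for the value reported by SizeOf() in Pascal. */
static void demo_widths(void)
{
    /* Identical widths: no mismatch reported. */
    assert(widths_match("size_t", sizeof(size_t), sizeof(size_t)) == 1);

    /* An 8-byte size_t against a 4-byte Pascal type would be flagged. */
    assert(widths_match("size_t", 8, 4) == 0);
}
```

The parameters worth checking in this call would be the two size arguments (`size_t` in C) and the `int` return value. Pointer parameters are the same width on both sides as long as the program and the library are built for the same architecture.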

tedsmith commented 3 years ago

libewf_handle_get_utf8_hash_value

Yes I suspect you are right. I will try and do more digging.

tedsmith commented 3 years ago

Using the libewf error reporting code, I managed to get this output :

if result = -1 then
      begin
        SetLength(strError, 512);
        fLibEWFErrorSPrint(err, @strError[1], Length(strError));
        ShowMessage(strError);
      end;                      

libuna_unicode_character_copy_to_utf8: UTF-8 string too small.
libuna_utf8_string_with_index_copy_from_utf8_stream: unable to copy Unicode character to UTF-8.
libfvalue_string_copy_to_utf8_string_with_index: unable to copy UTF-8 stream to UTF-8 string.
libfvalue_value_copy_to_utf8_string_with_index: unable to copy instance to UTF-8 string.
libfvalue_value_copy_to_utf8_string: unable to copy value: 0 to UTF-8 string.
libewf_handle_get_utf8_hash_value: unable to copy hash value to UTF-8 string.

So that gives me a little more to work on.

tedsmith commented 3 years ago

@joachimmetz the reason l is a spurious value on Linux is that I wasn't initialising it, and because the function call fails, it never gets set correctly (on Linux), whereas on Windows it does get set correctly because the function call succeeds.

So that now just leaves the string for the hash (variable p in my case). Do you think endianness would play a part in this Linux\Windows issue? I was up late again last night trying to see where it might apply, but given it is just a string, I'm not sure that matters. We're not converting an integer or anything. If you don't think it applies either, I'll keep searching. But didn't want to chase my tail looking into that if it would have no bearing.

joachimmetz commented 3 years ago

Do you think endianness would play a part in this Linux\Windows issue?

Unlikely.

libuna_unicode_character_copy_to_utf8: UTF-8 string too small.

indicates the target string is too small. You'll need a pre-allocated buffer. What does l contain on both platforms?

I was up late again last night trying to see where it might apply, but given it is just a string, I'm not sure that matters.

With strings built from multi-byte code units, such as UTF-16 and UTF-32, byte order (endianness) matters. With UTF-8, whose code units are single bytes, it does not.
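
The "UTF-8 string too small" message is about the size the caller states for its buffer, not about the buffer's contents. The sketch below illustrates that calling contract in C. It is a stand-in written for this example and is not libewf's implementation: `copy_to_utf8_string` and `demo_buffer_size` are hypothetical names, and the rule that the size must include the terminating NUL is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in, not libewf code: copies a value into a buffer
 * supplied by the caller. The size is passed by value, so the callee can
 * only trust the number it is given. Returns 1 on success, -1 on error. */
static int copy_to_utf8_string(const char *value,
                               uint8_t *utf8_string,
                               size_t utf8_string_size)
{
    size_t needed = strlen(value) + 1; /* include the terminating NUL */

    if (utf8_string == NULL || utf8_string_size < needed) {
        return -1; /* the stated size is too small for the value */
    }
    memcpy(utf8_string, value, needed);
    return 1;
}

static void demo_buffer_size(void)
{
    /* MD5 of empty input: 32 hexadecimal digits. */
    const char *md5 = "d41d8cd98f00b204e9800998ecf8427e";
    uint8_t buffer[255];

    /* The buffer is large enough, but the stated size is not. */
    assert(copy_to_utf8_string(md5, buffer, 1) == -1);
    assert(copy_to_utf8_string(md5, buffer, 32) == -1); /* no room for NUL */

    /* Stating the real allocation size makes the same call succeed. */
    assert(copy_to_utf8_string(md5, buffer, sizeof(buffer)) == 1);
    assert(strcmp((const char *) buffer, md5) == 0);
}
```

In the Pascal code earlier in the thread, 255 bytes are allocated with `getmem`, but the size argument `l` is never assigned, so the value the library receives is whatever happened to be in that variable.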

tedsmith commented 3 years ago

Dear @joachimmetz

I come back in a state of near lunacy and desperation. After the best part of a week wrestling with this, I am near to just giving up. I'm just not smart enough to see what is going on. I've read up on pansichar, pwidestring, punicodestring, ansichar, widestring, unicodestring, and got totally confused because folks say pansichar and pwidechar are pointless because ansistring and widestring are themselves pointers anyway. I've tried every combination of them all, and sometimes the function returns 1, but I can't "get to" the value (the buffer holding it seems to contain garbage data). So I am at the point of just adding a compiler directive that disables this call on Linux, but allows it to go ahead on Windows. But that would suck.

I hoped you may still be sufficiently fluent in Delphi from your previous experience that you may be good enough to glance over the function below and tell me how you would write it better, given your coding prowess, which is obviously far superior to mine. I stress this does work on Windows, but the fact that it does not work on Linux (it compiles but always returns -1) leads me to suspect something might not be quite right in the first place.

So, if you had to write a Delphi function tomorrow (or FreePascal if you're able to use that) that called libewf_handle_get_utf8_hash_value, how might you do it? I have pasted my effort below (which was originally based on another developer's work, not my own). I know it's a cheeky ask, and not considered the done thing. But this one issue is holding up the release of my utility, which I really want to get done and out (and you have a big mention in the manual!).

// The function declaration is as follows : 
Tlibewfhandlegetutf8hashvalue     = function(handle : PLIBEWFHDL;identifier:pansichar;identifier_length:TSIZE;utf8_string:pansichar;utf8_string_length:TSIZE; error:pointer) : integer; cdecl;
...
// The public class linkage
flibewfhandlegetutf8hashvalue              : Tlibewfhandlegetutf8hashvalue;
...
// Then the function call itself
function TLibEWF.libewf_GetHashValue(identifier:ansistring;var value:ansistring) : integer;
var
  err:pointer;
  HashVal:pansichar;
  l:tsize;
  strError: string;
begin
  err:=nil;
  Result:=-1;
  if fLibHandle<>0 then
  begin
  getmem(HashVal,255);
  if LIBEWF_VERSION='V2' then
    Result:=flibewfhandlegetutf8hashvalue(fCurEWFHandle,
                                          pansichar(identifier),
                                          length(identifier),
                                          HashVal,
                                          l,
                                          @err);

  if result=1 then value:=strpas(HashVal);
  FreeMemory(HashVal);

  if result = -1 then
    begin
      SetLength(strError, 512);
      fLibEWFErrorSPrint(err, @strError[1], Length(strError));
      ShowMessage(strError);
    end;
  end;
end;

joachimmetz commented 3 years ago

So on Linux, Unicode (wchar_t) strings are typically UTF-32, and on Windows UTF-16 little-endian. Since UTF-8 is typically a narrow string (char), I opt to treat these strings as byte strings/raw buffers (no encoding) when interfacing with the library, and to convert them explicitly in Pascal from native string to UTF-8 and back.
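
Treating the result as raw bytes is safe for hash values in particular because a hexadecimal digest uses only ASCII characters, which UTF-8 encodes as one byte each. A minimal C sketch of that check follows; `is_ascii_hex_digest` and `demo_digest_bytes` are hypothetical names for this example and are not part of libewf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libewf: returns 1 when every byte is an
 * ASCII hexadecimal digit, so the buffer can be read byte-for-byte as text
 * without any encoding conversion. Returns 0 otherwise. */
static int is_ascii_hex_digest(const uint8_t *bytes, size_t length)
{
    size_t i;

    if (bytes == NULL || length == 0) {
        return 0;
    }
    for (i = 0; i < length; i++) {
        uint8_t c = bytes[i];
        int is_hex = (c >= '0' && c <= '9')
                  || (c >= 'a' && c <= 'f')
                  || (c >= 'A' && c <= 'F');

        if (!is_hex) {
            return 0;
        }
    }
    return 1;
}

static void demo_digest_bytes(void)
{
    /* MD5 of empty input: 32 ASCII bytes, one per character. */
    const uint8_t *md5 = (const uint8_t *) "d41d8cd98f00b204e9800998ecf8427e";
    /* "e with acute accent" in UTF-8 takes two bytes, neither of them ASCII. */
    const uint8_t *accented = (const uint8_t *) "\xC3\xA9";

    assert(is_ascii_hex_digest(md5, 32) == 1);
    assert(is_ascii_hex_digest(accented, 2) == 0);
}
```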

For HashVal, try treating it the same as strError, since technically the hash value string only uses a small set of characters.

HashVal:string
SetLength(HashVal, 512);
l := 512

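
The same pattern, seen from the C side, can be sketched as follows: allocate the buffer, pass its real size, start with a NULL error pointer, and act on the 1 / 0 / -1 return values described earlier in the thread. This is an illustration under stated assumptions, not libewf code. The function-pointer type mirrors the Pascal declaration above with opaque `void *` placeholders for the handle and error types, and `fake_get_utf8_hash_value`, `get_hash_text` and `demo_get_hash` are hypothetical names used so the sketch runs without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the Pascal declaration in this thread. The handle and error
 * types are opaque placeholders here (an assumption of this sketch). */
typedef int (*get_utf8_hash_value_fn)(
    void *handle, const uint8_t *identifier, size_t identifier_length,
    uint8_t *utf8_string, size_t utf8_string_size, void **error);

/* Hypothetical stand-in for the library call. It knows only "MD5":
 * returns 1 if copied, 0 if the value is not available, -1 on error. */
static int fake_get_utf8_hash_value(
    void *handle, const uint8_t *identifier, size_t identifier_length,
    uint8_t *utf8_string, size_t utf8_string_size, void **error)
{
    static const char md5[] = "d41d8cd98f00b204e9800998ecf8427e";

    (void) handle;
    (void) error;
    if (identifier_length != 3 || memcmp(identifier, "MD5", 3) != 0) {
        return 0;
    }
    if (utf8_string_size < sizeof(md5)) {
        return -1; /* stated buffer size too small */
    }
    memcpy(utf8_string, md5, sizeof(md5));
    return 1;
}

/* Wrapper following the fix: the caller owns the buffer and states its
 * real size; the error pointer starts out as NULL. */
static int get_hash_text(get_utf8_hash_value_fn getter, void *handle,
                         const char *identifier, char *out, size_t out_size)
{
    void *error = NULL;

    if (getter == NULL || out == NULL || out_size == 0) {
        return -1;
    }
    out[0] = '\0'; /* defined content even if the call fails */
    return getter(handle, (const uint8_t *) identifier, strlen(identifier),
                  (uint8_t *) out, out_size, &error);
}

static void demo_get_hash(void)
{
    char hash[512];

    assert(get_hash_text(fake_get_utf8_hash_value, NULL, "MD5",
                         hash, sizeof(hash)) == 1);
    assert(strlen(hash) == 32);
    assert(get_hash_text(fake_get_utf8_hash_value, NULL, "SHA1",
                         hash, sizeof(hash)) == 0);
    assert(get_hash_text(fake_get_utf8_hash_value, NULL, "MD5",
                         hash, 8) == -1);
}
```

With the real library, the error object returned on failure would presumably also need to be released after it has been printed; the stand-in above never allocates one, so the sketch omits that step.
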
tedsmith commented 3 years ago

So, I wrestle with this for literally a week of late evenings and half of each weekend day, and you solve it in literally 3 minutes! What a talent you are. OMG...well don't I now feel like an idiot?!!! Yes sir...your solution works perfectly on both platforms!! Simple as that hey. Amazing how some wider understanding around the area helps debug. Thanks so much @joachimmetz ...what a star you are