Unsupported UTF-16 surrogate in Lx01

glasen commented 1 year ago

I'm trying to open an Lx01 container that a company delivered to us. The container seems to be fine, because EnCase and X-Ways can open it and extract all files in it, but libewf fails even when just querying information with "ewfinfo". Here is the error output:

ewfinfo 20221124

Unable to open EWF file(s).
libuna_unicode_character_copy_from_utf16_stream: unsupported low surrogate UTF-16 character.
libuna_utf8_string_size_from_utf16_stream: unable to copy Unicode character from UTF-16 stream.
libewf_single_files_read_data: unable to determine UTF-8 string size.
libewf_internal_handle_open_read_segment_file_section_data: unable to parse single files.
libewf_internal_handle_open_read_segment_files: unable to read section data from segment file: 0.
libewf_internal_handle_open_file_io_pool: unable to read segment files.
libewf_handle_open: unable to open handle using a file IO pool.
info_handle_open_input: unable to open file(s).

I'm using the current git-version of libewf.

joachimmetz commented 1 year ago

Can you share a test file or verbose and debug output (see: https://github.com/libyal/libewf/wiki/Troubleshooting#verbose-and-debug-output) ?

libuna_unicode_character_copy_from_utf16_stream: unsupported low surrogate UTF-16 character.

hints at the format not using proper UTF-16
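
For context on what "proper UTF-16" means here: a code unit in the range 0xD800-0xDBFF (high surrogate) must be followed by one in the range 0xDC00-0xDFFF (low surrogate), and the two together encode one code point above U+FFFF. The following is a minimal sketch of that rule for a little-endian stream. The function name `decode_utf16le` is hypothetical and this is not libuna's implementation; it only illustrates the kind of check that produces the "unsupported low surrogate" error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libuna.
 * Decodes one Unicode code point from a UTF-16 little-endian byte stream.
 * Returns the number of bytes consumed (2 or 4), or 0 if the input is
 * truncated or contains an unpaired surrogate.
 */
size_t decode_utf16le(const uint8_t *data, size_t size, uint32_t *code_point)
{
    uint16_t first  = 0;
    uint16_t second = 0;

    if (size < 2) {
        return 0;
    }
    first = (uint16_t) (data[0] | (data[1] << 8));

    /* Code units outside 0xD800-0xDFFF stand for themselves */
    if (first < 0xd800 || first > 0xdfff) {
        *code_point = first;
        return 2;
    }
    /* A low surrogate must never come first */
    if (first >= 0xdc00) {
        return 0;
    }
    /* A high surrogate must be followed by a low surrogate */
    if (size < 4) {
        return 0;
    }
    second = (uint16_t) (data[2] | (data[3] << 8));

    if (second < 0xdc00 || second > 0xdfff) {
        return 0;
    }
    *code_point = 0x10000
                + (((uint32_t) (first - 0xd800) << 10)
                 | (uint32_t) (second - 0xdc00));
    return 4;
}
```

With this sketch, the bytes `3d d8 00 de` decode to U+1F600, while `3d d8 41 00` (a high surrogate followed by the letter "A") are rejected.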

glasen commented 1 year ago

Can you share a test file or verbose and debug output

There is a bit of a problem: the container holds real data (names etc.) from a case, so I cannot share any of these containers.

hints at the format not using proper UTF-16

But why can other programs (including the 7z plugin "Forensics7z") open and extract files from these containers (we have multiple of them) without problems?

Yesterday evening I started to fiddle around with the code of "libuna" and patched out the check that throws the error, clamping the value into the low surrogate range instead:

libuna_unicode_character.c (line 4093):

/* Determine if the UTF-16 character is within the low surrogate range
 */
/*if( ( utf16_surrogate < LIBUNA_UNICODE_SURROGATE_LOW_RANGE_START )
 || ( utf16_surrogate > LIBUNA_UNICODE_SURROGATE_LOW_RANGE_END ) )
{
    libcerror_error_set(
     error,
     LIBCERROR_ERROR_DOMAIN_RUNTIME,
     LIBCERROR_RUNTIME_ERROR_UNSUPPORTED_VALUE,
     "%s: unsupported low surrogate UTF-16 character.",
     function );

    return( -1 );
}*/

/* Clamp the value into the low surrogate range instead of failing */
if( utf16_surrogate < LIBUNA_UNICODE_SURROGATE_LOW_RANGE_START ) {
    utf16_surrogate = LIBUNA_UNICODE_SURROGATE_LOW_RANGE_START;
}

if( utf16_surrogate > LIBUNA_UNICODE_SURROGATE_LOW_RANGE_END ) {
    utf16_surrogate = LIBUNA_UNICODE_SURROGATE_LOW_RANGE_END;
}

With this small crude hack all containers producing this type of error can be processed perfectly.
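
One caveat with clamping: it turns an invalid code unit into a different, valid-looking one, so the decoded string no longer reflects the bytes stored in the container. A more common approach in tolerant decoders is to substitute U+FFFD (the Unicode replacement character) and count how often that happened. The sketch below illustrates that idea. The function name `decode_utf16le_lenient` and its parameters are hypothetical; this is neither libuna's API nor the behavior of the other tools mentioned here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libuna.
 * Decodes a UTF-16 little-endian byte stream into code points and
 * substitutes U+FFFD for every unpaired surrogate instead of failing.
 * Returns the number of code points written to out.
 */
size_t decode_utf16le_lenient(const uint8_t *data, size_t size,
                              uint32_t *out, size_t out_capacity,
                              size_t *number_of_replacements)
{
    size_t offset = 0;
    size_t count  = 0;

    *number_of_replacements = 0;

    while (offset + 1 < size && count < out_capacity) {
        uint16_t unit = (uint16_t) (data[offset] | (data[offset + 1] << 8));

        offset += 2;

        /* Not a surrogate: the code unit is the code point */
        if (unit < 0xd800 || unit > 0xdfff) {
            out[count++] = unit;
            continue;
        }
        /* High surrogate: accept it only if a low surrogate follows */
        if (unit <= 0xdbff && offset + 1 < size) {
            uint16_t next = (uint16_t) (data[offset] | (data[offset + 1] << 8));

            if (next >= 0xdc00 && next <= 0xdfff) {
                out[count++] = 0x10000
                             + (((uint32_t) (unit - 0xd800) << 10)
                              | (uint32_t) (next - 0xdc00));
                offset += 2;
                continue;
            }
        }
        /* Unpaired surrogate: emit U+FFFD and leave the following
         * code unit in place so it is decoded on its own
         */
        out[count++] = 0xfffd;
        *number_of_replacements += 1;
    }
    return count;
}
```

The replacement count gives the caller a way to report that the data was not valid UTF-16 rather than silently accepting it.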

joachimmetz commented 1 year ago

So this is a proprietary format; it would be good to understand why the UTF-16 surrogates are abused and for what purpose.

Because of this i cannot share any of this containers.

Can you provide at least a sanitized example so we can try to reproduce the edge case?

But why can other programs (including the 7z-plugin "Forensics7z) open and extract files from these containers (We have multiple of them) without problems?

They might be too error tolerant and ignore the fact that this is not proper UTF-16 (which disregards some basic forensic principles). For context, have a read of https://osdfir.blogspot.com/2020/09/testing-digital-forensic-data.html

glasen commented 1 year ago

Can you provide at least a sanitized example so we can try to reproduce the edge case

Maybe. Is it possible to change the content of the file (it is only 16 KB in size) with a hex editor without destroying it?

This specific Lx01 container only contains a log file, which is readable as clear text. The other thing that must be changed is a name in the case section.

Here is the sanitized output of ewfinfo for this container:

Acquiry information:
    Case number:        2019266
    Description:        2019266_0001217-3_0000_Group_Share
    Examiner name:      XXXXX
    Evidence number:    2019266_0001217-3
    Notes:          Group_Share; 06.09.2019
    System date:        Mon Sep  7 09:52:00 2020
    Operating system used:  Windows 2012 Server R2
    Software version used:  8.09
    Password:       N/A

EWF information:
    File format:        Logical Evidence File (LEF) EnCase 7
    Sectors per chunk:  64
    Error granularity:  64
    Compression method: deflate
    Compression level:  no compression
    Set identifier:     XXXXXXX-0f1c-c154-b753-3a8e30464c71

Media information:
    Media type:     single files
    Is physical:        no
    Bytes per sector:   4096
    Number of sectors:  2
    Media size:     5.5 KiB (5620 bytes)

I have at least 50 containers with various UTF-16 related issues (e.g. "unsupported UTF-16 character"). All containers were produced with the same version of EnCase.

joachimmetz commented 1 year ago

I have at least 50 containers which have multiple UTF-16 related issues (e.g. "unsupported UTF-16 character"). All containers were produced with the same version of Encase.

Can you be more specific? Which version of EnCase? (I assume 8.09, but it would be good to be clear about this.) This could be a recent format change; as I said, Lx01 is a proprietary format.

Here is the sanitized output of ewfinfo for this container:

I was referring to the UTF-16 strings that contain the surrogate characters, per "can you share a test file or verbose and debug output (see: https://github.com/libyal/libewf/wiki/Troubleshooting#verbose-and-debug-output) ?" Can you share an example of the data that is currently flagged as unsupported UTF-16?
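
If the whole string cannot be shared, the offending code units and their position may already help. As a rough sketch, assuming the string has been copied out of the container into a buffer as UTF-16 little-endian bytes, the following locates the first unpaired surrogate. The function name `find_unpaired_surrogate` is hypothetical and not part of libewf or libuna.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libewf or libuna.
 * Scans a UTF-16 little-endian byte stream for the first surrogate
 * that is not part of a valid high/low pair.
 * Returns 1 and stores the byte offset in offset_out if one is found,
 * 0 otherwise.
 */
int find_unpaired_surrogate(const uint8_t *data, size_t size, size_t *offset_out)
{
    size_t offset = 0;

    while (offset + 1 < size) {
        uint16_t unit = (uint16_t) (data[offset] | (data[offset + 1] << 8));

        /* Not a surrogate: move to the next code unit */
        if (unit < 0xd800 || unit > 0xdfff) {
            offset += 2;
            continue;
        }
        /* High surrogate followed by a low surrogate: skip the pair */
        if (unit <= 0xdbff && offset + 3 < size) {
            uint16_t next = (uint16_t) (data[offset + 2] | (data[offset + 3] << 8));

            if (next >= 0xdc00 && next <= 0xdfff) {
                offset += 4;
                continue;
            }
        }
        /* Lone low surrogate, or high surrogate without a partner */
        *offset_out = offset;
        return 1;
    }
    return 0;
}
```

The reported offset and the few bytes around it could then be shared without exposing the rest of the string.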

joachimmetz commented 1 year ago

The other thing is a name in the case-section which also must be changed.

Can you be more specific? It is unclear to me what you are referring to.

glasen commented 1 year ago

You can close this issue for now because this problem is probably a "WONT FIX"-bug for you:

Last weekend, I played a bit with the code of "libuna". I deactivated all the error checks in the code that were triggered when trying to open the various containers. I also increased "MEMORY_MAXIMUM_ALLOCATION_SIZE" to 1 GB in libewf because some of the Lx01 containers needed more than the predefined 128 MB. All of them can now be processed (mounted and extracted) by libewf.

joachimmetz commented 1 year ago

You can close this issue for now because this problem is probably a "WONT FIX"-bug for you:

The conclusion of this issue from my perspective: the reporter REFUSES to provide the necessary information to fix the issue. Instead they waste both their and my time in looking into work-arounds without understanding the underlying problem.

libyal / libewf

Unsupported UTF-16 surrogate in Lx01 #176