simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

Non-printable character in output #173

Closed jgru closed 3 years ago

jgru commented 3 years ago

Dear Simson,

I have another issue, or maybe just a question: I noticed that fr->write(...) produces non-printable characters in the output files (at least on my system), like so:

tests/Data//ELF_bin_yes􀀜-0      b4510fe34bfc81e41f6a6c60b2f9af3b        <ELF class="ELFCLASS64"

If you view it in hex, you can see that the byte sequence 0xf4 0x80 0x80 0x9c is inserted just after the file name of the finding:

# xxd -s0xa2 -l0x20 /tmp/betest/elf.txt

000000a2: 0a74 6573 7473 2f44 6174 612f 2f45 4c46  .tests/Data//ELF
000000b2: 5f62 696e 5f79 6573 f480 809c 2d30 0962  _bin_yes....-0.b
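
For readers following along, the four bytes in the dump can be decoded to see which character they encode. This is a minimal sketch that assumes the output file is UTF-8; the variable names are illustrative and not part of bulk_extractor.

```python
import unicodedata

# The four bytes that follow "_bin_yes" in the hex dump above.
raw = bytes.fromhex("f480809c")

# Assumption: the feature file is UTF-8 encoded.
ch = raw.decode("utf-8")
code_point = ord(ch)

# Supplementary Private Use Area-B spans U+100000..U+10FFFD.
in_pua_b = 0x100000 <= code_point <= 0x10FFFD

print(f"U+{code_point:06X}")        # U+10001C
print(unicodedata.category(ch))     # Co (private use)
print(in_pua_b)                     # True
```

The sequence is a single valid code point from a private-use range, which is why terminals and editors have no glyph for it.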

Is there any reason for that or should this be fixed?

Best regards
Jan

simsong commented 3 years ago

That character is intentional. It is a Unicode user-defined (private-use) character, used to separate the filename from the forensic offset. It is literally the sbuf_t::map_file_delimiter.

Check here: https://github.com/simsong/be13_api/blob/435240c743948627a6c7b04b2dde9e2d323d548a/sbuf.cpp#L250-251

And here: https://github.com/simsong/be13_api/blob/435240c743948627a6c7b04b2dde9e2d323d548a/sbuf.h#L176-180
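
A script that post-processes feature files could split on this delimiter to recover the filename. The following is a rough sketch, not bulk_extractor code: the constant value U+10001C is inferred from the bytes f4 80 80 9c in the hex dump above rather than copied from sbuf.h, the helper name split_forensic_path is hypothetical, and the tab-separated layout is assumed from the 0x09 byte after "-0" in the dump.

```python
from typing import Optional, Tuple

# Assumption: value inferred from the hex dump (f4 80 80 9c), not read from be13_api.
MAP_FILE_DELIMITER = "\U0010001C"


def split_forensic_path(path: str) -> Tuple[Optional[str], str]:
    """Split a forensic path into (filename, remainder).

    Returns (None, path) when the delimiter is absent.
    """
    filename, sep, remainder = path.partition(MAP_FILE_DELIMITER)
    if not sep:
        return None, path
    return filename, remainder


# Example line modeled on the output quoted in this issue (fields assumed tab-separated).
line = (
    "tests/Data//ELF_bin_yes" + MAP_FILE_DELIMITER + "-0"
    "\tb4510fe34bfc81e41f6a6c60b2f9af3b"
    "\t<ELF class=\"ELFCLASS64\""
)

forensic_path = line.split("\t", 1)[0]
filename, remainder = split_forensic_path(forensic_path)
print(filename)   # tests/Data//ELF_bin_yes
print(remainder)  # -0
```

Using a private-use code point as the separator means it cannot collide with characters that legitimately appear in file names, at the cost of looking like garbage in tools that do not know about it.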

jgru commented 3 years ago

Thanks for the clarification!