rails / marcel

Find the mime type of files, examining file, filename and declared type
Apache License 2.0

[Help]: Trying to generate tables with the strings as is #73

Closed michelson closed 2 years ago

michelson commented 2 years ago

Hi there, I'm trying to understand how generate_tables.rb works. After some debugging, I still don't understand how to generate the file without the escaped characters in the strings being modified.

For example, I need this record from Tika: <match value="MM\x00\x2a" type="string" offset="0"/> to generate the same string in the @magic table:

like ...b["MM\x00\x2a"]]... but what I get is something like: b["MM\000*"]]

Of course this is not a bug in marcel, but I need to port this to another language, so I need to keep the strings exactly as they are.

It seems there are some gsub calls that modify the strings, but even with those commented out, the strings are not generated the way I need.

Could anyone give me some hints on what to do here?

thanks

rafaelfranca commented 2 years ago

The method that generates that string is https://github.com/rails/marcel/blob/main/script/generate_tables.rb#L12-L16. Ruby expands escape sequences in strings automatically (\x2a, for instance, becomes a plain *), so just removing the gsub will not give you the same string. You probably need to undo the escaping Ruby does.

>> "MM\x00\x2a"
=> "MM\u0000*"
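
One possible way to undo that escaping is sketched below. It is not code from marcel: hex_escape is a hypothetical helper, and it assumes the target format wants ASCII letters and digits written literally and every other byte written as \xNN. Once Ruby has parsed the string, there is no record of which bytes were originally written as escapes, so a rule like this has to be chosen up front; it happens to reproduce the Tika value from the example above.

```ruby
# Hypothetical helper, not part of marcel: re-escape a parsed string.
# Assumption: ASCII letters and digits stay literal, every other byte
# is written as a two-digit lowercase \xNN escape.
def hex_escape(str)
  str.b.each_byte.map { |byte|
    case byte
    when 0x30..0x39, 0x41..0x5a, 0x61..0x7a
      byte.chr                  # 0-9, A-Z, a-z are kept as-is
    else
      format('\x%02x', byte)    # single quotes keep the backslash literal
    end
  }.join
end

# Both spellings denote the same three-byte-plus-one string once parsed.
puts "MM\x00\x2a" == "MM\000*"

# Re-escaped form, suitable for writing into a generated table.
puts hex_escape("MM\x00\x2a")
```

If the target language expects a different notation (octal, \u escapes, or escaping only non-printable bytes), only the two branches of the case statement would need to change.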