Closed yael333 closed 1 year ago
Hi,
Thanks for the compliment! I hope you'll manage to understand how these macros work. I'll add more comments to the code in the next version to make everything as clear as possible :).
I didn't know about polyglot files, that's very interesting, thanks for sharing!
Currently, `file-format` via `FileFormat::from_file` will return only one file format: the first one for polyglot files. On the other hand, with `FileFormat::from_reader` or `FileFormat::from_bytes`, it should be possible to identify all the formats contained in a polyglot file, if we can determine the beginning of each of them.
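To make the "determine the beginning of each of them" idea concrete, here is a minimal sketch that uses only the standard library and scans a buffer for a few well-known magic numbers at every offset. The `Signature` struct, the `SIGNATURES` table, and `find_embedded_formats` are hypothetical names made up for illustration; they are not part of the `file-format` API, and a raw signature hit is only a candidate start, not proof that a valid file begins there.

```rust
/// Hypothetical signature entry (illustrative only, not the crate's type).
struct Signature {
    name: &'static str,
    magic: &'static [u8],
}

/// A tiny, assumed table of magic numbers; a real detector knows far more.
const SIGNATURES: &[Signature] = &[
    Signature { name: "GIF", magic: b"GIF89a" },
    Signature { name: "PDF", magic: b"%PDF-" },
    Signature { name: "ZIP", magic: b"PK\x03\x04" },
];

/// Returns every (offset, format name) pair where a known signature starts.
/// Offsets are reported in ascending order.
fn find_embedded_formats(bytes: &[u8]) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for offset in 0..bytes.len() {
        for sig in SIGNATURES {
            // Compare the signature against the data starting at this offset.
            if bytes[offset..].starts_with(sig.magic) {
                hits.push((offset, sig.name));
            }
        }
    }
    hits
}
```

Each reported offset could then, in principle, be used to slice the buffer and hand the sub-slice to a detector that works on bytes. Short signatures will produce false positives inside compressed or binary data, so the candidates would still need validation.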
Thanks for asking!
In fact, it might be necessary to extend `FileFormat::from_file` so that it does not return a single format for polyglot files (perhaps it could return an error, or the generic `FileFormat::ArbitraryBinaryData` format). Otherwise, the crate could be fooled.
If you think it's possible and useful, we can also imagine a `polyglot` feature that activates a `FileFormat::from_polyglot_file` method, which would return several file formats.
In any case, I don't think it's easy to delimit sub-files.
If you have any ideas, I'd love to hear them!
Thank you so much for the quick and thorough response~ Detecting and working with polyglot files is still quite arcane and esoteric, which is why I started this Rust project~
These files usually have overlapping sections because they're not regular archive files: if you take the same slice of the file and check it for signatures, it will pass for multiple formats. The definition of these files is also vague and usually depends on validation by an external parser or program (for example, most PDF polyglots don't follow the official standard but still open fine in most PDF readers).
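A rough sketch of what "the same slice passes for multiple formats" can look like, using only the standard library. The function names are hypothetical and the checks are deliberately loose, mimicking tolerant parsers rather than any official standard: GIF is checked at offset 0, the PDF header is accepted anywhere in the first 1024 bytes (an assumption about lenient readers), and ZIP is recognized by its end-of-central-directory record near the end of the data.

```rust
/// Returns true if `needle` occurs anywhere in `haystack`.
/// `needle` must be non-empty.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

/// GIF readers expect the magic number at the very start.
fn looks_like_gif(bytes: &[u8]) -> bool {
    bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a")
}

/// Assumption: a lenient PDF reader accepts the header within the first 1024 bytes.
fn looks_like_pdf(bytes: &[u8]) -> bool {
    let head = &bytes[..bytes.len().min(1024)];
    contains_bytes(head, b"%PDF-")
}

/// ZIP readers locate the end-of-central-directory record from the end:
/// 22 fixed bytes plus an optional comment of up to 65535 bytes.
fn looks_like_zip(bytes: &[u8]) -> bool {
    let start = bytes.len().saturating_sub(22 + 65_535);
    contains_bytes(&bytes[start..], b"PK\x05\x06")
}

/// Runs every check on the same buffer and collects all formats that accept it.
fn accepted_formats(bytes: &[u8]) -> Vec<&'static str> {
    let mut formats = Vec::new();
    if looks_like_gif(bytes) {
        formats.push("GIF");
    }
    if looks_like_pdf(bytes) {
        formats.push("PDF");
    }
    if looks_like_zip(bytes) {
        formats.push("ZIP");
    }
    formats
}
```

Because each format anchors its check in a different place (start, near the start, end), one buffer can satisfy all three at once, which is why running every check individually instead of stopping at the first match matters here.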
Whether you'd want to support parsing these files depends on the scope of your program, but if needed I can contribute as well. I'll keep you updated on how integrating this awesome module into my project goes <3
Yes, please keep me posted! Feel free to open a PR, I'll follow your project!
Hi 👋 I love this project, such an amazing selection of file formats~ I'm not sure I fully managed to comprehend the macro sorcery, but is there a way to check whether a file contains multiple types (such as various esoteric polyglot files)?
If not, just checking for every single file format individually would be great. Thank you so much for the help <3