Some PDF are not recognized

neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).

MIT License


Some PDF are not recognized #66

Closed MaximeLaffaire closed 2 months ago

MaximeLaffaire commented 4 months ago

Hi 🙂

There is a problem in the PDF format recognition: the code looks for the 4-byte header %PDF at the very beginning of the file, but in some cases a PDF file can have data before this header, as described in this documentation (page 644, under "File Header").

Also, the header can be in the format %!PS-Adobe-N.n PDF-M.m.
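
To make the two cases concrete, here is a minimal sketch of a more tolerant check. It is written in Python purely for illustration and is not the library's code. The name `looks_like_pdf` and the 1024-byte search window are assumptions of this sketch; the thread does not say how far into the file the header may appear.

```python
import re

# Assumption of this sketch: only the first 1024 bytes are searched.
SEARCH_WINDOW = 1024

# The usual 4-byte marker, which may be preceded by other data.
PDF_MARKER = b"%PDF"

# Alternative header form: %!PS-Adobe-N.n PDF-M.m
PS_ADOBE_HEADER = re.compile(rb"%!PS-Adobe-\d+\.\d+ PDF-\d+\.\d+")


def looks_like_pdf(data: bytes, window: int = SEARCH_WINDOW) -> bool:
    """Return True if either header form appears within the search window."""
    head = data[:window]
    if PDF_MARKER in head:
        return True
    return PS_ADOBE_HEADER.search(head) is not None
```

With this check, a file that starts with a few junk bytes followed by `%PDF-1.4` is accepted, as is one that starts with `%!PS-Adobe-3.0 PDF-1.3`, while a header that only appears after the search window is rejected.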

I can send a PR with these changes. What do you think?

Thanks :)

neilharvey commented 4 months ago

Ah, that's interesting! Sure, feel free to submit a PR if you have something in mind :)

If you have a sample PDF that demonstrates the issue it would be really useful, I usually pop a case in the FunctionalTests.cs. If you don't have a suitable sample, no worries - I trust the spec.

MaximeLaffaire commented 3 months ago

Hi @neilharvey

Out of curiosity, do you know when you'll be able to publish a new version of the package including these changes? :)

Thank you :)

neilharvey commented 3 months ago

Hey, sorry, I meant to push it over the weekend but it slipped my mind :)

I've pushed the package to NuGet, it should show up shortly. Thanks again for the contribution!

MaximeLaffaire commented 3 months ago

Hi again @neilharvey

I think you might have pushed the wrong version: it seems the latest changes are not included 🥲 I don't see them when I decompile the .dll, and the version bundled in the NuGet package appears to be 4.4.1.

neilharvey commented 3 months ago

Huh, that's weird. This is what I get for skipping a prerelease package ;)

Looking through my logs I can see what's happened - whilst a build was run prior to packing, the configuration wasn't set so it defaulted to Debug, whereas the pack defaulted to Release so it pulled the old version. I'll sort it and republish a new version.

neilharvey commented 3 months ago

New version has been published, it should be available shortly. I'll leave the issue open until we've confirmed it's OK.

MaximeLaffaire commented 3 months ago

Looks like it's all good now :D Thank you for your time :)