richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Siegfried blocks on unresolved Alias (Mac OS X) #107

Closed hvanstappen closed 6 years ago

hvanstappen commented 6 years ago

Hi,

I have to identify the entire contents of an internal HDD (HFS Extended), but Siegfried seems to block every time it encounters aliases (shortcuts) that can't be resolved: "[ERROR] open /Volumes/CK6_files/Users/christiankieckens/Library/Application Support/SyncServices/Local/clientdata/633a1ba25cb8241bbde44acb603ee1e822cde772/004100420053004c00610073007400530079006e00630044006100740065: no such file or directory"

It seems to be related to the fact that these aliases are self-referring. What puzzles me more is that SF doesn't block on the same file in every run: it may get past the first one but then block on the next self-referring alias.

Is there a way to prevent SF from trying to resolve aliases?

thanks,

Henk

richardlehane commented 6 years ago

Thanks for this report, Henk. I will take a look soon. Cheers, Richard. P.S. Very sorry for misspelling your name in my original comment.

richardlehane commented 6 years ago

Hi Henk, I've had a look at this today. I've managed to create some synthetic tests on my Windows laptop using symlinked files and directories. Symlinked files work on Windows, but attempting to scan a symlinked directory with sf currently causes it to panic.

A proposed fix for your issue, and also for the general issues that arise when scanning non-regular files (symlinks, sockets, named pipes etc.), is to change the file walk function so that only regular files are scanned. Go has an IsRegular method (https://golang.org/pkg/os/#FileMode.IsRegular) that I could easily drop in.

When non-regular files are encountered I could include a generic error message in results so users are aware that those files have been skipped.

Given that this changes current behaviour (so for example symlinked files are currently scanned successfully by sf) I'd be interested in any feedback on this fix before it is implemented.

cheers Richard

hvanstappen commented 6 years ago

Hi Richard,

I don't know anything about Go, but yes, I guess this is the right approach. If I understand you correctly, this would add an option telling SF to ignore anything but regular files, right? Anyway, in my usage of SF, these non-regular files are not very important.

thanks again,

henk

-- Henk Vanstappen e: henk@datable.be m: +32 498 59 68 55 s: henkvanstappen

https://www.linkedin.com/in/henkvanstappen

(Email reply to Richard Lehane's comment of 21 Sep 2017, 06:51.)

tw4l commented 6 years ago

Hi Richard and Henk,

I agree with Henk that having an option in Siegfried to scan only regular files makes sense. I would go so far as to say that, for my use cases, it might even make sense to make this the default behavior and to add a flag for scanning all (including non-regular) files. That is more or less what we've been doing in-house at CCA when making extent statements from DFXML manifests.

Looking forward to hearing others' opinions and use cases.

Apologies for the many edits: that's what I get for writing this before I've finished my morning coffee :)

Cheers, Tim

richardlehane commented 6 years ago

Thanks Tim and Henk. Yes, I agree that changing the default to regular-file scanning is the best approach. I'll get a fix out soon.

richardlehane commented 6 years ago

This bug should now be resolved in the latest release (sf 1.7.6). Please let me know if there are any further issues.

I have changed the default behaviour so that only "regular" files are now scanned (no symlinks, sockets, devices etc. - all these now report errors instead).

It may be worth adding back the ability to scan symlinks in future through an optional flag (similar to how the file command lets you optionally follow symlinks with a flag). I haven't added this yet, as it's unclear how it would work for symlinked directories (which currently cause sf to panic), but I'm happy to explore it if there's any demand.

Thanks again for the report, Henk, and thanks, Tim, for your input too.

hvanstappen commented 6 years ago

Hi Richard,

Great, thanks. For now, I can see no use case for following symlinks. I'll try to find time to test against one of the disks with which I had the problems, but it may take a few weeks, I'm afraid.

best,

Henk


(Email reply to Richard Lehane's comment of 4 Oct 2017, 13:40.)