sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.
Other
884 stars 209 forks source link

Duplicate Filter is not working properly when combined with other filters (master only) #2251

Closed wladimirleite closed 2 days ago

wladimirleite commented 3 days ago

Working on a real case with 4.2-snapshot, I noticed that the duplicate filter is not working properly when combined with the bookmarks filter (not sure if it happens with other filters).

To reproduce, I processed a few identical text files, then I added one of them to a bookmark. When I select "[No Bookmarks]", no item is shown, instead of just one.

No duplicated filter: image

Duplicated filter only. This works fine: image

Duplicated filter + Bookmarks Filter. This is wrong, as one item should be displayed. image

It is working fine with 4.1.6. Playing with different bookmarks filter combinations, it seems that duplicate filter is not applied as the last filter, as it should be.

lfcnassif commented 3 days ago

Thanks @wladimirleite for reporting and providing a step by step!

Playing with different bookmarks filter combinations, it seems that duplicate filter is not applied as the last filter, as it should be.

Yes, it should be the last one. I think I fixed a similar behavior change when reviewing #1613 on commit b6cf7480d64763535b92ef767e0f5c002a099d16, maybe a later commit changed the behavior again. @patrickdalla, could you take a look at this?

patrickdalla commented 3 days ago

sure

Em qui., 27 de jun. de 2024, 17:56, Luis Filipe Nassif < @.***> escreveu:

Thanks @wladimirleite https://github.com/wladimirleite for reporting and providing a step by step!

Playing with different bookmarks filter combinations, it seems that duplicate filter is not applied as the last filter, as it should be.

Yes, it should be the last one. I think I fixed a similar behavior change when reviewing #1613 https://github.com/sepinf-inc/IPED/pull/1613 on commit b6cf748 https://github.com/sepinf-inc/IPED/commit/b6cf7480d64763535b92ef767e0f5c002a099d16, maybe a later commit changed the behavior again. @patrickdalla https://github.com/patrickdalla, could you take a look at this?

— Reply to this email directly, view it on GitHub https://github.com/sepinf-inc/IPED/issues/2251#issuecomment-2195652823, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG247S6QDJ5MCLSHXH3BARDZJR37LAVCNFSM6AAAAABKASWSHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJVGY2TEOBSGM . You are receiving this because you were mentioned.Message ID: @.***>

patrickdalla commented 3 days ago

The correct duplicate filterer you have created, @lfcnassif , must be applied at the end of all filters, so it can keep at least one on the copies of the final result. I have corrected the code. @wladimirleite could you check and test? It was implemented in branch #39_complement.

patrickdalla commented 3 days ago

I think that of all existent filterers, only duplicate filterer must follow this rule of sequence of filtering. So this rule was "hard coded". @wladimirleite and @lfcnassif, do you foresee any necessity of creating (maybe as future enhancement) the ordering the filterers to be applied?

lfcnassif commented 3 days ago

Sorry, I think at least the MetadataFilter is applied after the duplicate filter in 4.1.6, so the duplicate filter is not really the last one.

patrickdalla commented 3 days ago

@lfcnassif do you remember why MetadataFilter should be the last? So I can emulate a case to check the result.

patrickdalla commented 3 days ago

The duplicate filter should be applied later because it is a kind of context filter, that applies based on the overall result that it receives, and not only on the data of each individual item of the result.

wladimirleite commented 3 days ago

I have corrected the code. @wladimirleite could you check and test? It was implemented in branch #39_complement.

It worked fine now. Thanks!

lfcnassif commented 3 days ago

@lfcnassif do you remember why MetadataFilter should be the last?

It was by design, since the goal was to show the exact number of results displayed on table (after all filters) but grouped by an arbitrary property chosen by the user.

The order which filters are applied on 4.1.x are in this class: https://github.com/sepinf-inc/IPED/blob/4.1.6/iped-app/src/main/java/iped/app/ui/CaseSearcherFilter.java

The filters order is like this: Query filters -> checked -> bookmarks -> graph -> duplicates -> similar images -> similar faces -> metadata panel

Similar images and faces filters are after the duplicates one to make them faster or to bring more different results.

We may rediscuss the filters order in the future, but for 4.2 I think we should keep current behavior to avoid confusing users.

lfcnassif commented 3 days ago

PS: @patrickdalla could you please submit a different PR with the fix? #2163 is not minor anymore and has lots of changes I'm concerned to include into 4.2.0...

patrickdalla commented 2 days ago

@lfcnassif do you remember why MetadataFilter should be the last?

It was by design, since the goal was to show the exact number of results displayed on table (after all filters) but grouped by an arbitrary property chosen by the user.

The order which filters are applied on 4.1.x are in this class: https://github.com/sepinf-inc/IPED/blob/4.1.6/iped-app/src/main/java/iped/app/ui/CaseSearcherFilter.java

The filters order is like this: Query filters -> checked -> bookmarks -> graph -> duplicates -> similar images -> similar faces -> metadata panel

Similar images and faces filters are after the duplicates one to make them faster or to bring more different results.

We may rediscuss the filters order in the future, but for 4.2 I think we should keep current behavior to avoid confusing users.

@lfcnassif, I checked again and actually it was already being ordered the way you informed. So the problem wans't really with the DuplicateFilterer in the wrong order, but a scheme of bitmap reuse (cache) I was using. I removed this scheme and the correct order was followed again.

I will organize the code in a new PR right after.

The order followed is the filterer addition order in filter manager, which is done in APP class createGUI method.

patrickdalla commented 2 days ago

@wladimirleite as reverted the commit to correct the real problem (wrong cache scheme), please review the execution again, please. This time the new code was pushed on the branch Filter_unions_bug_correction,.

wladimirleite commented 2 days ago

@wladimirleite as reverted the commit to correct the real problem (wrong cache scheme), please review the execution again, please. This time the new code was pushed on the branch Filter_unions_bug_correction,.

Just tested here and it worked fine. Thanks!

lfcnassif commented 2 days ago

@lfcnassif, I checked again and actually it was already being ordered the way you informed. So the problem wans't really with the DuplicateFilterer in the wrong order, but a scheme of bitmap reuse (cache) I was using. I removed this scheme and the correct order was followed again.

I will organize the code in a new PR right after.

Great! Thank you very much @patrickdalla for investigating this!

The order followed is the filterer addition order in filter manager, which is done in APP class createGUI method.

Fine! I remember I fixed the filters order in commit https://github.com/sepinf-inc/IPED/commit/b6cf7480d64763535b92ef767e0f5c002a099d16, thanks again for double checking!