markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Handle colliding extents #252

Closed lorddoskias closed 3 years ago

lorddoskias commented 3 years ago

Currently, if two extents have the same hash but different sizes, duperemove aborts in dedupe_extent_list. The reason is that the query used to load dupe extents in dbfile_load_extent_hashes ignores the len of the extent and matches only on digest. This results in two distinct struct dupe_extents, each with only a single entry, which in turn triggers an assertion in dedupe_extent_list.

Fix this by changing the SQL statement responsible for loading dupes to consider the 'len' column as well.
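
To illustrate the difference between matching on digest alone and matching on digest plus len, here is a minimal sketch using Python's sqlite3 module. It is not the actual duperemove query or schema: the extents table layout, both SQL statements, and the helpers group_extents and singleton_groups are assumptions made up for this example. It also assumes that loaded extents are grouped in memory by (digest, len), which is inferred from the description above.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema; not the real duperemove table.
conn.execute("CREATE TABLE extents (digest TEXT, ino INTEGER, len INTEGER)")
conn.executemany(
    "INSERT INTO extents VALUES (?, ?, ?)",
    [
        ("aa", 1, 4096),  # same digest as ino 2 ...
        ("aa", 2, 8192),  # ... but a different length (the colliding case)
        ("cc", 3, 4096),  # genuine duplicate pair
        ("cc", 4, 4096),
    ],
)

# Assumed shape of the old behaviour: duplicates are found by digest only.
DIGEST_ONLY = """
SELECT digest, len, ino FROM extents
WHERE digest IN (
    SELECT digest FROM extents GROUP BY digest HAVING COUNT(*) > 1
)
ORDER BY ino
"""

# Assumed shape of the fix: an extent is a duplicate only if another
# extent shares both its digest and its len.
DIGEST_AND_LEN = """
SELECT e.digest, e.len, e.ino FROM extents AS e
JOIN (
    SELECT digest, len FROM extents
    GROUP BY digest, len HAVING COUNT(*) > 1
) AS d ON e.digest = d.digest AND e.len = d.len
ORDER BY e.ino
"""


def group_extents(query):
    """Group the loaded rows by (digest, len), one list of inodes per group."""
    groups = defaultdict(list)
    for digest, length, ino in conn.execute(query):
        groups[(digest, length)].append(ino)
    return dict(groups)


def singleton_groups(groups):
    """Return the keys of groups with fewer than two members."""
    return sorted(key for key, inos in groups.items() if len(inos) < 2)


before = group_extents(DIGEST_ONLY)
after = group_extents(DIGEST_AND_LEN)

print(singleton_groups(before))  # single-entry groups from the digest-only query
print(singleton_groups(after))   # single-entry groups once len is considered
```

In this sketch the digest-only query loads both "aa" extents, and grouping them by (digest, len) leaves each one alone in its own group, which is the kind of single-entry group the assertion rejects. With len included in the duplicate test, the colliding extents are never loaded and only the genuine pair remains.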