qarmin / czkawka

Multi functional app to find duplicates, empty folders, similar images etc.

[Feature request] Load/Import saved duplicate files list into Czkawka #1295

Open AndroYD84 opened 2 weeks ago

AndroYD84 commented 2 weeks ago

Feature Description To avoid redundant scans, I suggest a feature to load/import a saved duplicate list as-is, without re-comparing hashes; at most, it should check whether each listed file still exists on disk, either physically or as a symlink.

I spent a week scanning all my drives and, after identifying all duplicate files, saved them to a list, but I cannot import that list back into Czkawka. So when a crash hit, I lost all progress and was forced to repeat the scan. The cached .bin hashes do not help, because Czkawka still re-compares every hash from scratch, which takes days. It is genuinely downright depressing to lose a week's worth of progress in an instant and to live with the constant fear of it happening again.
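As a rough illustration of the requested import step, here is a minimal sketch that re-loads a saved Czkawka JSON result and keeps only the entries whose files are still present, without recomputing any hashes. The nested layout {size: [group, ...]} with a "path" key per file is an assumption based on Czkawka's JSON export; adjust the keys if your export differs.

```python
import json
import os


def revalidate_saved_results(json_file):
    """Reload a saved duplicate list, dropping files that have vanished.

    os.path.lexists() is used instead of os.path.exists() so that
    symlinks (even broken ones) still count as present, per the request.
    """
    with open(json_file, "r", encoding="utf-8") as f:
        data = json.load(f)

    pruned = {}
    for size, groups in data.items():
        kept_groups = []
        for group in groups:
            still_there = [entry for entry in group
                           if os.path.lexists(entry["path"])]
            # A group with fewer than two surviving files is no
            # longer a duplicate set, so it is dropped entirely.
            if len(still_there) >= 2:
                kept_groups.append(still_there)
        if kept_groups:
            pruned[size] = kept_groups
    return pruned
```

Validating a list this way takes seconds (one stat call per file) instead of the days a full hash re-comparison costs.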

AndroYD84 commented 1 week ago

I made this Python script to convert a saved duplicate finder result from Czkawka into a dupeGuru file, so you can keep working in dupeGuru if Czkawka crashes, without wasting time rescanning the drives from scratch. dupeGuru also lets you choose which file to keep as the original (right-click > "Mark Selected into Reference") when you symlink groups ("Actions" > "Send Marked to Recycle Bin" > "Link deleted files" > "Symlink"), as requested in https://github.com/qarmin/czkawka/issues/903 and https://github.com/qarmin/czkawka/issues/149

import json
import xml.etree.ElementTree as ET

def convert_json_to_xml(json_file, xml_file):
    # Read JSON data from the input file
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Create the root element of the XML document
    results = ET.Element("results")

    # Czkawka groups duplicates by file size: {size: [[file, ...], ...]};
    # each inner list becomes one dupeGuru <group> of <file> elements
    for size_group in data.values():
        for group in size_group:
            group_element = ET.SubElement(results, "group")
            for file in group:
                file_element = ET.SubElement(group_element, "file")
                file_element.set("path", file["path"])
                file_element.set("words", "")
                file_element.set("is_ref", "n")
                file_element.set("marked", "n")

    # Create an ElementTree object and write it to the XML file
    tree = ET.ElementTree(results)
    tree.write(xml_file, encoding='utf-8', xml_declaration=True)

# Convert JSON to XML
convert_json_to_xml('czkawka_duplicates.json', 'dupeguru_duplicates.dupeguru')
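For reference, the script above produces a file shaped like this (the paths here are made up for illustration): one `<group>` per duplicate set, each `<file>` carrying the path plus the `words`, `is_ref`, and `marked` attributes dupeGuru expects.

```xml
<?xml version='1.0' encoding='utf-8'?>
<results>
  <group>
    <file path="/data/photo.jpg" words="" is_ref="n" marked="n" />
    <file path="/backup/photo.jpg" words="" is_ref="n" marked="n" />
  </group>
</results>
```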