idealo / imagededup

😎 Finding duplicate images made easy!
https://idealo.github.io/imagededup/
Apache License 2.0
5.09k stars · 458 forks

PHash doesn't work #214

Open RMobile17 opened 6 months ago

RMobile17 commented 6 months ago

My code:

    # image_dir_list is a list of pathlib.Path directories
    from imagededup.methods import PHash

    for image_dir in image_dir_list:
        # Check that image_dir exists and is a directory
        if not image_dir.exists() or not image_dir.is_dir():
            print(f"The directory {image_dir} does not exist or is not a directory.")
            continue

        # Create the "remove" subfolder if it does not already exist
        remove_dir = image_dir / "remove"
        if not remove_dir.exists():
            remove_dir.mkdir()

        phasher = PHash()

        # Find duplicates using the generated encodings
        duplicates = phasher.find_duplicates(image_dir=image_dir)

Error (the same traceback is then printed again by each spawned worker process):

    2024-03-13 11:04:10,376: INFO Start: Calculating hashes...
      0%|          | 0/2 [00:00<?, ?it/s]
    2024-03-13 11:04:17,839: INFO Start: Calculating hashes...
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
        prepare(preparation_data)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "C:\Users\rr004\eclipse-workspace-2023\ParseHTML\duplicate_phash.py", line 59, in <module>
        move_duplicates_to_remove(image_dir_list)
      File "C:\Users\rr004\eclipse-workspace-2023\ParseHTML\duplicate_phash.py", line 22, in move_duplicates_to_remove
        duplicates = phasher.find_duplicates(image_dir=image_dir)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\site-packages\imagededup\methods\hashing.py", line 303, in find_duplicates
        result = self._find_duplicates_dir(
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\site-packages\imagededup\methods\hashing.py", line 363, in _find_duplicates_dir
        encoding_map = self.encode_images(image_dir, recursive=recursive, num_enc_workers=num_enc_workers)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\site-packages\imagededup\methods\hashing.py", line 161, in encode_images
        hashes = parallelise(function=self.encode_image, data=files, verbose=self.verbose, num_workers=num_enc_workers)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\site-packages\imagededup\utils\general_utils.py", line 65, in parallelise
        pool = Pool(processes=num_workers)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 119, in Pool
        return Pool(processes, initializer, initargs, maxtasksperchild,
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 215, in __init__
        self._repopulate_pool()
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 306, in _repopulate_pool
        return self._repopulate_pool_static(self._ctx, self.Process,
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 329, in _repopulate_pool_static
        w.start()
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
        return Popen(process_obj)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\rr004\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

SWHL commented 6 months ago

I ran into the same problem. Python: 3.10.13, imagededup: 0.3.1, OS: macOS.

RMobile17 commented 6 months ago

I'm on Windows 10.

oohtmeel1 commented 5 months ago

Not sure what happened, but these are the image matches, and then what came up in the directory: 'cat (10023).jpg': ['cat (413).jpg']

[Attached images: cat (10023).jpg and cat (413).jpg]

ltskinner commented 3 months ago

The solution for me:

from imagededup.methods import PHash

#     vv this is the solution vv
if __name__ == '__main__':
    # ^^ this is the solution ^^

    phasher = PHash()
    encodings = phasher.encode_images(
        image_dir='path/to/image/directory',
        num_enc_workers=0  # https://github.com/idealo/imagededup/blob/master/imagededup/methods/hashing.py#L141C171-L142C1
    )
    ...

It has to do with the multiprocessing that imagededup runs under the hood: on Windows, child processes are started with the spawn method, which re-imports the main module, so any call that ends up creating a process pool (like `encode_images` or `find_duplicates`) has to sit behind an `if __name__ == '__main__':` guard. Passing `num_enc_workers=0`, per the docstring linked above, avoids spawning encoding workers in the first place.
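
For completeness, here is a minimal sketch of how the original poster's script could apply this fix. The function and variable names (`move_duplicates_to_remove`, `image_dir_list`) are taken from the traceback above; the directory paths are placeholders, and the move logic is an assumption about the intended behavior rather than part of the original code:

    from pathlib import Path

    from imagededup.methods import PHash

    def move_duplicates_to_remove(image_dir_list):
        for image_dir in image_dir_list:
            # Skip paths that do not exist or are not directories
            if not image_dir.exists() or not image_dir.is_dir():
                print(f"The directory {image_dir} does not exist or is not a directory.")
                continue

            # Make sure the "remove" subfolder exists
            remove_dir = image_dir / "remove"
            remove_dir.mkdir(exist_ok=True)

            phasher = PHash()
            # num_enc_workers=0 skips the encoding worker pool, as in the
            # comment above; the __main__ guard below is the actual fix
            duplicates = phasher.find_duplicates(image_dir=image_dir, num_enc_workers=0)

            # Move each duplicate into "remove", keeping one copy per group
            moved = set()
            for original, dupes in duplicates.items():
                if original in moved:
                    continue  # already moved as some other file's duplicate
                for dupe in dupes:
                    if dupe != original and dupe not in moved:
                        moved.add(dupe)
                        (image_dir / dupe).rename(remove_dir / dupe)

    if __name__ == '__main__':
        # Required on Windows: spawn re-imports this module in every worker
        # process, and without the guard the top-level call would try to
        # start new pools recursively.
        image_dir_list = [Path('path/to/images')]  # placeholder
        move_duplicates_to_remove(image_dir_list)

Note that `find_duplicates` returns a mapping from each filename to its list of duplicate filenames (as in the cat example above), so without the `moved` bookkeeping both members of a duplicate pair would end up in "remove".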