visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset images and labels and reduce your data operations costs at an unparalleled scale.

[Bug]: Installation errors on Sonoma 14.5 with Python 3.9 #341

Open dbickson opened 1 month ago

dbickson commented 1 month ago

What happened?

Python 3.9.19 (main, Mar 19 2024, 16:08:27) 
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastdup
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/fastdup/__init__.py", line 25, in <module>
    import pandas as pd
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/compat/__init__.py", line 25, in <module>
    from pandas.compat.numpy import (
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
  File "pandas/_libs/hashtable.pyx", line 1, in init pandas._libs.hashtable
  File "pandas/_libs/missing.pyx", line 1, in init pandas._libs.missing
  File "/Users/israelvinitzer/Library/Python/3.9/lib/python/site-packages/pandas/_libs/tslibs/__init__.py", line 39, in <module>
    from pandas._libs.tslibs.conversion import localize_pydatetime
  File "pandas/_libs/tslibs/conversion.pyx", line 1, in init pandas._libs.tslibs.conversion
  File "pandas/_libs/tslibs/offsets.pyx", line 1, in init pandas._libs.tslibs.offsets
  File "pandas/_libs/tslibs/timestamps.pyx", line 1, in init pandas._libs.tslibs.timestamps
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1, in init pandas._libs.tslibs.timedeltas
  File "pandas/_libs/tslibs/timezones.pyx", line 24, in init pandas._libs.tslibs.timezones
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/dateutil/tz/__init__.py", line 2, in <module>
    from .tz import *
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/dateutil/tz/tz.py", line 19, in <module>
    import six
ModuleNotFoundError: No module named 'six'
>>> 

After changing NumPy to 2.0:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/fastdup/__init__.py", line 25, in <module>
    import pandas as pd
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/compat/__init__.py", line 25, in <module>
    from pandas.compat.numpy import (
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/Users/vl/Library/Python/3.9/lib/python/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
>>> 
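
Both tracebacks point at the Python environment rather than at fastdup's own code: the first stops because `six`, which this `dateutil` install imports, is missing, and the second is the error that compiled pandas extensions raise when the installed NumPy does not match the one they were built against. As a minimal sketch (the helper name `report_versions` and the package list are assumptions chosen for illustration, not part of fastdup), the installed versions can be listed with the standard library before deciding what to reinstall:

```python
from importlib import metadata

# Packages that appear in the tracebacks above; this list is an assumption
# chosen for the report, not something fastdup itself defines.
PACKAGES = ("fastdup", "numpy", "pandas", "python-dateutil", "six")


def report_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A missing `six` would match the first error. For the second, reinstalling pandas and NumPy together so that pip resolves a compatible pair is the usual remedy, though whether that is enough on this machine is not confirmed in this thread.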

What did you expect to see?

No response

What version of fastdup were you running on?

2.7

What version of Python were you running on?

Python 3.9

Operating System

Sonoma 14.5

Reproduction steps

A workaround is to use Python 3.10, which works fine.

Relevant log output

No response

Attach a screenshot [Optional]

No response

Contact Details [Optional]

No response

KevinColemanInc commented 1 month ago

While I had errors with 3.9 on Sonoma 14.6, I didn't get that error.

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Also, don't put the working directory in the same directory as the photos, otherwise there will be locking errors.
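
To make that advice checkable before a run, here is a small sketch that tests whether a work directory is the image directory itself or sits inside it. The function name `work_dir_overlaps_input` is hypothetical and not part of fastdup; it only compares paths and does not inspect any lock files.

```python
from pathlib import Path


def work_dir_overlaps_input(work_dir, input_dir):
    """Return True if work_dir is input_dir itself or lies anywhere inside it."""
    work = Path(work_dir).expanduser().resolve()
    images = Path(input_dir).expanduser().resolve()
    # Path.parents lists every ancestor directory of the resolved work path.
    return work == images or images in work.parents
```

For the paths used later in this thread, `work_dir_overlaps_input("/Users/kevin/Desktop/work_dir", "/Users/kevin/Desktop/small.export")` returns `False`, since the two directories are siblings.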

dbickson commented 1 month ago

Hi @KevinColemanInc, can you try upgrading to Python 3.10 and let us know if this works? Otherwise, please run with verbose=1 and send us the full stack trace. In addition, we would love to see the locking errors you mention, as we have not seen them before.

KevinColemanInc commented 1 month ago

I am getting really inconsistent results. Now when I do this:

import fastdup

fd = fastdup.create(
    work_dir="/Users/kevin/Desktop/work_dir",
    input_dir="/Users/kevin/Desktop/small.export",
)
fd.run(overwrite=True, verbose=True)
fd.explore()

I get this:

$ python app.py
Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.

fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "/Users/kevin/Desktop/small.export"
Work directory is set to "work_dir"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

fastdup By Visual Layer, Inc. 2024. All rights reserved.
Building index:  33%|██████████████████████████████████████████▋                                                                                     | 1/3 [00:00<00:00,  7.46it/s]fastdup By Visual Layer, Inc. 2024. All rights reserved.
Creating visualization:  67%|████████████████████████████████████████████████████████████████████████████████                                        | 2/3 [00:01<00:01,  1.05s/it]
fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "."
Work directory is set to "/Users/kevin/Desktop/work_dir/processing/step_fastdup/images"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "."
Work directory is set to "/Users/kevin/Desktop/work_dir/processing/step_fastdup/images"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

Done: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.35it/s]

Analysis complete. Use the .explore() function to interactively explore your data on your local machine.

Alternatively, you can generate HTML-based galleries.
For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.

fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "/Users/kevin/Desktop/small.export"
Work directory is set to "work_dir"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

fastdup By Visual Layer, Inc. 2024. All rights reserved.
Building index:   0%|                                                                                                                                        | 0/3 [00:00<?, ?it/s]fastdup By Visual Layer, Inc. 2024. All rights reserved.
Creating visualization:  67%|████████████████████████████████████████████████████████████████████████████████                                        | 2/3 [00:01<00:00,  1.16it/s]
fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "."
Work directory is set to "/Users/kevin/Desktop/work_dir/processing/step_fastdup/images"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "."
Work directory is set to "/Users/kevin/Desktop/work_dir/processing/step_fastdup/images"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

Done: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.42it/s]

Analysis complete. Use the .explore() function to interactively explore your data on your local machine.

Alternatively, you can generate HTML-based galleries.
For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kevin/Desktop/app.py", line 5, in <module>
    fd.explore()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 652, in explore
    do_visual_layer(work_dir=self._work_dir, overwrite=False, run_server=True, verbose=verbose)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/site-packages/fastdup/fastdup_runner/run.py", line 194, in do_visual_layer
    raise e
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/site-packages/fastdup/fastdup_runner/run.py", line 185, in do_visual_layer
    launch_server(port)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/site-packages/fastdup/fastdup_runner/fastdup_runner_server/fastdup_runner_server_launcher.py", line 127, in launch_server
    args=(str(settings.Settings.DATASET_ID), port)).start()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
$ conda info
conda info

     active environment : photo_dupe3.10
    active env location : /opt/homebrew/Caskroom/miniforge/base/envs/photo_dupe3.10
            shell level : 2
       user config file : /Users/kevin/.condarc
 populated config files : /opt/homebrew/Caskroom/miniforge/base/.condarc
                          /Users/kevin/.condarc
          conda version : 23.3.1
    conda-build version : not installed
         python version : 3.10.9.final.0
       virtual packages : __archspec=1=arm64
                          __osx=14.6=0
                          __unix=0=0
       base environment : /opt/homebrew/Caskroom/miniforge/base  (writable)
      conda av data dir : /opt/homebrew/Caskroom/miniforge/base/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-arm64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-arm64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://conda.anaconda.org/conda-forge/osx-arm64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /opt/homebrew/Caskroom/miniforge/base/pkgs
                          /Users/kevin/.conda/pkgs
       envs directories : /opt/homebrew/Caskroom/miniforge/base/envs
                          /Users/kevin/.conda/envs
               platform : osx-arm64
             user-agent : conda/23.3.1 requests/2.28.1 CPython/3.10.9 Darwin/23.6.0 OSX/14.6
                UID:GID : 501:20
             netrc file : /Users/kevin/.netrc
           offline mode : False

But if I do this:

$ python
Python 3.10.14 (main, May  6 2024, 14:42:37) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastdup

fd = fastdup.create(
    work_dir="/Users/kevin/Desktop/work_dir",
    input_dir="/Users/kevin/Desktop/small.export",
)
fd.run(overwrite=True, verbose=True)
fd.explore()

>>>
>>> fd = fastdup.create(
...     work_dir="/Users/kevin/Desktop/work_dir",
...     input_dir="/Users/kevin/Desktop/small.export",
... )

It works for Python 3.9 and 3.10.
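
This contrast is consistent with how the `spawn` start method behaves (it is visible in the traceback as `popen_spawn_posix`): each child process re-imports the main script, so top-level `fastdup.create()`, `run()`, and `explore()` calls execute again in the child. That would also explain the repeated banner output. An interactive session has no script file to re-import. Below is a minimal sketch of `app.py` with the guard the error message asks for; the `main` wrapper and the command-line handling are illustrative assumptions, while the fastdup calls are the ones from the script above.

```python
import sys


def main(input_dir, work_dir):
    # Imported here so that merely importing this module stays cheap.
    import fastdup

    fd = fastdup.create(work_dir=work_dir, input_dir=input_dir)
    fd.run(overwrite=True, verbose=True)
    fd.explore()


# Under the spawn start method, child processes import this file under a
# different module name, so the block below runs only in the parent.
if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python app.py INPUT_DIR WORK_DIR")
```

It would be run as `python app.py /Users/kevin/Desktop/small.export /Users/kevin/Desktop/work_dir`. Whether this removes the error in this particular setup has not been verified in the thread.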

The larger dataset of images (11k) triggered the locking error before. Let me see if I can still reproduce it. I will post those logs within the next 24 hours.