Open wisebreadloaf opened 5 months ago
Encountering the same problem with Python 3.10 on a MacBook M1 running on CPU.
Is the source folder empty? Did you try with an absolute path?
You may need to pass --enable_text False if you don't have any captions.
Thanks for answering. The source folder is not empty, and I think it did read the folder, as the output printed "The number of samples has been estimated to be ...". There are no captions, but --enable_text False still gives the same error.
Did you try with an absolute path?
Can you share a ls of the content?
What operating system are you using?
I'm using macOS Sonoma. The image_folder was created using img2dataset, and contains:
00000 00000.parquet 00000_stats.json
Trying an absolute path gives the same error, except the dataset size is different:
clip-retrieval inference --input_dataset image_folder --output_folder embeddings_folder --enable_text False
The number of samples has been estimated to be 124
Starting the worker
dataset is 12
Starting work on task 0
warming up with batch size 256 on cpu
done warming up in 17.178229808807373s
Traceback (most recent call last):
File "/opt/homebrew/bin/clip-retrieval", line 8, in <module>
sys.exit(main())
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/cli.py", line 18, in main
fire.Fire(
File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/main.py", line 154, in main
distributor()
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/distributor.py", line 17, in __call__
worker(
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/worker.py", line 125, in worker
runner(task)
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/runner.py", line 39, in __call__
batch = iterator.__next__()
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/reader.py", line 222, in __iter__
for batch in self.dataloader:
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 435, in __iter__
return self._get_iterator()
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1034, in __init__
w.start()
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_image_dataset.<locals>.ImageDataset'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
With absolute path:
clip-retrieval inference --input_dataset /Users/hknguyen20/image_folder --output_folder embeddings_folder --enable_text False
The number of samples has been estimated to be 124
Starting the worker
dataset is 30
Starting work on task 0
warming up with batch size 256 on cpu
done warming up in 16.550618886947632s
Traceback (most recent call last):
File "/opt/homebrew/bin/clip-retrieval", line 8, in <module>
sys.exit(main())
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/cli.py", line 18, in main
fire.Fire(
File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/main.py", line 154, in main
distributor()
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/distributor.py", line 17, in __call__
worker(
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/worker.py", line 125, in worker
runner(task)
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/runner.py", line 39, in __call__
batch = iterator.__next__()
File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/reader.py", line 222, in __iter__
for batch in self.dataloader:
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 435, in __iter__
return self._get_iterator()
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1034, in __init__
w.start()
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_image_dataset.<locals>.ImageDataset'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
Looks like the error is AttributeError: Can't pickle local object 'get_image_dataset.<locals>.ImageDataset'
Something is different about how macOS handles things.
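For context, a minimal standard-library sketch of why this error shows up on macOS. Since Python 3.8 the default multiprocessing start method on macOS is "spawn", which pickles the objects handed to each worker process, and a class defined inside a function cannot be pickled by reference. The `get_image_dataset` below is a simplified stand-in that only mimics the name in the traceback; it is not the code from clip-retrieval, and `pickling_error` is a hypothetical helper.

```python
import multiprocessing
import pickle


def get_image_dataset():
    # Stand-in for the real factory: the class is defined inside a function,
    # so its qualified name contains "<locals>".
    class ImageDataset:
        def __init__(self, folder):
            self.folder = folder

    return ImageDataset


def pickling_error(obj):
    """Return the message raised while pickling obj, or None if it pickles."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as err:
        return str(err)
    return None


dataset = get_image_dataset()("image_folder")

# "spawn" is the default on macOS since Python 3.8; Linux has historically used "fork",
# which copies the parent process instead of pickling the worker arguments.
print("start method:", multiprocessing.get_start_method())
print("qualified name:", type(dataset).__qualname__)
print("pickling error:", pickling_error(dataset))
```

With "fork" the worker inherits the dataset object from the parent, which would explain why the same command can work on Linux and fail on macOS.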
I see. It's the same issue as #142.
Can you try making https://github.com/rom1504/clip-retrieval/blob/ee0931f89c69cf2e39b5187d50a40873b7999d2b/clip_retrieval/clip_inference/reader.py#L59 top level and rerun?
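A rough sketch of what "making it top level" could look like, assuming the nested class only needs values that can be passed through its constructor. All names here (`ImageDataset`, `scale`, `factor`) are illustrative and hypothetical, not the actual contents of reader.py. The idea is to define the class at module level and replace any nested functions or lambdas with `functools.partial` over module-level functions, so the whole object can be pickled.

```python
import pickle
from functools import partial


class ImageDataset:
    """Module-level class: pickle can locate it by module and name."""

    def __init__(self, folder, transform):
        self.folder = folder
        self.transform = transform


def scale(value, factor):
    """Hypothetical module-level stand-in for a preprocessing step."""
    return value * factor


def get_image_dataset(folder, factor=2):
    # The factory now only builds an instance. partial() over a module-level
    # function pickles, whereas a lambda or nested def here would not.
    return ImageDataset(folder, transform=partial(scale, factor=factor))


dataset = get_image_dataset("image_folder")
restored = pickle.loads(pickle.dumps(dataset))
print(restored.folder, restored.transform(21))
```

Every attribute stored on the dataset has to be picklable too, so any other closure reachable from the dataset would need the same treatment.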
I see a similar error when trying to use this:
KeyError: '00000/000000000.txt'
This happens after I put captions with the same name as the image next to the image. How do the captions have to be placed?
@rohun-tripathi can you provide more information? Command, environment,...
This is not expected
I tried making it top level, but when running the end2end inference test I encountered AttributeError: Can't pickle local object 'create_webdataset_filter.<locals>
and could not resolve this. In the end, I could solve the initial error without modifying code by disabling multiprocessing, as pointed out in #220.
If you hit the same errors, try referring to: https://github.com/rom1504/clip-retrieval/issues/352
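To illustrate why disabling multiprocessing sidesteps the error, here is a small simulation using only the standard library. `load_all` is a hypothetical stand-in for a data loader, not PyTorch's `DataLoader`: with zero workers the dataset is read in the current process and never pickled, while any worker count above zero first has to serialize the dataset for the spawned process.

```python
import pickle


def make_dataset():
    # Local class, mirroring the unpicklable dataset from the traceback.
    class LocalDataset:
        def __len__(self):
            return 3

        def __getitem__(self, index):
            return index * index

    return LocalDataset()


def load_all(dataset, num_workers=0):
    """Hypothetical loader: simulates the pickling step of spawned workers."""
    if num_workers > 0:
        # A spawned worker receives a pickled copy of the dataset,
        # so this step fails for classes defined inside a function.
        pickle.dumps(dataset)
    # With num_workers == 0 everything runs in the calling process.
    return [dataset[i] for i in range(len(dataset))]


print(load_all(make_dataset(), num_workers=0))
```

In clip-retrieval the corresponding setting is presumably the number of preprocessing workers (I believe the option is `--num_prepro_workers`, but treat that name as an assumption and check #220 or `clip-retrieval inference --help`). The trade-off is that images are then loaded in a single process, which is slower on large datasets.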
(env) ❯ clip-retrieval inference --input_dataset ./source_images/ --output_folder ./output_folder/
The number of samples has been estimated to be 22
Starting the worker
dataset is 16
Starting work on task 0
warming up with batch size 256 on cuda
done warming up in 24.880407333374023s
Traceback (most recent call last):
File "/home/bored/clip-retriever/env/bin/clip-retrieval", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/cli.py", line 18, in main
fire.Fire(
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/main.py", line 154, in main
distributor()
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/distributor.py", line 17, in __call__
worker(
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/worker.py", line 127, in worker
runner(task)
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/runner.py", line 39, in __call__
batch = iterator.__next__()
^^^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/reader.py", line 225, in __iter__
for batch in self.dataloader:
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
^^^^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]