hushell / pmf_cvpr22

183 stars · 24 forks

Some very tricky errors occurred when running test_bscdfsl.py, hoping for help. #34

Open aaawork opened 9 months ago

aaawork commented 9 months ago

```
G:\awh\pmf_cvpr22-main\engine.py:115: UserWarning: The structure of <datasets.get_bscd_loader.<locals>._Loader object at 0x000002117F54D2E0> is not recognizable.
  warnings.warn(f'The structure of {data_loaders} is not recognizable.')
Traceback (most recent call last):
  File "test_bscdfsl.py", line 116, in <module>
    main(args)
  File "test_bscdfsl.py", line 62, in main
    test_stats = evaluate(data_loader_val, model, criterion, device, seed=1234, ep=5)
  File "G:\awh\pmf_cvpr22-main\engine.py", line 116, in evaluate
    return _evaluate(data_loaders, model, criterion, device, seed)
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "G:\awh\pmf_cvpr22-main\engine.py", line 134, in _evaluate
    for ii, batch in enumerate(metric_logger.log_every(data_loader, 10, header)):
  File "G:\awh\pmf_cvpr22-main\utils\deit_util.py", line 141, in log_every
    for obj in iterable:
  File "G:\awh\pmf_cvpr22-main\datasets\__init__.py", line 178, in _loader_wrap
    for x, y in novel_loader:
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x0000020E0E743040>: attribute lookup <lambda> on datasets.cdfsl.CropDisease_few_shot failed

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\dhs\.conda\envs\Hlbl\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
```

ZBC043 commented 5 months ago


Hi, have you solved this issue yet? I ran into the exact same issue as yours.

hushell commented 5 months ago

Hi guys, I guess it is probably due to the version of Python or PyTorch: my code might violate some assumptions that multiprocessing makes, for example. Here is an answer from ChatGPT:

The error you're encountering involves a failure to pickle a function. This is a common issue in multiprocessing, especially when trying to share functions or objects that are not easily serializable.

Let's address the specific issues:

1. **Can't Pickle Function**:
   The error `Can't pickle <function at ...>: attribute lookup on datasets.cdfsl.CropDisease_few_shot failed` suggests that the function you're trying to pickle cannot be found or properly referenced. This often happens with nested functions, lambdas, or functions defined inside classes.

2. **EOFError**:
   This is a secondary error likely caused by the failure to pickle the function, resulting in incomplete data being sent to the subprocess.
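A minimal, self-contained way to reproduce the pickling failure outside this repo is to try pickling a lambda directly (the functions below are illustrative, not code from `pmf_cvpr22`):

```python
import pickle

# Lambdas have no importable qualified name, so pickle cannot
# serialize them by reference -- this raises PicklingError.
f = lambda x: x * 2

try:
    pickle.dumps(f)
except (pickle.PicklingError, AttributeError) as e:
    print(f"pickling the lambda failed: {e}")

# A named, module-level function pickles fine: pickle stores only
# a reference (module + qualified name), not the code itself.
def double(x):
    return x * 2

restored = pickle.loads(pickle.dumps(double))
print(restored(21))  # 42
```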

### Solutions

#### 1. Ensure Functions are Top-Level
Ensure that the function you are trying to use with multiprocessing is defined at the top level of a module. Functions defined within other functions, methods within classes, or lambdas often cannot be pickled.

Example:
```python
# Incorrect
class MyClass:
    def method(self):
        def inner_function():
            pass

# Correct
def my_function():
    pass

```

#### 2. Use multiprocessing.Manager or multiprocessing.Pool

Using `multiprocessing.Manager` or `multiprocessing.Pool` can help manage the processes more effectively. Managers provide a way to create shared objects that can be passed between processes.

#### 3. Example with multiprocessing.Pool

Here's an example demonstrating the use of `multiprocessing.Pool` to avoid pickling issues with functions:

```python
import multiprocessing

def crop_disease_few_shot(data):
    # Your processing function here
    return data * 2  # Example operation

if __name__ == '__main__':
    data = [1, 2, 3, 4, 5]

    with multiprocessing.Pool() as pool:
        results = pool.map(crop_disease_few_shot, data)

    print(results)
```

#### Example with a Top-Level Function

```python
import multiprocessing

# Ensure the function is defined at the top level
def crop_disease_few_shot(data):
    # Process the data here
    return data * 2  # Example operation

def main():
    data = [1, 2, 3, 4, 5]

    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(crop_disease_few_shot, data)

    print(results)

if __name__ == '__main__':
    main()
```

### Additional Tips

- **Avoid Lambdas**: Lambdas are not pickleable. Replace them with named functions.
- **Avoid Nested Functions**: Move nested functions to the top level of the module.
- **Check Imports**: Ensure all necessary imports are available in the module where the top-level function is defined.

#### Example for Functions Inside a Class

If you need to use a method inside a class, make sure to use `staticmethod` or move the method outside the class.

```python
import multiprocessing

class CropDisease:
    @staticmethod
    def few_shot(data):
        # Your processing function here
        return data * 2  # Example operation

def main():
    data = [1, 2, 3, 4, 5]

    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(CropDisease.few_shot, data)

    print(results)

if __name__ == '__main__':
    main()
```

By ensuring functions are at the top level, avoiding lambdas and nested functions, and using `multiprocessing.Pool` correctly, you should be able to avoid the `EOFError` and pickling issues. If you provide more details or code snippets, I can give more specific advice.
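One more pattern worth adding to the tips above: if the lambda exists only to bind an argument (as data-loader transforms often do), `functools.partial` over a named function is a picklable drop-in replacement. The `scale` function below is an illustrative stand-in, not code from this repo:

```python
import functools
import pickle

def scale(x, factor):
    return x * factor

# Instead of the unpicklable `lambda x: scale(x, 2)`, bind the argument
# with functools.partial; partial objects pickle by reference to `scale`.
transform = functools.partial(scale, factor=2)

restored = pickle.loads(pickle.dumps(transform))
print(restored(21))  # 42
```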

ZBC043 commented 5 months ago

Hi, thank you very much for your reply and many thanks for sharing this repo! I believe that I solved this issue by replacing the lambda function with a properly defined function. But I have another question: normally, how long will test_bscdfsl.py run on the ChestX dataset? I have used:

```
!python test_bscdfsl.py --test_n_way 5 --n_shot 5 --device cuda:0 --arch dino_small_patch16 --deploy finetune --output outputs/dino_small_cifar_1 --resume outputs/dino_small_cifar_1/best.pth --cdfsl_domains ChestX --ada_steps 100 --ada_lr 0.0001 --aug_prob 0.9 --aug_types color translation
```

Do you have any idea about roughly how long this will run on a RTX 4090? Many thanks.
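(For future readers: the reason this error only shows up on Windows is that Windows has no fork, so every DataLoader worker is started with the `spawn` method, which pickles everything handed to the worker. The snippet below forces `spawn` to reproduce the same requirement on any OS; `identity` is an illustrative stand-in for the repo's loader callback:)

```python
import multiprocessing as mp
import pickle

def identity(x):
    # Named and module-level, so spawn-started workers can re-import it.
    return x

if __name__ == '__main__':
    # Windows always uses 'spawn'; forcing it here imposes the same
    # pickling requirements as Windows DataLoader workers.
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=2) as pool:
        print(pool.map(identity, [1, 2, 3]))  # [1, 2, 3]

    # A lambda in the same position fails before any worker even runs:
    try:
        pickle.dumps(lambda x: x)
    except (pickle.PicklingError, AttributeError) as e:
        print('lambda:', e)
```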

hushell commented 5 months ago

It shouldn't be too long :) But I was running experiments on A40. I'll try to find my logs and let you know tomorrow.

ZBC043 commented 5 months ago

Thank you! It has now been running for 2 hours and is still going...