uzh-rpg / RVT

Implementation of "Recurrent Vision Transformers for Object Detection with Event Cameras". CVPR 2023
MIT License
315 stars 41 forks

AttributeError: Can't pickle local object 'partialclass.<locals>.NewCls' #54

Open adeelferozmirza opened 3 months ago

adeelferozmirza commented 3 months ago

Hey, when I run it on Windows I get this error. How can I solve it? Can you help please?

F:\SSCAmea\RVT-master>python train.py model=rnndet dataset=gen1 dataset.path=F:\SSCAmea\gen1 wandb.project_name=RVT wandb.group_name=gen1 +experiment/gen1=base.yaml hardware.gpus=0 batch_size.train=8 batch_size.eval=8 hardware.num_workers.train=6 hardware.num_workers.eval=2
Using python-based detection evaluation
Set MaxViTRNN backbone (height, width) to (256, 320)
Set partition sizes: (8, 10)
Set num_classes=2 for detection head
------ Configuration ------
reproduce:
  seed_everything: null
  deterministic_flag: false
  benchmark: false
training:
  precision: 16
  max_epochs: 10000
  max_steps: 400000
  learning_rate: 0.0002
  weight_decay: 0
  gradient_clip_val: 1.0
  limit_train_batches: 1.0
  lr_scheduler:
    use: true
    total_steps: ${..max_steps}
    pct_start: 0.005
    div_factor: 20
    final_div_factor: 10000
validation:
  limit_val_batches: 1.0
  val_check_interval: null
  check_val_every_n_epoch: 1
batch_size:
  train: 8
  eval: 8
hardware:
  num_workers:
    train: 6
    eval: 2
  gpus: 0
  dist_backend: nccl
logging:
  ckpt_every_n_epochs: 1
  train:
    metrics:
      compute: false
      detection_metrics_every_n_steps: null
    log_model_every_n_steps: 5000
    log_every_n_steps: 500
    high_dim:
      enable: true
      every_n_steps: 5000
      n_samples: 4
  validation:
    high_dim:
      enable: true
      every_n_epochs: 1
      n_samples: 8
wandb:
  wandb_runpath: null
  artifact_name: null
  artifact_local_file: null
  resume_only_weights: false
  group_name: gen1
  project_name: RVT
dataset:
  name: gen1
  path: F:\SSCAmea\gen1
  train:
    sampling: mixed
    random:
      weighted_sampling: false
    mixed:
      w_stream: 1
      w_random: 1
  eval:
    sampling: stream
  data_augmentation:
    random:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.8
        zoom_in:
          weight: 8
          factor:
            min: 1
            max: 1.5
        zoom_out:
          weight: 2
          factor:
            min: 1
            max: 1.2
    stream:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.5
        zoom_out:
          factor:
            min: 1
            max: 1.2
  ev_repr_name: stacked_histogram_dt=50_nbins=10
  sequence_length: 21
  resolution_hw:


Disabling PL seed everything because of unresolved issues with shuffling during training on streaming datasets
new run: generating id ba4dy0ts
wandb: Currently logged in as: adeelferozmirza1 (adeelferozmirza). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.3
wandb: Run data is saved locally in F:\SSCAmea\RVT-master\wandb\run-20240627_152506-ba4dy0ts
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run golden-feather-1
wandb: View project at https://wandb.ai/adeelferozmirza/RVT
wandb: View run at https://wandb.ai/adeelferozmirza/RVT/runs/ba4dy0ts
wandb: logging graph, to disable use `wandb.watch(log_graph=False)`
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
[Train] Local batch size for:
stream sampling:        4
random sampling:        4
[Train] Local num workers for:
stream sampling:        3
random sampling:        3
creating rnd access train datasets: 1458it [00:03, 419.82it/s]
creating streaming train datasets: 1458it [00:09, 160.31it/s]
num_full_sequences=317
num_splits=1141
num_split_sequences=5492
creating streaming val datasets: 429it [00:01, 399.27it/s]
num_full_sequences=429
num_splits=0
num_split_sequences=0
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type          | Params
-------------------------------------------------
0 | mdl            | YoloXDetector | 18.5 M
1 | mdl.backbone   | RNNDetector   | 12.8 M
2 | mdl.fpn        | YOLOPAFPN     | 3.9 M
3 | mdl.yolox_head | YOLOXHead     | 1.9 M
-------------------------------------------------
18.5 M    Trainable params
0         Non-trainable params
18.5 M    Total params
37.073    Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 20 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Using python-based detection evaluation
Using python-based detection evaluation
Sanity Checking DataLoader 0:   0%| | 0/2 [00:00<?, ?it/s]C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3588.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Epoch 0: : 0it [00:00, ?it/s]Using python-based detection evaluation
Using python-based detection evaluation
Using python-based detection evaluation
== Timing statistics ==
== Timing statistics ==
Error executing job with overrides: ['model=rnndet', 'dataset=gen1', 'dataset.path=F:\SSCAmea\gen1', 'wandb.project_name=RVT', 'wandb.group_name=gen1', '+experiment/gen1=base.yaml', 'hardware.gpus=0', 'batch_size.train=8', 'batch_size.eval=8', 'hardware.num_workers.train=6', 'hardware.num_workers.eval=2']
Traceback (most recent call last):
  File "F:\SSCAmea\RVT-master\train.py", line 138, in main
    trainer.fit(model=module, ckpt_path=ckpt_path, datamodule=data_module)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1098, in _run
    results = self._run_stage()
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1177, in _run_stage
    self._run_train()
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1200, in _run_train
    self.fit_loop.run()
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\loop.py", line 194, in run
    self.on_run_start(*args, **kwargs)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 161, in on_run_start
    _ = iter(data_fetcher)  # creates the iterator inside the fetcher
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\utilities\fetching.py", line 179, in __iter__
    self._apply_patch()
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\utilities\fetching.py", line 120, in _apply_patch
    apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\utilities\fetching.py", line 156, in loader_iters
    return self.dataloader_iter.loader_iters
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\supporters.py", line 555, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\supporters.py", line 595, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\lightning_utilities\core\apply_func.py", line 52, in apply_to_collection
    return _apply_to_collection_slow(
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\lightning_utilities\core\apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\lightning_utilities\core\apply_func.py", line 96, in _apply_to_collection_slow
    return function(data, *args, **kwargs)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\supporters.py", line 177, in __iter__
    self._loader_iter = iter(self.loader)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    return self._get_iterator()
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
    w.start()
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\popen_spawn_win32.py", line 95, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 172, in __reduce_ex__
    return super().__reduce_ex__(*args, **kwargs)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 347, in __getstate__
    value = pickle.dumps(self._datapipe)
AttributeError: Can't pickle local object 'partialclass.<locals>.NewCls'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Using python-based detection evaluation
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
== Timing statistics ==
wandb: View run golden-feather-1 at: https://wandb.ai/adeelferozmirza/RVT/runs/ba4dy0ts
wandb: View project at: https://wandb.ai/adeelferozmirza/RVT
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: .\wandb\run-20240627_152506-ba4dy0ts\logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
== Timing statistics ==
Epoch 0: : 0it [00:28, ?it/s]

magehrig commented 3 months ago

This appears to be a limitation of pickle on Windows. Specifically, here we dynamically define/overwrite the `__init__` function inside a class, and pickle (on Windows) does not like that. So I think the following should work instead (replace the function linked above with the following one):

def partialclass(cls, *args, **kwargs):
    # Bind the given arguments in a regular subclass __init__ instead of
    # dynamically overwriting __init__ on the class object.
    class NewCls(cls):
        def __init__(self, *more_args, **more_kwargs):
            full_args = args + more_args
            full_kwargs = {**kwargs, **more_kwargs}
            super().__init__(*full_args, **full_kwargs)

    return NewCls

I have not tested it, but this should work.

This should also resolve https://github.com/uzh-rpg/RVT/issues/45
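
To see the limitation in isolation, here is a minimal standalone repro (illustrative, not RVT code): pickle serializes classes by qualified name, and a class defined inside a function carries `<locals>` in its qualname, so it cannot be looked up again at unpickling time. Windows triggers this because DataLoader workers are started with the spawn method, which pickles the dataset; Linux's default fork method does not pickle anything.

import pickle

def make_local_class():
    # The class is created inside a function, so its qualified name is
    # 'make_local_class.<locals>.Local' and pickle cannot re-import it.
    class Local:
        pass
    return Local

LocalCls = make_local_class()

try:
    pickle.dumps(LocalCls())
except AttributeError as err:
    print(err)  # Can't pickle local object 'make_local_class.<locals>.Local'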

ACC-Tony commented 2 months ago

I've also encountered this problem when running this project via PyCharm on Windows.

I've already modified the code in 'data/genx_utils/dataset_streaming.py': [screenshot of the modified partialclass] However, the problem remains.

Is it because pickle cannot work with lexical closures? Does this project work well on Linux? I may move to Linux.
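
As far as I understand, pickle indeed cannot serialize lexical closures or classes defined inside functions on any OS; it only surfaces on Windows because worker processes are started via spawn, which pickles the dataset. A possible workaround sketch (untested, and assuming the call sites only ever instantiate the returned object rather than using it as a real class, e.g. no isinstance checks) would avoid creating a local class altogether:

import functools

def partialclass(cls, *args, **kwargs):
    # A functools.partial object pickles fine as long as `cls` (a
    # module-level class) and the bound arguments are themselves
    # picklable; no locally defined class is involved.
    return functools.partial(cls, *args, **kwargs)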

The command I use is:

python .\train.py model=rnndet dataset=gen1 dataset.path=V:/gen1/ wandb.project_name=RVT wandb.group_name=gen1 +experiment/gen1="tiny.yaml" hardware.gpus=[0] batch_size.train=16 batch_size.eval=16 hardware.num_workers.train=6 hardware.num_workers.eval=6

And the log in the console is:

Using cpp-based detection evaluation
Set MaxViTRNN backbone (height, width) to (256, 320)
Set partition sizes: (8, 10)
Set num_classes=2 for detection head
------ Configuration ------
reproduce:
  seed_everything: null
  deterministic_flag: false
  benchmark: false
training:
  precision: 16
  max_epochs: 10000
  max_steps: 400000
  learning_rate: 0.0002
  weight_decay: 0
  gradient_clip_val: 1.0
  limit_train_batches: 1.0
  lr_scheduler:
    use: true
    total_steps: ${..max_steps}
    pct_start: 0.005
    div_factor: 20
    final_div_factor: 10000
validation:
  limit_val_batches: 1.0
  val_check_interval: null
  check_val_every_n_epoch: 1
batch_size:
  train: 16
  eval: 16
hardware:
  num_workers:
    train: 6
    eval: 6
  gpus:
  - 0
  dist_backend: nccl
logging:
  ckpt_every_n_epochs: 1
  train:
    metrics:
      compute: false
      detection_metrics_every_n_steps: null
    log_model_every_n_steps: 5000
    log_every_n_steps: 500
    high_dim:
      enable: true
      every_n_steps: 5000
      n_samples: 4
  validation:
    high_dim:
      enable: true
      every_n_epochs: 1
      n_samples: 8
wandb:
  wandb_runpath: null
  artifact_name: null
  artifact_local_file: null
  resume_only_weights: false
  group_name: gen1
  project_name: RVT
dataset:
  name: gen1
  path: V:/gen1/
  train:
    sampling: mixed
    random:
      weighted_sampling: false
    mixed:
      w_stream: 1
      w_random: 1
  eval:
    sampling: stream
  data_augmentation:
    random:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.8
        zoom_in:
          weight: 8
          factor:
            min: 1
            max: 1.5
        zoom_out:
          weight: 2
          factor:
            min: 1
            max: 1.2
    stream:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.5
        zoom_out:
          factor:
            min: 1
            max: 1.2
  ev_repr_name: stacked_histogram_dt=50_nbins=10
  sequence_length: 21
  resolution_hw:
  - 240
  - 304
  downsample_by_factor_2: false
  only_load_end_labels: false
model:
  name: rnndet
  backbone:
    name: MaxViTRNN
    compile:
      enable: false
      args:
        mode: reduce-overhead
    input_channels: 20
    enable_masking: false
    partition_split_32: 1
    embed_dim: 32
    dim_multiplier:
    - 1
    - 2
    - 4
    - 8
    num_blocks:
    - 1
    - 1
    - 1
    - 1
    T_max_chrono_init:
    - 4
    - 8
    - 16
    - 32
    stem:
      patch_size: 4
    stage:
      downsample:
        type: patch
        overlap: true
        norm_affine: true
      attention:
        use_torch_mha: false
        partition_size:
        - 8
        - 10
        dim_head: 32
        attention_bias: true
        mlp_activation: gelu
        mlp_gated: false
        mlp_bias: true
        mlp_ratio: 4
        drop_mlp: 0
        drop_path: 0
        ls_init_value: 1.0e-05
      lstm:
        dws_conv: false
        dws_conv_only_hidden: true
        dws_conv_kernel_size: 3
        drop_cell_update: 0
    in_res_hw:
    - 256
    - 320
  fpn:
    name: PAFPN
    compile:
      enable: false
      args:
        mode: reduce-overhead
    depth: 0.33
    in_stages:
    - 2
    - 3
    - 4
    depthwise: false
    act: silu
  head:
    name: YoloX
    compile:
      enable: false
      args:
        mode: reduce-overhead
    depthwise: false
    act: silu
    num_classes: 2
  postprocess:
    confidence_threshold: 0.1
    nms_threshold: 0.45

---------------------------
Disabling PL seed everything because of unresolved issues with shuffling during training on streaming datasets
new run: generating id 56gjza94
wandb: Currently logged in as: ***(***). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.17.5 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.14.0
wandb: Run data is saved locally in D:\RVT-master\wandb\run-20240806_104650-56gjza94
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run gallant-elevator-26
wandb:  View project at https://wandb.ai/***/RVT
wandb:  View run at https://wandb.ai/***/RVT/runs/56gjza94
wandb: logging graph, to disable use `wandb.watch(log_graph=False)`
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
[Train] Local batch size for:
stream sampling:        8
random sampling:        8
[Train] Local num workers for:
stream sampling:        3
random sampling:        3
creating rnd access train datasets: 1458it [00:42, 34.05it/s]
creating streaming train datasets: 1458it [02:16, 10.66it/s]
num_full_sequences=317
num_splits=1141
num_split_sequences=5492
creating streaming val datasets: 429it [00:18, 23.08it/s]
num_full_sequences=429
num_splits=0
num_split_sequences=0
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type          | Params
-------------------------------------------------
0 | mdl            | YoloXDetector | 4.4 M
1 | mdl.backbone   | RNNDetector   | 3.2 M
2 | mdl.fpn        | YOLOPAFPN     | 710 K
3 | mdl.yolox_head | YOLOXHead     | 474 K
-------------------------------------------------
4.4 M     Trainable params
0         Non-trainable params
4.4 M     Total params
8.810     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Sanity Checking DataLoader 0:   0%|                                                                                                                                                                          | 0/2 [00:00<?, ?it/s]C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Epoch 0: : 0it [00:00, ?it/s]Using cpp-based detection evaluation                                                                                                                                                                  
Using cpp-based detection evaluation
Using cpp-based detection evaluation
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
Error executing job with overrides: ['model=rnndet', 'dataset=gen1', 'dataset.path=V:/gen1/', 'wandb.project_name=RVT', 'wandb.group_name=gen1', '+experiment/gen1=tiny.yaml', 'hardware.gpus=[0]', 'batch_size.train=16', 'batch_size.eval=16', 'hardware.num_workers.train=6', 'hardware.num_workers.eval=6']
Traceback (most recent call last):
  File "D:\RVT-master\train.py", line 138, in main
    trainer.fit(model=module, ckpt_path=ckpt_path, datamodule=data_module)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1098, in _run
    results = self._run_stage()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1177, in _run_stage
    self._run_train()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1200, in _run_train
    self.fit_loop.run()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\loop.py", line 194, in run
    self.on_run_start(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 161, in on_run_start
    _ = iter(data_fetcher)  # creates the iterator inside the fetcher
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 179, in __iter__
    self._apply_patch()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 120, in _apply_patch
    apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 156, in loader_iters
    return self.dataloader_iter.loader_iters
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 555, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 595, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\lightning_utilities\core\apply_func.py", line 52, in apply_to_collection
    return _apply_to_collection_slow(
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\lightning_utilities\core\apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\lightning_utilities\core\apply_func.py", line 96, in _apply_to_collection_slow
    return function(data, *args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 177, in __iter__
    self._loader_iter = iter(self.loader)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\dataloader.py", line 1043, in __init__
    w.start()
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 167, in __reduce_ex__
    return super().__reduce_ex__(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 333, in __getstate__
    value = pickle.dumps(self._datapipe)
AttributeError: Can't pickle local object 'build_streaming_train_dataset.<locals>.partialclass'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
Using cpp-based detection evaluation
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
== Timing statistics ==
wandb:  View run gallant-elevator-26 at: https://wandb.ai/***/RVT/runs/56gjza94
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: .\wandb\run-20240806_104650-56gjza94\logs
== Timing statistics ==
Epoch 0: : 0it [00:26, ?it/s]

magehrig commented 1 month ago

Unfortunately, I can't test this because I don't have a Windows machine. Yes, the project was written and tested on Linux (specifically Ubuntu), so moving to Linux is one solution.
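
Another untested option if you need to stay on Windows: set the number of dataloader workers to zero (assuming the data module tolerates that), so no worker process is spawned and nothing needs to be pickled for worker startup, e.g.:

python train.py model=rnndet dataset=gen1 dataset.path=<your path> +experiment/gen1=tiny.yaml hardware.gpus=[0] hardware.num_workers.train=0 hardware.num_workers.eval=0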