Open ljhOfGithub opened 11 months ago
I ran into the same problem. Have you solved it? Here is my environment:
Package Version
absl-py 1.4.0
addict 2.2.1
aiohttp 3.8.5
aiosignal 1.3.1
albumentations 0.4.3
antlr4-python3-runtime 4.9.3
async-timeout 4.0.2
asynctest 0.13.0
attrs 23.1.0
cachetools 5.3.1
certifi 2023.7.22
charset-normalizer 3.2.0
cycler 0.11.0
einops 0.3.0
fonttools 4.38.0
frozenlist 1.3.3
fsspec 2023.1.0
future 0.18.3
google-auth 2.22.0
google-auth-oauthlib 0.4.6
grpcio 1.56.2
h5py 3.8.0
idna 3.4
imageio 2.31.1
imgaug 0.2.6
importlib-metadata 6.7.0
kiwisolver 1.4.4
Markdown 3.4.4
MarkupSafe 2.1.3
matplotlib 3.5.1
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
multidict 6.0.4
networkx 2.6.3
numpy 1.20.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nystrom-attention 0.0.9
oauthlib 3.2.2
olefile 0.46
omegaconf 2.2.3
opencv-python 4.2.0.34
opencv-python-headless 4.2.0.34
packaging 23.1
pandas 1.2.3
Pillow 8.4.0
pip 23.2.1
protobuf 3.20.3
pyasn1 0.5.0
pyasn1-modules 0.3.0
pyparsing 3.1.0
python-dateutil 2.8.2
pytorch-lightning 1.2.3
pytorch-toolbelt 0.6.3
pytorchtools 0.0.2
pytz 2023.3
PyWavelets 1.3.0
PyYAML 6.0.1
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
scikit-image 0.19.3
scipy 1.7.3
setuptools 68.0.0
six 1.16.0
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tifffile 2021.11.2
torch 1.11.0+cu113
torchaudio 0.11.0+cu113
torchmetrics 0.6.2
torchvision 0.12.0+cu113
tqdm 4.65.0
typing_extensions 4.7.1
urllib3 1.26.16
Werkzeug 2.2.3
wheel 0.41.0
yarl 1.9.2
zipp 3.15.0
I solved this by "pip uninstall omegaconf".
After uninstalling omegaconf, I get the following problem. How can I solve it? Thanks.
Please ensure that you have correctly installed the specified version (0.0.9) of the 'nystrom_attention' library using the command "pip install nystrom_attention==0.0.9".
Afterwards, change the line "q *= self.scale" to "q = q * self.scale" in the 'nystrom_attention.py' file located in your Conda environment. You can find this file in a directory resembling "/miniconda3/envs/transmil/lib/python'version'/site-packages/nystrom_attention/nystrom_attention.py".
To illustrate, if your Python version is 3.7 (as recommended by the authors), the path might be: "/miniconda3/envs/transmil/lib/python3.7/site-packages/nystrom_attention/nystrom_attention.py".
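For anyone who prefers not to edit the file by hand, the replacement described above could be scripted roughly as follows. This is only a sketch: the helper names `patch_scale_line` and `locate_installed_module` are made up for illustration, and it assumes the installed file contains the statement exactly as "q *= self.scale".

```python
import importlib.util
from pathlib import Path

OLD = "q *= self.scale"      # in-place statement quoted in this thread
NEW = "q = q * self.scale"   # out-of-place replacement suggested above


def patch_scale_line(path):
    """Replace the in-place scaling statement in the file at `path`.

    Returns the number of occurrences replaced (0 if nothing matched,
    for example because the file was already patched).
    """
    path = Path(path)
    source = path.read_text(encoding="utf-8")
    count = source.count(OLD)
    if count:
        path.write_text(source.replace(OLD, NEW), encoding="utf-8")
    return count


def locate_installed_module(package="nystrom_attention"):
    """Return the expected path of nystrom_attention.py, or None if not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; the module sits next to it.
    return Path(spec.origin).parent / "nystrom_attention.py"


if __name__ == "__main__":
    target = locate_installed_module()
    if target is None or not target.exists():
        print("nystrom_attention was not found in this environment")
    else:
        print(f"{target}: {patch_scale_line(target)} occurrence(s) patched")
```

Run it with the interpreter of the environment you train in, so that the file it finds is the one your training run actually imports.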
Thanks! It works well. However, I have a new problem:
Traceback (most recent call last):
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 651, in run_train
self.train_loop.run_training_epoch()
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 578, in run_training_epoch
self.trainer.run_evaluation(on_epoch=True)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 755, in run_evaluation
deprecated_eval_results = self.evaluation_loop.evaluation_epoch_end()
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 187, in evaluation_epoch_end
deprecated_results = self.__run_eval_epoch_end(self.num_dataloaders)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 225, in __run_eval_epoch_end
eval_results = model.validation_epoch_end(eval_results)
File "/mnt/data0/LI_jihao/mydata/TransMIL-main/models/model_interface.py", line 155, in validation_epoch_end
self.log('auc', self.AUROC(probs, target.squeeze()), prog_bar=True, on_epoch=True, logger=True)
File "/home/jupyter-ljh/.conda/envs/transmil/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/torchmetrics/metric.py", line 234, in forward
self._forward_cache = self._forward_reduce_state_update(*args, **kwargs)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/torchmetrics/metric.py", line 301, in _forward_reduce_state_update
batch_val = self.compute()
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/torchmetrics/metric.py", line 530, in wrapped_func
value = compute(*args, **kwargs)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/torchmetrics/classification/auroc.py", line 112, in compute
return _binary_auroc_compute(state, self.thresholds, self.max_fpr)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/torchmetrics/functional/classification/auroc.py", line 89, in _binary_auroc_compute
fpr, tpr, _ = _binary_roc_compute(state, thresholds, pos_label)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/torchmetrics/functional/classification/roc.py", line 53, in _binary_roc_compute
fps, tps, thresholds = _binary_clf_curve(preds=state[0], target=state[1], pos_label=pos_label)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/torchmetrics/functional/classification/precision_recall_curve.py", line 67, in _binary_clf_curve
distinct_value_indices = torch.where(preds[1:] - preds[:-1])[0]
RuntimeError: numel: integer multiplication overflow
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 100, in <module>
main(cfg)
File "train.py", line 78, in main
trainer.fit(model = model, datamodule = dm)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
self.dispatch()
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 560, in dispatch
self.accelerator.start_training(self)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 111, in start_training
self._results = trainer.run_train()
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 683, in run_train
self.train_loop.on_train_end()
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 138, in on_train_end
self.trainer.call_hook("on_train_end")
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1121, in call_hook
output = accelerator_hook(*args, **kwargs)
File "/home/jupyter-ljh/.local/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu.py", line 32, in on_train_end
torch.cuda.empty_cache()
File "/home/jupyter-ljh/.conda/envs/transmil/lib/python3.7/site-packages/torch/cuda/memory.py", line 125, in empty_cache
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
My GPU has enough memory for the computation, so I think the error is caused by the package setup. Could you please tell me how to solve it?
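The CUDA message in the traceback suggests passing CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported at the call that launched it. One way to do that from Python is sketched below; it assumes the variable is set before CUDA is initialised, which in practice means before `import torch`. The helper name `blocking_launches_enabled` is only illustrative.

```python
import os

# Set this before CUDA is initialised, i.e. before `import torch`,
# for example at the very top of train.py.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def blocking_launches_enabled(env=os.environ):
    """Return True if synchronous CUDA kernel launches were requested."""
    return env.get("CUDA_LAUNCH_BLOCKING") == "1"


print("CUDA_LAUNCH_BLOCKING enabled:", blocking_launches_enabled())
```

With synchronous launches the run is slower, but the stack trace should point at the operation that triggered the device-side assert rather than at a later call such as `empty_cache()`.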
@ljhOfGithub Hi, I solved this problem by correcting the version of some packages, hope this could help you.
addict==2.2.1
apex==0.9.10dev
einops==0.3.0
h5py==2.10.0
numpy==1.20.0
nystrom_attention==0.0.9
opencv_python==4.2.0.34
pandas==1.2.3
pytorch_lightning==1.2.3
pytorch_toolbelt==0.4.0
PyYAML==6.0
scikit_learn==1.0.2
scipy==1.5.4
setuptools==58.0.4
tensorboard==2.7.0
timm==0.3.2
torch==1.7.1+cu110
torchmetrics==0.4.1
torchvision==0.8.2+cu110
tqdm==4.46.1
But remember not to install apex.
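To see quickly which installed packages differ from a pinned list like the one above, a small check along these lines may help. It is a sketch, not part of the TransMIL repository: `PINS` copies only a few entries from the list, and `installed_version` and `find_mismatches` are hypothetical helper names.

```python
try:
    from importlib import metadata          # Python 3.8+
except ImportError:                         # Python 3.7: needs the importlib-metadata backport
    import importlib_metadata as metadata

# A few pins copied from the list above; extend as needed.
PINS = {
    "pytorch-lightning": "1.2.3",
    "torchmetrics": "0.4.1",
    "nystrom-attention": "0.0.9",
    "timm": "0.3.2",
    "numpy": "1.20.0",
}


def installed_version(name):
    """Return the installed version string of `name`, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (wanted, found)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: want {wanted}, found {found or 'not installed'}")
```

An empty output means every pinned package in `PINS` is installed at the listed version.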
I also receive this error: ".... target = OmegaConf.create(target, flags=flags)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 179, in create
flags=flags,
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 851, in _create_impl
flags=flags,
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 111, in __init__
format_and_raise(node=None, key=key, value=None, cause=ex, msg=str(ex))
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 109, in __init__
self._set_value(content, flags=flags)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 647, in _set_value
raise e
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 644, in _set_value
self._set_value_impl(value, flags)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 690, in _set_value_impl
self.__setitem__(k, v)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 314, in __setitem__
self._format_and_raise(key=key, value=value, cause=e)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/base.py", line 237, in _format_and_raise
type_override=type_override,
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 308, in __setitem__
self.__set_impl(key=key, value=value)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 318, in __set_impl
self._set_item_impl(key, value)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 618, in _set_item_impl
self._wrap_value_and_set(key, value, target_type_hint)
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 631, in _wrap_value_and_set
parent=self,
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 1095, in _maybe_wrap
key=key,
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 1019, in _node_wrap
element_type=element_type,
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 111, in __init__
format_and_raise(node=None, key=key, value=None, cause=ex, msg=str(ex))
File "/home/anaconda3/envs/there_new_transmil/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 86, in __init__
flags=flags,
File "
/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:50: UserWarning: Experiment logs directory logs/Camelyon/TransMIL/fold0 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
warnings.warn(*args, **kwargs)
/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:50: UserWarning: Could not log computational graph since the `model.example_input_array` attribute is not set or `input_array` was not given
warnings.warn(*args, **kwargs)
Traceback (most recent call last):
File "/mnt/data0/LI_jihao/mydata/TransMIL-main/train.py", line 96, in <module>
main(cfg)
File "/mnt/data0/LI_jihao/mydata/TransMIL-main/train.py", line 74, in main
if cfg.General.server == 'train':
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 475, in fit
self.setup_trainer(model)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 423, in setup_trainer
self.logger.save()
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 388, in save
logger.save()
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py", line 40, in wrapped_fn
return fn(*args, **kwargs)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/loggers/csv_logs.py", line 197, in save
self.experiment.save()
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/loggers/csv_logs.py", line 85, in save
save_hparams_to_yaml(hparams_file, self.hparams)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 387, in save_hparams_to_yaml
OmegaConf.save(hparams, fp)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 218, in save
data = OmegaConf.to_yaml(config, resolve=resolve)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 746, in to_yaml
cfg = _ensure_container(cfg)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/_utils.py", line 953, in _ensure_container
target = OmegaConf.create(target, flags=flags)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 176, in create
return OmegaConf._create_impl(
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 846, in _create_impl
return DictConfig(
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 111, in __init__
format_and_raise(node=None, key=key, value=None, cause=ex, msg=str(ex))
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 109, in __init__
self._set_value(content, flags=flags)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 647, in _set_value
raise e
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 644, in _set_value
self._set_value_impl(value, flags)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 690, in _set_value_impl
self.__setitem__(k, v)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 314, in __setitem__
self._format_and_raise(key=key, value=value, cause=e)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/base.py", line 231, in _format_and_raise
format_and_raise(
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 308, in __setitem__
self.__set_impl(key=key, value=value)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 318, in __set_impl
self._set_item_impl(key, value)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 618, in _set_item_impl
self._wrap_value_and_set(key, value, target_type_hint)
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 626, in _wrap_value_and_set
wrapped = _maybe_wrap(
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 1090, in _maybe_wrap
return _node_wrap(
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 1012, in _node_wrap
node = DictConfig(
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 111, in __init__
format_and_raise(node=None, key=key, value=None, cause=ex, msg=str(ex))
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 79, in __init__
metadata=ContainerMetadata(
File "<string>", line 12, in __init__
File "/home/jupyter-ljh/.local/lib/python3.9/site-packages/omegaconf/base.py", line 91, in __post_init__
assert self.key_type is Any or isinstance(self.key_type, type)
AssertionError
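Hello, great work! After splitting the slides into patches I extracted the features and put the data under the ~/data/mntdata/data0/LI_jihao/camelyon_clam/camelyon16/training_fea/pt_files_copy folder. I used your fold0.csv and modified the TransMIL.yaml file as follows: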
General:
    comment:
    seed: 2021
    fp16: True
    amp_level: O2
    precision: 16
    multi_gpu_mode: dp
    gpus: [0]
    # gpus: [1]
    epochs: &epoch 200
    grad_acc: 2
    frozen_bn: False
    patience: 10
    server: train
    log_path: logs/

Data:
    dataset_name: camel_data
    data_shuffle: False
    # data_dir: Camelyon16/pt_files/
    data_dir: ~/data/mntdata/data0/LI_jihao/camelyon_clam/camelyon16/training_fea/pt_files_copy
    label_dir: dataset_csv/camelyon16/
    # label_dir: ~/data/mntdata/data0/LI_jihao/camelyon_clam/camelyon16/
    label_dir: ~/data/mydata/TransMIL-main/dataset_csv/camelyon16/
    fold: 0
    nfold: 4

    train_dataloader:
        batch_size: 1
        num_workers: 8

    test_dataloader:
        batch_size: 1
        num_workers: 8

Model:
    name: TransMIL
    n_classes: 2

Optimizer:
    opt: lookahead_radam
    lr: 0.0002
    opt_eps: null
    opt_betas: null
    momentum: null
    weight_decay: 0.00001

Loss:
    base_loss: CrossEntropyLoss
But I got the error shown at the beginning. Where did I go wrong, and what should I change?
Hi, may I ask how you solved this error?
A summary of these solutions:
1. Uninstall omegaconf and change "q *= self.scale" to "q = q * self.scale" in nystrom_attention.py (make sure nystrom_attention==0.0.9 is installed). After that, a new bug appears.
2. Install timm==0.4.0. I had installed timm version 0.11 before, which is why I hit the "CUDA error". After installing timm version 0.4, I can run train.py successfully.
I don't know whether changing "q *= self.scale" to "q = q * self.scale" actually helps fix this bug; the two statements seem to be the same. But installing timm version 0.4 is very important. Running 'pip install timm' installs the latest version of timm by default, and that causes errors such as "missing an argument 'task'" and "CUDA error".
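On whether `q *= self.scale` and `q = q * self.scale` are the same: they produce the same values, but the first modifies the existing object in place while the second creates a new object and rebinds the name. The thread does not say why that matters for nystrom_attention, so the sketch below only illustrates the general Python mechanism with a made-up `Vec` class standing in for a tensor; PyTorch tensors make the same in-place versus out-of-place distinction.

```python
class Vec:
    """Tiny stand-in for a tensor that supports `*` and `*=` with a scalar."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, k):
        # Out-of-place: build and return a new object.
        return Vec(v * k for v in self.values)

    def __imul__(self, k):
        # In-place: mutate this object and return it.
        for i in range(len(self.values)):
            self.values[i] *= k
        return self


def scale_in_place(q, k):
    q *= k          # calls __imul__, so the caller's object changes
    return q


def scale_out_of_place(q, k):
    q = q * k       # calls __mul__, so the caller's object is left alone
    return q


original = Vec([1.0, 2.0])
result = scale_in_place(original, 0.5)
print(original.values, result is original)    # [0.5, 1.0] True

original = Vec([1.0, 2.0])
result = scale_out_of_place(original, 0.5)
print(original.values, result is original)    # [1.0, 2.0] False
```

So the two forms differ whenever something else still refers to the original object, which is the situation in which in-place operations can cause trouble.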
Hello @1803170327, I have the same problem. However, uninstalling omegaconf did not help, because now I get this error:
File "(...)\site-packages\hydra\_internal\instantiate\_instantiate2.py", line 9, in <module>
from omegaconf import OmegaConf, SCMode
ModuleNotFoundError: No module named 'omegaconf'
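The error above appears because hydra imports omegaconf. Before uninstalling a package, it can help to list which installed distributions declare it as a dependency. The following is a rough sketch using the standard `importlib.metadata` API; the function names are illustrative, and it also reports dependencies that are only declared for optional extras.

```python
import re

try:
    from importlib import metadata          # Python 3.8+
except ImportError:                         # Python 3.7: needs the importlib-metadata backport
    import importlib_metadata as metadata


def _normalize(name):
    """Normalize a distribution name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the normalized package name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return _normalize(match.group(1)) if match else None


def dependents_of(package, distributions=None):
    """Return sorted names of installed distributions that require `package`."""
    target = _normalize(package)
    if distributions is None:
        distributions = metadata.distributions()
    found = set()
    for dist in distributions:
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        for req in dist.requires or []:
            if requirement_name(req) == target:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    print("Packages that require omegaconf:", dependents_of("omegaconf"))
```

If the list is not empty, uninstalling the package is likely to break those dependents at import time, as happened here.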
I managed to fix it. I recreated the environment from scratch by running:
conda create -n transmil python=3.7 -y
conda activate transmil
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
In requirements.txt I deleted omegaconf and added pytorch-toolbelt and torchmetrics, in these versions:
addict==2.2.1
albumentations==0.4.3
einops==0.3.0
matplotlib==3.5.1
numpy==1.20.3
nystrom-attention==0.0.9
opencv-python==4.2.0.34
opencv-python-headless==4.2.0.34
pandas==1.2.3
Pillow==8.4.0
pytorch-lightning==1.2.3
pytorch-toolbelt==0.4.0
torchmetrics==0.4.1
and ran pip install -r requirements.txt. Then, as stated before, I changed q *= self.scale to q = q * self.scale in nystrom_attention.py. Now it's up and running.