Hi, I am running the CDVAE carbon experiment and have run into a strange problem: the run simply hangs after completing three iterations of the first epoch.
I run **python cdvae/run.py data=carbon expname=carbon model.predict_property=True**
The output I see is this:
[2023-07-13 16:57:36,190][hydra.utils][INFO] - Instantiating <cdvae.pl_data.datamodule.CrystDataModule>
[2023-07-13 16:57:37,161][numexpr.utils][INFO] - Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
[2023-07-13 16:57:37,161][numexpr.utils][INFO] - Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
25%|█████████████████████▍ | 1521/6091 [00:25<01:29, 50.81it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
34%|█████████████████████████████▎ | 2080/6091 [00:34<01:05, 61.51it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
46%|███████████████████████████████████████▊ | 2820/6091 [00:46<01:02, 52.70it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
50%|██████████████████████████████████████████▋ | 3021/6091 [00:49<00:52, 58.26it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
51%|███████████████████████████████████████████▍ | 3079/6091 [00:50<00:54, 55.41it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
51%|████████████████████████████████████████████▏ | 3132/6091 [00:51<00:40, 72.78it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
52%|████████████████████████████████████████████▎ | 3140/6091 [00:51<00:49, 59.77it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
60%|███████████████████████████████████████████████████▊ | 3673/6091 [00:59<00:38, 63.39it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
66%|████████████████████████████████████████████████████████▋ | 4018/6091 [01:05<00:32, 63.75it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
67%|█████████████████████████████████████████████████████████▌ | 4077/6091 [01:06<00:33, 60.74it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
67%|█████████████████████████████████████████████████████████▊ | 4098/6091 [01:06<00:29, 67.92it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
69%|███████████████████████████████████████████████████████████▊ | 4233/6091 [01:08<00:29, 63.53it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
74%|████████████████████████████████████████████████████████████████ | 4536/6091 [01:13<00:23, 67.50it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
80%|████████████████████████████████████████████████████████████████████▋ | 4869/6091 [01:18<00:16, 72.17it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
84%|████████████████████████████████████████████████████████████████████████ | 5106/6091 [01:22<00:18, 53.65it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
91%|██████████████████████████████████████████████████████████████████████████████▌ | 5566/6091 [01:29<00:08, 63.96it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
95%|█████████████████████████████████████████████████████████████████████████████████▋ | 5786/6091 [01:33<00:05, 59.95it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
96%|██████████████████████████████████████████████████████████████████████████████████▏ | 5822/6091 [01:33<00:04, 64.72it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
98%|████████████████████████████████████████████████████████████████████████████████████▎ | 5974/6091 [01:36<00:01, 66.91it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
100%|██████████████████████████████████████████████████████████████████████████████████████| 6091/6091 [01:39<00:00, 61.48it/s]
/gpfs/fs1/home/cdvae-old/cdvae/cdvae/common/data_utils.py:644: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /data/miniconda3/envs/opence-1.7/conda-bld/pytorch-base_1663986328871/work/torch/csrc/utils/tensor_new.cpp:201.)
targets = torch.tensor([d[key] for d in data_list])
/gpfs/fs1/home/cdvae-old/cdvae/cdvae/common/data_utils.py:612: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
X = torch.tensor(X, dtype=torch.float)
[2023-07-13 16:59:20,540][hydra.utils][INFO] - Instantiating <cdvae.pl_modules.model.CDVAE>
[2023-07-13 16:59:20,615][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmpwv1glt9u
[2023-07-13 16:59:20,615][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmpwv1glt9u/_remote_module_non_scriptable.py
[2023-07-13 16:59:53,346][hydra.utils][INFO] - Passing scaler from datamodule to model <StandardScalerTorch(means: -154.2510223388672, stds: 0.13738815486431122)>
[2023-07-13 16:59:53,348][hydra.utils][INFO] - Adding callback <LearningRateMonitor>
[2023-07-13 16:59:53,349][hydra.utils][INFO] - Adding callback <EarlyStopping>
[2023-07-13 16:59:53,350][hydra.utils][INFO] - Adding callback <ModelCheckpoint>
[2023-07-13 16:59:53,354][hydra.utils][INFO] - Instantiating <WandbLogger>
wandb: Currently logged in as: _. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/cdvae-old/cdvae/wabdb/wandb/run-20230713_165954-u04zv43g
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run carbon
wandb: ⭐️ View project at https://wandb.ai/_/crystal_generation_mit
wandb: 🚀 View run at https://wandb.ai/_/crystal_generation_mit/runs/u04zv43g
[2023-07-13 17:00:07,550][hydra.utils][INFO] - W&B is now watching <{cfg.logging.wandb_watch.log}>!
wandb: logging graph, to disable use `wandb.watch(log_graph=False)`
[2023-07-13 17:00:07,588][hydra.utils][INFO] - Instantiating the Trainer
/home/.conda/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:96: LightningDeprecationWarning: Setting `Trainer(progress_bar_refresh_rate=20)` is deprecated in v1.5 and will be removedin v1.7. Please pass `pytorch_lightning.callbacks.progress.TQDMProgressBar` with `refresh_rate` directly to the Trainer's `callbacks` argument instead. Or, to disable the progress bar pass `enable_progress_bar = False` to the Trainer.
rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2023-07-13 17:00:07,650][hydra.utils][INFO] - Starting training!
0%| | 2/6091 [00:00<32:19, 3.14it/s
I am running on MIST HPC, so I have turned off WandB logging.
Any suggestions on how to resolve this? I am not very familiar with Hydra or PyTorch Lightning.
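
Since the log stops without a traceback, one way to see where the process is stuck is Python's standard-library `faulthandler` module, which can dump the stack of every thread after a timeout. Below is a minimal sketch, assuming the hang happens inside the Python process itself. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` and the 120-second default are hypothetical and not part of CDVAE, Hydra, or PyTorch Lightning.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=120, log_file=None):
    """Dump all thread tracebacks every `timeout_s` seconds until disarmed.

    Hypothetical helper: `log_file` must be a real open file (it needs a
    file descriptor); when omitted, the dump goes to sys.stderr.
    """
    target = log_file if log_file is not None else sys.stderr
    # repeat=True keeps dumping at each interval, so a hang that starts late
    # in the run is still captured.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)
    return target


def disarm_hang_watchdog():
    """Cancel the pending dump, for example once training has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog()` near the start of the training entry point would print, at each interval, the line every thread is currently executing, which should show whether the stall is in data loading, logging, or the training step itself.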