Closed ShayekhBinIslam closed 7 months ago
It looks like you are training or evaluating on proteins of length 0. Could you provide more context on how you are using the code or the full stack trace? There is probably an issue with the sampling process or the data going into the model.
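One quick way to rule out the data side is to scan the input sequences for empty entries before training. The following is only a minimal sketch, assuming the sequences are available as a plain list of strings; the helper name `find_empty_sequences` is hypothetical and not part of the NOS code.

```python
from typing import Iterable, List


def find_empty_sequences(seqs: Iterable[str]) -> List[int]:
    """Return the indices of sequences that are empty or whitespace-only."""
    return [i for i, s in enumerate(seqs) if len(s.strip()) == 0]


# Toy data for illustration only; real sequences would be loaded from data_dir.
example_seqs = ["EVQLVESGG", "", "QVQLQ", "   "]
empty_idx = find_empty_sequences(example_seqs)
print(f"{len(empty_idx)} of {len(example_seqs)} sequences are empty: {empty_idx}")
```

If this reports no empty entries, the length-0 proteins are more likely coming from the sampling step than from the dataset.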
Here is the full stack trace:
Error executing job with overrides: ['data_dir=/NOS/data', 'vocab_file=/NOS/vocab.txt', 'log_dir=/NOS/logs/guided_protein_seq', 'model=gaussian', 'target_cols=["ss_perc_sheet"]', 'model.noise_schedule.noise_scale=5', 'model.optimizer.lr=0.0002', 'model.network.discr_stop_grad=False', 'discr_batch_ratio=4', 'exp_name=gaussian_5_sheet_discr_joint']
Traceback (most recent call last):
  File "/NOS/scripts/train_seq_model.py", line 46, in main
    trainer.fit(
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 137, in run
    self.on_advance_end(data_fetcher)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 285, in on_advance_end
    self.val_loop.run()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 141, in run
    return self.on_run_end()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 253, in on_run_end
    self._on_evaluation_epoch_end()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 328, in _on_evaluation_epoch_end
    call._call_callback_hooks(trainer, hook_name)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 208, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/NOS/seq_models/trainer.py", line 100, in on_validation_epoch_end
    _, log = sample_model(
  File "/NOS/seq_models/sample.py", line 152, in sample_model
    seed_log, seed_wandb_log = metrics.evaluate_samples(
  File "/NOS/seq_models/metrics.py", line 215, in evaluate_samples
    samp_df = labeler.label_seqs(s_for_labels)
  File "/NOS/seq_models/metrics.py", line 98, in label_seqs
    return pd.DataFrame([self.label_seq(s) for s in seqs])
  File "/NOS/seq_models/metrics.py", line 98, in <listcomp>
    return pd.DataFrame([self.label_seq(s) for s in seqs])
  File "/NOS/seq_models/metrics.py", line 78, in label_seq
    ss_frac = X.secondary_structure_fraction()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/Bio/SeqUtils/ProtParam.py", line 332, in secondary_structure_fraction
    aa_percentages = self.get_amino_acids_percent()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/Bio/SeqUtils/ProtParam.py", line 116, in get_amino_acids_percent
    percentages = {aa: count / self.length for aa, count in aa_counts.items()}
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/Bio/SeqUtils/ProtParam.py", line 116, in <dictcomp>
    percentages = {aa: count / self.length for aa, count in aa_counts.items()}
ZeroDivisionError: division by zero
@ngruver The stack trace is given above.
I just committed a change that should hopefully prevent this error: https://github.com/ngruver/NOS/blob/main/seq_models/metrics.py#L211
That said, something appears to be going very wrong in your training loop, because you are sampling empty strings from the model. You might want to print out the samples and check whether they consist entirely of alignment tokens or are degenerate in some similar way.
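A check along those lines could look like the sketch below. It is only an illustration: it assumes samples are plain strings and that `-` is the alignment (gap) token, and the names `GAP_TOKEN`, `strip_gaps`, and `summarize_samples` are hypothetical rather than taken from the repository or from the committed fix.

```python
from collections import Counter
from typing import Dict, List, Tuple

GAP_TOKEN = "-"  # assumption: the alignment/gap symbol used in the vocabulary


def strip_gaps(sample: str, gap: str = GAP_TOKEN) -> str:
    """Remove gap tokens and surrounding whitespace from one sample."""
    return sample.replace(gap, "").strip()


def summarize_samples(samples: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Return the usable sequences plus counts of kept and degenerate samples."""
    kept: List[str] = []
    stats: Counter = Counter()
    for s in samples:
        cleaned = strip_gaps(s)
        if cleaned:
            kept.append(cleaned)
            stats["kept"] += 1
        elif s.strip():
            # Non-empty before cleaning, so it contained only gap tokens.
            stats["all_gaps"] += 1
        else:
            stats["empty"] += 1
    return kept, dict(stats)


# Toy samples for illustration only.
samples = ["EVQ-LVE", "-----", "", "QVQLQ"]
kept, stats = summarize_samples(samples)
print(kept)
print(stats)
```

Only the `kept` sequences would then be passed on to a labeler that divides by sequence length, as the Biopython call at the bottom of the stack trace does. A large `all_gaps` or `empty` count would point to a problem in sampling rather than in the metrics code.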
During training and evaluation, we are getting a ZeroDivisionError.