ngruver / NOS

Protein Design with Guided Discrete Diffusion
https://arxiv.org/abs/2305.20009
MIT License

ZeroDivisionError: division by zero #5

Closed ShayekhBinIslam closed 7 months ago

ShayekhBinIslam commented 9 months ago

During training and evaluation, we are getting a ZeroDivisionError:

    ...
    percentages = {aa: count / self.length for aa, count in aa_counts.items()}
    ZeroDivisionError: division by zero

ngruver commented 9 months ago

It looks like you are training or evaluating on proteins of length 0. Could you provide more context on how you are using the code or the full stack trace? There is probably an issue with the sampling process or the data going into the model.
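To illustrate why a length-0 protein produces this error, here is a minimal sketch that mirrors the dict comprehension shown in the traceback. It is not Biopython's code: `amino_acid_fractions` and `AMINO_ACIDS` are hypothetical names, and the sketch assumes counts are taken over a fixed 20-letter alphabet, so the comprehension still runs (and divides by zero) when the sequence is empty.

```python
# Hypothetical stand-in for the failing line; not the Biopython implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed fixed alphabet


def amino_acid_fractions(seq: str) -> dict:
    """Return the fraction of each amino acid in seq."""
    length = len(seq)
    # Counting over a fixed alphabet means the dict is non-empty even for "".
    aa_counts = {aa: seq.count(aa) for aa in AMINO_ACIDS}
    # Raises ZeroDivisionError when length == 0.
    return {aa: count / length for aa, count in aa_counts.items()}


if __name__ == "__main__":
    print(amino_acid_fractions("AAGG")["A"])
    try:
        amino_acid_fractions("")
    except ZeroDivisionError as err:
        print("empty sequence:", err)
```

If this matches what happens in your run, at least one sampled sequence reaching the labeler is an empty string.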

ShayekhBinIslam commented 9 months ago

Here is the full stack trace:

Error executing job with overrides: ['data_dir=/NOS/data', 'vocab_file=/NOS/vocab.txt', 'log_dir=/NOS/logs/guided_protein_seq', 'model=gaussian', 'target_cols=["ss_perc_sheet"]', 'model.noise_schedule.noise_scale=5', 'model.optimizer.lr=0.0002', 'model.network.discr_stop_grad=False', 'discr_batch_ratio=4', 'exp_name=gaussian_5_sheet_discr_joint']
Traceback (most recent call last):
  File "/NOS/scripts/train_seq_model.py", line 46, in main
    trainer.fit(
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 137, in run
    self.on_advance_end(data_fetcher)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 285, in on_advance_end
    self.val_loop.run()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 141, in run
    return self.on_run_end()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 253, in on_run_end
    self._on_evaluation_epoch_end()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 328, in _on_evaluation_epoch_end
    call._call_callback_hooks(trainer, hook_name)
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 208, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/NOS/seq_models/trainer.py", line 100, in on_validation_epoch_end
    _, log = sample_model(
  File "/NOS/seq_models/sample.py", line 152, in sample_model
    seed_log, seed_wandb_log = metrics.evaluate_samples(
  File "/NOS/seq_models/metrics.py", line 215, in evaluate_samples
    samp_df = labeler.label_seqs(s_for_labels)
  File "/NOS/seq_models/metrics.py", line 98, in label_seqs
    return pd.DataFrame([self.label_seq(s) for s in seqs])
  File "/NOS/seq_models/metrics.py", line 98, in <listcomp>
    return pd.DataFrame([self.label_seq(s) for s in seqs])
  File "/NOS/seq_models/metrics.py", line 78, in label_seq
    ss_frac = X.secondary_structure_fraction()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/Bio/SeqUtils/ProtParam.py", line 332, in secondary_structure_fraction
    aa_percentages = self.get_amino_acids_percent()
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/Bio/SeqUtils/ProtParam.py", line 116, in get_amino_acids_percent
    percentages = {aa: count / self.length for aa, count in aa_counts.items()}
  File "/home/ray/micromamba/envs/nos/lib/python3.10/site-packages/Bio/SeqUtils/ProtParam.py", line 116, in <dictcomp>
    percentages = {aa: count / self.length for aa, count in aa_counts.items()}
ZeroDivisionError: division by zero

ShayekhBinIslam commented 9 months ago

@ngruver The full stack trace is given above.

ngruver commented 8 months ago

I just committed a change that should hopefully prevent this error: https://github.com/ngruver/NOS/blob/main/seq_models/metrics.py#L211
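For readers who cannot pull the latest commit, one way to guard against this kind of error is to drop empty sequences before they reach the labeler. The sketch below is an assumption about what such a guard could look like, not a copy of the committed change; `split_labelable` is a hypothetical helper name.

```python
def split_labelable(seqs):
    """Return (kept, n_dropped).

    kept      -- sequences with at least one non-whitespace character
    n_dropped -- number of empty sequences that were skipped
    """
    kept = [s for s in seqs if len(s.strip()) > 0]
    return kept, len(seqs) - len(kept)


if __name__ == "__main__":
    samples = ["MKTAYIAK", "", "   ", "GG"]
    kept, n_dropped = split_labelable(samples)
    print(kept, n_dropped)
```

Applied to the call in the traceback, this would mean filtering `s_for_labels` before `labeler.label_seqs(s_for_labels)` and logging the dropped count, so that a degenerate sampler is still visible rather than silently hidden.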

That said, something appears to be going very wrong in your training loop, since you are sampling empty strings from the model. You might want to print out the samples and check whether they are all alignment tokens or something similarly degenerate.
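A quick way to inspect the samples is to count how many become empty once non-residue tokens are removed, and to see which tokens dominate. This is a rough sketch: `summarize_samples` is a hypothetical helper, and the contents of `SPECIAL_TOKENS` are assumed, so substitute the special tokens from your own vocab file.

```python
from collections import Counter

# Assumed set of non-residue tokens; adjust to match your vocab file.
SPECIAL_TOKENS = {"-", "[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"}


def summarize_samples(token_lists, special_tokens=SPECIAL_TOKENS):
    """Summarize sampled token lists to spot degenerate outputs."""
    n_empty = 0
    token_counts = Counter()
    for tokens in token_lists:
        token_counts.update(tokens)
        # A sample is degenerate if nothing is left after stripping special tokens.
        residues = [t for t in tokens if t not in special_tokens]
        if not residues:
            n_empty += 1
    return {
        "n_samples": len(token_lists),
        "n_empty_after_stripping": n_empty,
        "most_common_tokens": token_counts.most_common(5),
    }


if __name__ == "__main__":
    samples = [["-", "-"], ["A", "-", "G"], []]
    print(summarize_samples(samples))
```

If most samples are empty after stripping, the problem is upstream in training or sampling rather than in the metrics code.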