samgoldman97 / mist

Encoding MS/MS spectra using formula transformers for inferring molecular properties
MIT License

MIST fingerprint model on Mac OS #4

Closed Mjvolk3 closed 1 year ago

Mjvolk3 commented 1 year ago

I am having trouble running the MIST fingerprint model on macOS. I have been able to run everything up until the FFN binned model, except for the MIST fingerprint model. I rebuilt the environment from scratch because running conda env create -f environment.yml failed with UnsatisfiableError: The following specifications were found to be incompatible with each other: ... I have attached conda list below in case it helps. The error I am getting is TypeError: h5py objects cannot be pickled. I suspect it is an OS issue, because I followed the same procedure on Linux and python run_scripts/train_mist.py works there.

(ms-gen) michaelvolk@M1-MV mist % mkdir results/model_train_demos
python run_scripts/train_mist.py --cache-featurizers --dataset-name 'canopus_train_public' --fp-names morgan4096 --num-workers 10 --seed 1 --gpus 0 --split-file 'data/paired_spectra/canopus_train_public/splits/canopus_hplus_100_0.csv' --splitter-name 'preset' --augment-data --augment-prob 0.5 --batch-size 128 --inten-prob 0.1 --remove-prob 0.5 --remove-weights 'exp' --iterative-preds 'growing' --iterative-loss-weight 0.4 --learning-rate 0.00077 --weight-decay 1e-07 --max-epochs 600 --min-lr 0.0001 --lr-decay-time 10000 --lr-decay-frac 0.95 --hidden-size 256 --num-heads 8 --pairwise-featurization --peak-attn-layers 2 --refine-layers 4 --set-pooling 'cls' --spectra-dropout 0.1 --single-form-encoder --recycle-form-encoder --use-cls --cls-type 'ms1' --loss-fn 'cosine' --magma-aux-loss --frag-fps-loss-lambda 8 --magma-modulo 512 --patience 30 --save-dir 'mist_fp_model' --save-dir results/model_train_demos/mist_fp_model
mkdir: results/model_train_demos: File exists
Global seed set to 1
2023-04-13 08:40:49,196 INFO: add_forward_specs: false
additive_attn: false
augment_data: true
augment_prob: 0.5
batch_size: 128
cache_featurizers: true
ckpt_file: null
cls_type: ms1
dataset_name: canopus_train_public
debug: false
forward_aug_folder: null
fp_names:
- morgan4096
frac_orig: 0.4
frag_fps_loss_lambda: 8.0
gpus: 0
gradient_clip_val: 5
hidden_size: 256
inten_prob: 0.1
iterative_loss_weight: 0.4
iterative_preds: growing
learning_rate: 0.00077
loss_fn: cosine
lr_decay_frac: 0.95
lr_decay_time: 10000
magma_aux_loss: true
magma_modulo: 512
max_epochs: 600
max_peaks: null
min_epochs: null
min_lr: 0.0001
num_heads: 8
num_workers: 10
optim_name: radam
pairwise_featurization: true
patience: 30
peak_attn_layers: 2
persistent_workers: false
recycle_form_encoder: true
refine_layers: 4
remove_prob: 0.5
remove_weights: exp
reshuffle_val: false
save_dir: results/model_train_demos/mist_fp_model
scheduler: false
seed: 1
set_pooling: cls
shuffle_train: false
single_form_encoder: true
spectra_dropout: 0.1
split_file: data/paired_spectra/canopus_train_public/splits/canopus_hplus_100_0.csv
split_sizes:
- 0.8
- 0.1
- 0.1
splitter_name: preset
top_layers: 1
use_cls: true
weight_decay: 1.0e-07
worst_k_weight: null

2023-04-13 08:40:49,584 INFO: Loading paired specs
2023-04-13 08:40:49,921 INFO: Converting paired samples into Spectra objects
10709it [00:00, 154101.73it/s]
10709it [00:01, 9556.15it/s]
10709it [00:00, 5535034.08it/s]
2023-04-13 08:40:51,128 INFO: Done creating spectra objects
2023-04-13 08:40:51,225 INFO: Len of train: 6141
2023-04-13 08:40:51,225 INFO: Len of val: 1070
2023-04-13 08:40:51,225 INFO: Len of test: 819
2023-04-13 08:40:51,247 INFO: Created a temporary directory at /var/folders/t3/hcfdx0qs0rsd9bm4230xv_zc0000gn/T/tmp3ont5vbm
2023-04-13 08:40:51,247 INFO: Writing /var/folders/t3/hcfdx0qs0rsd9bm4230xv_zc0000gn/T/tmp3ont5vbm/_remote_module_non_scriptable.py
2023-04-13 08:40:51,307 INFO: Starting fold: Fold_100_0
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:611: UserWarning: Checkpoint directory /Users/michaelvolk/Documents/projects/mist/results/model_train_demos/mist_fp_model/Fold_100_0 exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")

  | Name            | Type       | Params
-----------------------------------------------
0 | bce_loss        | BCELoss    | 0     
1 | spectra_encoder | ModuleList | 15.0 M
-----------------------------------------------
15.0 M    Trainable params
8.2 K     Non-trainable params
15.0 M    Total params
59.924    Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_scripts/train_mist.py", line 8, in <module>
    train_mist.run_training()
  File "/Users/michaelvolk/Documents/projects/mist/src/mist/train_mist.py", line 78, in run_training
    test_loss = model.train_model(
  File "/Users/michaelvolk/Documents/projects/mist/src/mist/models/base.py", line 320, in train_model
    trainer.fit(self, module)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 199, in run
    self.on_run_start(*args, **kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 88, in on_run_start
    self._data_fetcher = iter(data_fetcher)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 178, in __iter__
    self.dataloader_iter = iter(self.dataloader)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1043, in __init__
    w.start()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
samgoldman97 commented 1 year ago

Hey Michael,

First and foremost, thanks for trying to reproduce & use this! It's really nice to see.

As for the specific error, I'm a little bandwidth limited and won't be able to debug much further at the moment. It looks like a problem with parallelization across workers, so you may be able to fix this error by reducing num_workers to 0, at least as a debugging step to see if that helps. It will slow training down, but it should at least run and provide clues for how to resolve the parallelism problems. With the 10k dataset and no forward augmentation, it shouldn't be terribly slow. Let me know if this works!

Sam
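
A note on why setting num_workers to 0 sidesteps the error: the traceback passes through popen_spawn_posix.py, which means the worker processes are started with the spawn method (the default on macOS since Python 3.8, whereas Linux defaults to fork). Under spawn, the process object, including the dataset held by the DataLoader, is pickled and sent to each worker. With num_workers set to 0, batches are loaded in the main process and nothing is pickled. The sketch below uses only the standard library to check which case applies; the helper name is hypothetical and not part of MIST.

```python
import multiprocessing as mp


def start_method_pickles_worker_args(method: str) -> bool:
    """Return True if this start method sends the worker's arguments via pickle.

    Hypothetical helper for illustration only (not part of MIST or PyTorch).
    """
    # "fork" copies the parent's memory into the child, so nothing is serialized.
    # "spawn" and "forkserver" pickle the process object and send it to the child.
    return method in ("spawn", "forkserver")


if __name__ == "__main__":
    method = mp.get_start_method()
    print(f"start method: {method}")
    print(f"worker args are pickled: {start_method_pickles_worker_args(method)}")
```

If this reports spawn, any object reachable from the dataset that refuses to be pickled (such as an open h5py file) will fail as soon as num_workers is greater than 0.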

Mjvolk3 commented 1 year ago

Thanks Sam, this resolves the issue for now. It looks like there may be some more general issues with multiprocessing on macOS under Python 3.8. From what I can tell, the issue is related to how the data is collated in datasets.py.
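
One common way to keep multi-worker loading usable under spawn is to store only the file path on the dataset and open the file lazily inside each worker. A minimal sketch of that pattern follows, assuming a stand-in handle class (FakeH5Handle) in place of an open h5py file so that it runs with the standard library alone. All class names here are hypothetical, and this is not the code in datasets.py.

```python
import pickle


class FakeH5Handle:
    """Stand-in for an open h5py file: it refuses to be pickled."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        # h5py raises a similar TypeError when pickling is attempted.
        raise TypeError("handle objects cannot be pickled")

    def read(self, idx):
        return f"{self.path}:{idx}"


class EagerDataset:
    """Opens the handle in __init__, so the dataset itself cannot be pickled."""

    def __init__(self, path):
        self.handle = FakeH5Handle(path)


class LazyDataset:
    """Stores only the path and opens the handle on first access."""

    def __init__(self, path):
        self.path = path
        self._handle = None

    def __getitem__(self, idx):
        if self._handle is None:
            # Runs once per process, i.e. once inside each worker.
            self._handle = FakeH5Handle(self.path)
        return self._handle.read(idx)

    def __getstate__(self):
        # Drop the open handle so a copy can be sent to a spawned worker.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


if __name__ == "__main__":
    lazy = LazyDataset("feats.h5")
    lazy[0]  # open the handle in this process
    clone = pickle.loads(pickle.dumps(lazy))  # what spawn does for each worker
    print(clone[5])
```

The eager variant fails in pickle.dumps with the same kind of TypeError seen in the traceback, while the lazy variant survives the round trip and reopens its handle on first use.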

samgoldman97 commented 1 year ago

Hi Michael, I'm marking this as resolved for now. I recently pushed an updated version of MIST that I hope will address this and streamline the model training process. Thanks again for your patience and for giving these methods a shot. Hope your work is going well!

Sam