openvax / mhcflurry

Peptide-MHC I binding affinity prediction
http://openvax.github.io/mhcflurry/
Apache License 2.0

mhcflurry-class1-train-allele-specific-models errors with allele arg #109

Closed · julia326 closed this issue 7 years ago

julia326 commented 7 years ago

Logs/stack trace here:

(mhcflurry) julia$ mhcflurry-class1-train-allele-specific-models \
--data '/Users/julia/Library/Application Support/mhcflurry/4/0.9.2/data_curated//curated_training_data.csv.bz2' \
--hyperparameters downloads-generation/models_class1/hyperparameters.json \
--out-models-dir models \
--min-measurements-per-allele 200 \
--allele 'HLA-B:35*01'
Using Theano backend.
Loaded hyperparameters list: [{u'dropout_probability': 0.0, u'use_embedding': False, u'n_models': 12, u'random_negative_rate': 0.0, u'layer_sizes': [32], u'random_negative_affinity_min': 20000.0, u'patience': 10, u'random_negative_affinity_max': 50000.0, u'validation_split': 0.2, u'activation': u'relu', u'kmer_size': 15, u'locally_connected_layers': [{u'activation': u'tanh', u'kernel_size': 3, u'filters': 8}, {u'activation': u'tanh', u'kernel_size': 3, u'filters': 8}], u'max_epochs': 500, u'dense_layer_l1_regularization': 0.001, u'output_activation': u'sigmoid', u'random_negative_constant': 25, u'batch_normalization': False, u'early_stopping': True}]
Loaded training data: (241552, 6)
Subselected to 8-15mers: (240505, 6)
Selected 1 alleles: HLA-B:35*01
Training data: (0, 6)
[ 1 /  1 hyperparameters] [ 1 / 12 replicates] [   1 /    1 alleles]: HLA-B:35*01
Traceback (most recent call last):
  File "/Users/julia/Envs/mhcflurry/bin/mhcflurry-class1-train-allele-specific-models", line 11, in <module>
    load_entry_point('mhcflurry', 'console_scripts', 'mhcflurry-class1-train-allele-specific-models')()
  File "/Users/julia/code/mhcflurry/mhcflurry/class1_affinity_prediction/train_allele_specific_models_command.py", line 113, in run
    frac=1.0)
  File "/Users/julia/Envs/mhcflurry/lib/python2.7/site-packages/pandas/core/generic.py", line 2900, in sample
    locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
  File "mtrand.pyx", line 1115, in mtrand.RandomState.choice
ValueError: a must be greater than 0
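The `ValueError: a must be greater than 0` comes from pandas' `sample` being invoked on an empty DataFrame: the allele name `HLA-B:35*01` has its separators transposed (standard HLA nomenclature is `HLA-B*35:01`), so the allele filter matches zero rows, as the `Training data: (0, 6)` line shows; on the pandas/numpy versions in the trace, sampling a zero-row frame then raises this error from `RandomState.choice`. A minimal sketch with toy data (the `allele` column name is assumed to mirror the curated training CSV):

```python
import pandas as pd

# Toy stand-in for the curated training data; the real file has more columns.
df = pd.DataFrame({
    "allele": ["HLA-B*35:01"] * 3 + ["HLA-A*23:01"] * 2,
    "peptide": ["SIINFEKL"] * 5,
})

# Malformed name (separators transposed) matches nothing:
bad = df.loc[df.allele == "HLA-B:35*01"]
print(len(bad))   # 0 rows -> sampling this empty frame is what blows up

# Correctly formatted name selects the expected rows:
good = df.loc[df.allele == "HLA-B*35:01"]
print(len(good))  # 3
```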
julia326 commented 7 years ago

And a different error when using a different allele; it looks like it made it further this time (is there something wrong with HLA-B:35*01?):

(mhcflurry) julia$ mhcflurry-class1-train-allele-specific-models \
> --data '/Users/julia/Library/Application Support/mhcflurry/4/0.9.2/data_curated//curated_training_data.csv.bz2' \
> --hyperparameters downloads-generation/models_class1/hyperparameters.json \
> --out-models-dir models \
> --min-measurements-per-allele 200 \
> --allele 'HLA-A*23:01'
Using Theano backend.
Loaded hyperparameters list: [{u'dropout_probability': 0.0, u'use_embedding': False, u'n_models': 12, u'random_negative_rate': 0.0, u'layer_sizes': [32], u'random_negative_affinity_min': 20000.0, u'patience': 10, u'random_negative_affinity_max': 50000.0, u'validation_split': 0.2, u'activation': u'relu', u'kmer_size': 15, u'locally_connected_layers': [{u'activation': u'tanh', u'kernel_size': 3, u'filters': 8}, {u'activation': u'tanh', u'kernel_size': 3, u'filters': 8}], u'max_epochs': 500, u'dense_layer_l1_regularization': 0.001, u'output_activation': u'sigmoid', u'random_negative_constant': 25, u'batch_normalization': False, u'early_stopping': True}]
Loaded training data: (241552, 6)
Subselected to 8-15mers: (240505, 6)
Selected 1 alleles: HLA-A*23:01
Training data: (2584, 6)
[ 1 /  1 hyperparameters] [ 1 / 12 replicates] [   1 /    1 alleles]: HLA-A*23:01
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.3026 - val_loss: 0.2149
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.1549 - val_loss: 0.1047
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0743 - val_loss: 0.0563
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0479 - val_loss: 0.0478
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0417 - val_loss: 0.0419
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0383 - val_loss: 0.0406
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0364 - val_loss: 0.0382
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0351 - val_loss: 0.0371
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0340 - val_loss: 0.0364
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0324 - val_loss: 0.0357
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0321 - val_loss: 0.0374
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0321 - val_loss: 0.0377
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0311 - val_loss: 0.0341
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0310 - val_loss: 0.0351
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0304 - val_loss: 0.0337
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0301 - val_loss: 0.0346
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0298 - val_loss: 0.0329
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0294 - val_loss: 0.0333
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0290 - val_loss: 0.0326
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0296 - val_loss: 0.0328
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0284 - val_loss: 0.0321
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0281 - val_loss: 0.0332
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0287 - val_loss: 0.0322
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0279 - val_loss: 0.0319
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0282 - val_loss: 0.0322
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0281 - val_loss: 0.0324
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0275 - val_loss: 0.0325
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0279 - val_loss: 0.0316
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0271 - val_loss: 0.0315
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0272 - val_loss: 0.0321
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0275 - val_loss: 0.0316
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0266 - val_loss: 0.0321
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0264 - val_loss: 0.0332
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0272 - val_loss: 0.0329
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0266 - val_loss: 0.0326
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0265 - val_loss: 0.0315
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0267 - val_loss: 0.0316
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0272 - val_loss: 0.0318
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0268 - val_loss: 0.0327
Train on 2227 samples, validate on 557 samples
Epoch 1/1
2227/2227 [==============================] - 0s - loss: 0.0266 - val_loss: 0.0316
Traceback (most recent call last):
  File "/Users/julia/Envs/mhcflurry/bin/mhcflurry-class1-train-allele-specific-models", line 11, in <module>
    load_entry_point('mhcflurry', 'console_scripts', 'mhcflurry-class1-train-allele-specific-models')()
  File "/Users/julia/code/mhcflurry/mhcflurry/class1_affinity_prediction/train_allele_specific_models_command.py", line 121, in run
    models_dir_for_save=args.out_models_dir)
  File "/Users/julia/code/mhcflurry/mhcflurry/class1_affinity_prediction/class1_affinity_predictor.py", line 336, in fit_allele_specific_predictors
    models_dir_for_save, model_names_to_write=[model_name])
  File "/Users/julia/code/mhcflurry/mhcflurry/class1_affinity_prediction/class1_affinity_predictor.py", line 158, in save
    row.model.get_weights(), weights_path)
  File "/Users/julia/code/mhcflurry/mhcflurry/class1_affinity_prediction/class1_affinity_predictor.py", line 641, in save_weights
    **dict((("array_%d" % i), w) for (i, w) in enumerate(weights_list)))
  File "/Users/julia/Envs/mhcflurry/lib/python2.7/site-packages/numpy/lib/npyio.py", line 593, in savez
    _savez(file, args, kwds, False)
  File "/Users/julia/Envs/mhcflurry/lib/python2.7/site-packages/numpy/lib/npyio.py", line 687, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "/Users/julia/Envs/mhcflurry/lib/python2.7/site-packages/numpy/lib/npyio.py", line 101, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 756, in __init__
    self.fp = open(file, modeDict[mode])
IOError: [Errno 2] No such file or directory: 'models/weights_HLA-A*23:01-0-bf6ae014e899048f.npz'
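The second failure is an `IOError` raised from `open()` when numpy writes the weights archive; `Errno 2` for a relative path like `models/weights_....npz` typically means the parent directory did not exist when `save()` ran (the `*` and `:` characters in the filename are legal on macOS through the POSIX layer, though they can cause trouble on other filesystems). A hedged sketch of a defensive save that creates the directory first; `save_weights_safely` is a hypothetical helper for illustration, not mhcflurry API:

```python
import os
import tempfile
import numpy as np

def save_weights_safely(weights_list, path):
    """Hypothetical helper: ensure the parent directory exists
    before handing the path to np.savez (which opens it directly)."""
    parent = os.path.dirname(path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    # Mirrors the save_weights call in the traceback: one named array per entry.
    np.savez(path, **{"array_%d" % i: w for i, w in enumerate(weights_list)})

# Usage: writing into a not-yet-existing "models" subdirectory now succeeds.
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "models", "weights_demo.npz")
save_weights_safely([np.zeros((2, 2)), np.ones(3)], target)
print(os.path.exists(target))  # True
```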
julia326 commented 7 years ago

Looks like these are the only supported alleles for training:

Selected 101 alleles: HLA-A*02:01 HLA-A*03:01 HLA-A*11:01 HLA-A*02:03 HLA-A*02:06 HLA-A*68:02 HLA-B*07:02 HLA-B*15:01 HLA-A*02:02 HLA-A*31:01 HLA-A*01:01 H-2-Kb HLA-A*24:02 H-2-Db HLA-A*26:01 HLA-B*08:01 HLA-B*58:01 HLA-B*27:05 HLA-A*68:01 HLA-B*40:01 HLA-B*35:01 HLA-A*33:01 HLA-A*30:01 HLA-A*69:01 HLA-B*51:01 HLA-B*57:01 Mamu-A*01:01 HLA-B*18:01 HLA-A*29:02 HLA-A*23:01 HLA-A*30:02 HLA-B*44:02 HLA-B*46:01 HLA-B*53:01 H-2-Kd HLA-B*39:01 Mamu-B*17:01 HLA-B*44:03 HLA-B*15:17 Mamu-B*17:04 HLA-A*24:03 Mamu-A*02:01 Mamu-A*11:01 HLA-A*02:19 HLA-B*54:01 HLA-A*02:12 HLA-A*80:01 Mamu-B*03:01 HLA-A*32:01 Mamu-B*08:01 HLA-B*45:01 HLA-A*02:11 HLA-B*40:02 HLA-B*08:02 HLA-A*25:01 Mamu-A*01:11 HLA-A*02:16 HLA-B*27:03 Mamu-B*52:01 Mamu-B*01:01 HLA-B*48:01 HLA-B*15:09 Patr-B*01:01 HLA-B*15:03 Mamu-A*22:01 Mamu-A*07:01 Eqca-1*01:01 Patr-A*09:01 HLA-A*26:02 H-2-Kk Mamu-A*01:02 Patr-A*07:01 H-2-Ld HLA-A*26:03 HLA-B*38:01 HLA-C*04:01 H-2-Dd HLA-B*08:03 HLA-C*07:01 Mamu-B*39:01 Patr-A*01:01 Mamu-B*83:01 HLA-C*06:02 Patr-A*03:01 HLA-B*15:42 HLA-B*45:06 HLA-A*02:17 HLA-B*83:01 HLA-C*03:04 Patr-A*04:01 Patr-B*24:01 HLA-C*14:02 HLA-B*35:03 HLA-B*27:01 Patr-B*13:01 HLA-B*14:02 HLA-C*05:01 HLA-B*42:01 HLA-B*15:02 Mamu-A*26:01 HLA-B*07:01

Where does this list come from?

julia326 commented 7 years ago

Cleared up with @timodonnell: it turned out to be the filtering from --min-measurements-per-allele 200. All good.
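The allele list above follows directly from that flag: the training command keeps only alleles with at least `--min-measurements-per-allele` rows in the curated data, so the "supported" set is just whatever clears the threshold. A toy sketch of that filter (column names assumed for illustration):

```python
import pandas as pd

# Toy curated-data stand-in: one allele with many measurements, one with few.
df = pd.DataFrame({
    "allele": ["HLA-A*02:01"] * 250 + ["HLA-B*35:01"] * 150,
    "measurement_value": [100.0] * 400,
})

min_measurements = 200
counts = df.allele.value_counts()
supported = sorted(counts[counts >= min_measurements].index)
print(supported)  # only HLA-A*02:01 clears the 200-row threshold
```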