janclemenslab / das

Deep Audio Segmenter
http://janclemenslab.org/das/
Apache License 2.0

Optimizing model parameters and training outputs #42

Closed chitayata closed 2 years ago

chitayata commented 2 years ago

Dear Jan and team,

I am using das through Python and would like to fit the network with different configurations to compare their performance and find the best model. I see there is a shell script for the command line (janclemenslab: TRAIN), but is there a way to do this with the train function in the das.train module in Python?

Additionally, how can I get Python to print the training outputs? If I run my training in the GUI, I get a printout like the one below that includes the f1-scores.

INFO:das.train:{'noise': {'precision': 0.9175990751370318, 'recall': 0.979656218576971, 'f1-score': 0.9476127362477954, 'support': 231717}, 'pulse': {'precision': 0.05627009646302251, 'recall': 0.015928398058252427, 'f1-score': 0.024828564672499408, 'support': 6592}, 'sine': {'precision': 0.8592689767483941, 'recall': 0.5747178602765423, 'f1-score': 0.6887611726526097, 'support': 33051}, 'accuracy': 0.9069243808962264, 'macro avg': {'precision': 0.6110460494494828, 'recall': 0.5234341589705885, 'f1-score': 0.5537341578576348, 'support': 271360}, 'weighted avg': {'precision': 0.8895708148582068, 'recall': 0.9069243808962264, 'f1-score': 0.8936685429716721, 'support': 271360}}

But I don't get this output in Python. I am using the code below with the same files (including the test set) and parameters as in the GUI. Am I missing an argument to print the results?

model, params = das.train.train(model_name='tcn', data_dir='Quickstart3.npy', save_dir='Quickstart3.res', nb_hist=256, kernel_size=16, nb_filters=16, batch_size=16, ignore_boundaries=True, verbose=1, nb_epoch=4, log_messages=True)

Any advice is much appreciated!

chitayata commented 2 years ago

Just a quick update.

I wasn't able to get the results to print automatically. However, I was able to use the following code to load and show the '_results.h5' file created during training.

import flammkuchen
import logging

logging.basicConfig(level=logging.INFO)  # show das INFO-level log messages

res = flammkuchen.load('pathway_results.h5')  # load the _results.h5 file created during training
print(res)  # print the results, including the per-class f1-scores
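To compare several runs later, something like this minimal sketch should also work (assuming each run writes a `*_results.h5` file into the save directory; I check `res.keys()` first to see which entries, e.g. the f1-score report, are stored):

```python
from pathlib import Path

import flammkuchen

# Load the saved results of every training run in the save directory
# and list their contents for comparison.
for results_file in sorted(Path('Quickstart3.res').glob('*_results.h5')):
    res = flammkuchen.load(str(results_file))
    print(results_file.name, list(res.keys()))
```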

Still no luck with parameter optimization within Python though...

chitayata commented 2 years ago

Hi Jan and team,

Any thoughts on how I can best optimize my parameters within Python?

I am working to automate the detection of chimpanzee vocalizations (pant-hoots), and I'd like to investigate different network configurations in combination with different spectrogram denoising techniques (e.g., frequency removal, spectral subtraction). The parameters I'd like to test include:

1. TCN blocks: 2, 3, 4
2. Number of filters: 32, 64, 96
3. Learning rate: 0.0001, 0.00001

However, this ends up being quite a large search space. How can I best test out different configurations and compare their performance?

Thanks!

postpop commented 2 years ago

Hi, glad you figured out a way to access the test results.

Regarding the optimization: The search space is not that big - only 3x3x2 = 18 combinations. You could brute-force this and try all combinations using das.train.train in a for loop, as in the sketch below.
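Something like this untested sketch, using the file names from your first message (save_prefix keeps each run's output files separate):

```python
import itertools

import das.train

# Parameter grid from the message above.
nb_convs = [2, 3, 4]
nb_filters_values = [32, 64, 96]
learning_rates = [0.0001, 0.00001]

for nb_conv, nb_filters, learning_rate in itertools.product(nb_convs, nb_filters_values, learning_rates):
    # Tag each run so its result files don't overwrite each other.
    save_prefix = f'conv{nb_conv}_filters{nb_filters}_lr{learning_rate}'
    model, params = das.train.train(model_name='tcn',
                                    data_dir='Quickstart3.npy',
                                    save_dir='Quickstart3.res',
                                    save_prefix=save_prefix,
                                    nb_conv=nb_conv,
                                    nb_filters=nb_filters,
                                    learning_rate=learning_rate)
```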

We also have experimental support for automatic parameter tuning via keras tuner in das.train_tune. The interface is similar to das.train.

Python: das.train_tune.train(data_dir='tutorial_data.npy', save_dir='res', kernel_size=3, tune_config='tune.yml'). There is also a CLI that you can access via das tune.

Crucially, it accepts a yaml file with the parameter names and values you want to optimize. In your case, the tune.yml file would look like this:

nb_conv: [2, 3, 4]
nb_filters: [32, 64, 96]
learning_rate: [0.0001, 0.00001]

The tuner will then run a bunch of fits and find an optimal parameter combination in this search space - see keras tuner for how this works.

But again, in your case I think it would be easier to just run all 18 fits a couple of times to optimize these parameters.

Good luck, and let me know how this goes, in particular if you use the tuner and run into issues, since we have so far only used it internally. I'm also happy to have a look at your data with you and give some advice on the model parameters.

chitayata commented 2 years ago

Hi Jan,

Great, I will first try a for loop, but I may also try the automatic tuner to see how it works and whether I end up with different results. I will let you know how it goes. I am occasionally running into memory issues, however, so hopefully that won't be a problem.

It would be great to get your input on the model parameters though! I will send an email with my data to the address listed on your site.

Many thanks for your suggestions.

chitayata commented 2 years ago

Hi Jan,

I'm trying out the automatic tuner via keras tuner and running into some issues. I started with something simple just to see if it works, but I'm not seeing any results and I am getting an error I don't understand. Any suggestions?

My tune.yml file:

nb_filters: [32, 64, 96]

The code I am running:

das.train_tune.train(data_dir='normalized_FR.npy', save_dir='normalized_FR.res', kernel_size=16, nb_epoch=4, tune_config='tune.yml')

Output and error message:

```
Trial 2 Complete [00h 01m 44s]
val_loss: nan

Best val_loss So Far: nan
Total elapsed time: 00h 04m 27s
INFO:tensorflow:Oracle triggered exit
INFO:tensorflow:Oracle triggered exit

Results summary
Results in normalized_FR.res\20220921_104822
Showing 10 best trials
Objective(name='val_loss', direction='min')
Trial summary
Hyperparameters:
nb_filters: 32
Score: nan
Trial summary
Hyperparameters:
nb_filters: 16
Score: nan

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In [4], line 3
      1 # experimental support for automatic parameter tuning via keras tuner in das.train_tune. The interface is similar to das.train
      2 # importantly it accepts a yaml file with the parameter names and values you want to optimize.
----> 3 das.train_tune.train(data_dir='normalized_FR.npy', save_dir='normalized_FR.res', kernel_size=16, nb_epoch=4, tune_config='tune.yml')

File ~\miniconda3\envs\das\lib\site-packages\das\train_tune.py:592, in train(data_dir, x_suffix, y_suffix, save_dir, save_prefix, save_name, model_name, nb_filters, kernel_size, nb_conv, use_separable, nb_hist, ignore_boundaries, batch_norm, nb_pre_conv, pre_nb_dft, pre_kernel_size, pre_nb_filters, pre_nb_conv, upsample, dilations, nb_lstm_units, verbose, batch_size, nb_epoch, learning_rate, reduce_lr, reduce_lr_patience, fraction_data, seed, batch_level_subsampling, augmentations, tensorboard, wandb_api_token, wandb_project, wandb_entity, log_messages, nb_stacks, with_y_hist, balance, version_data, tune_config, nb_tune_trials, _qt_progress)
    590 else:
    591     logging.info('re-loading last best model')
--> 592     model, params = utils.load_model_and_params(params['save_name'])
    594 logging.info('predicting')
    595 # TODO: Need to update params with best hyperparams (e.g. nb-hist)

File ~\miniconda3\envs\das\lib\site-packages\das\utils.py:126, in load_model_and_params(model_save_name, model_dict, custom_objects)
    115 """[summary]
    116
    117 Args:
    (...)
    123     keras.Model, Dict[str, Any]: [description]
    124 """
    125 params = load_params(model_save_name)
--> 126 model = load_model(model_save_name, model_dict=model_dict, custom_objects=custom_objects)
    127 return model, params

File ~\miniconda3\envs\das\lib\site-packages\das\utils.py:43, in load_model(file_trunk, model_dict, model_ext, params_ext, compile, custom_objects)
     41 try:
     42     model_filename = _download_if_url(file_trunk + model_ext)
---> 43     model = keras.models.load_model(model_filename,
     44                                     custom_objects=custom_objects)
     45 except (SystemError, ValueError, AttributeError):
     46     logging.debug('Failed to load model using keras, likely because it contains custom layers. Will try to init model architecture from code and load weights from _model.h5 into it.', exc_info=False)

File ~\miniconda3\envs\das\lib\site-packages\tensorflow\python\keras\saving\save.py:206, in load_model(filepath, custom_objects, compile, options)
    204 filepath = path_to_string(filepath)
    205 if isinstance(filepath, str):
--> 206     return saved_model_load.load(filepath, compile, options)
    208 raise IOError(
    209     'Unable to load model. Filepath is not an hdf5 file (or h5py is not '
    210     'available) or SavedModel.')

File ~\miniconda3\envs\das\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py:122, in load(path, compile, options)
    117 # TODO(kathywu): Add saving/loading of optimizer, compiled losses and metrics.
    118 # TODO(kathywu): Add code to load from objects that contain all endpoints
    120 # Look for metadata file or parse the SavedModel
    121 metadata = saved_metadata_pb2.SavedMetadata()
--> 122 meta_graph_def = loader_impl.parse_saved_model(path).meta_graphs[0]
    123 object_graph_def = meta_graph_def.object_graph_def
    124 path_to_metadata_pb = os.path.join(path, constants.SAVED_METADATA_PATH)

File ~\miniconda3\envs\das\lib\site-packages\tensorflow\python\saved_model\loader_impl.py:118, in parse_saved_model(export_dir)
    116     raise IOError("Cannot parse file %s: %s." % (path_to_pbtxt, str(e)))
    117 else:
--> 118     raise IOError(
    119         "SavedModel file does not exist at: %s%s{%s|%s}" %
    120         (export_dir, os.path.sep, constants.SAVED_MODEL_FILENAME_PBTXT,
    121         constants.SAVED_MODEL_FILENAME_PB))

OSError: SavedModel file does not exist at: normalized_FR.res/20220921_104822_model.h5{saved_model.pbtxt|saved_model.pb}
```

postpop commented 2 years ago

Thanks for giving this a try.

Looks like it just fails to train - loss is nan. Does the same model work when using the regular training?

Something like this?

das.train_tune.train(data_dir='normalized_FR.npy', save_dir='normalized_FR.res', kernel_size=16, nb_epoch=4, nb_filters=32)
postpop commented 2 years ago

I'll test this on my side as well just to make sure.

chitayata commented 2 years ago

When running the code you provided, I had the same issue as before - loss is nan:

das.train_tune.train(data_dir='normalized_FR.npy', save_dir='normalized_FR.res', kernel_size=16, nb_epoch=4, nb_filters=32)

While it didn't work, I could see that it was attempting to run multiple trials. So, just out of interest in understanding the code, why did it do this? I thought that because we specified an exact value for each argument, it would just run one trial. Or is it conducting a random search?

When you said 'regular training', did you mean das.train.train()?

das.train.train(data_dir='normalized_FR.npy', save_dir='normalized_FR.res', kernel_size=16, nb_epoch=4, nb_filters=32)

Running this code worked as it should.

postpop commented 2 years ago

Oh, yes, I meant das.train.train, sorry about the confusion.

The tuning works by running different trials: it fits models with different parameters. Based on the results of the current trial, the most promising set of parameters for the next trial is selected. So it's doing something a bit smarter than a random search.
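You can also cap how many trials the tuner runs via the nb_tune_trials argument, which is listed in the das.train_tune.train signature in your traceback. An untested sketch:

```python
import das.train_tune

# Limit the tuner to e.g. one trial per point of the grid in tune.yml.
das.train_tune.train(data_dir='normalized_FR.npy',
                     save_dir='normalized_FR.res',
                     kernel_size=16,
                     nb_epoch=4,
                     tune_config='tune.yml',
                     nb_tune_trials=18)
```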

And I also get nan loss now - probably some change in keras tuner that breaks things in DAS. I'll fix this and let you know once it works.

chitayata commented 2 years ago

That's a really nice feature. So the arguments we specify just give it a starting point, and then it optimizes from there.

Ah ok, thanks for trying it. Sounds good.

postpop commented 2 years ago

I've updated das.train_tune.train to work with the new keras tuner API. You can give this a try by updating to the latest version, 0.28.0, via pip (it's not on conda yet): pip install das --upgrade --no-deps

chitayata commented 2 years ago

Fabulous, thank you so much! I'll keep you posted.

chitayata commented 2 years ago

Got it updated and was able to run it with a .yml file, providing the specific values I wanted to test and compare. It seems to be running well and will be helpful for optimization. I find this easier than creating a for loop, so I think I will stick with it. Many thanks!

postpop commented 2 years ago

Great - closing this.