jackievaleri / BioAutoMATED

Automated machine learning for analyzing, interpreting, and designing biological sequences
MIT License

Error when running example 01 #6

Open lucaskbobadilla opened 10 months ago

lucaskbobadilla commented 10 months ago

Hello,

I am trying to use a Docker image of BioAutoMATED to train a model. The Keras model is giving me the error shown below.

I think it is an error in TensorFlow. Any ideas on how to solve it?

Also, I changed the Dockerfile to use FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 instead of FROM ubuntu:18.04 so that I can use the CUDA GPU.

Thanks!

ValueError                                Traceback (most recent call last)
<ipython-input-23-843a191e193a> in <module>
     15 output_folder = './exemplars/test/outputs/'
     16 
---> 17 run_bioautomated(task, data_folder, data_file, sequence_type, model_folder, output_folder, input_col=input_col, target_col=target_col, max_runtime_minutes=max_runtime_minutes, num_folds=num_folds, verbosity=verbosity, num_final_epochs=num_final_epochs, yaml_params=yaml_params, num_generations=num_generations, population_size=population_size)

/home/jovyan/main_classes/wrapper.py in run_bioautomated(task, data_folder, data_file, sequence_type, model_folder, output_folder, automl_search_techniques, do_backup, max_runtime_minutes, num_folds, verbosity, do_auto_bin, bin_threshold, do_transform, input_col, target_col, pad_seqs, augment_data, dataset_robustness, num_final_epochs, yaml_params, num_generations, population_size, run_interpretation, interpret_params, run_design, design_params)
    581         print("#################################################################################################")
    582         print('')
--> 583         run_binaryclass(data_folder, data_file, sequence_type, model_folder, output_folder, automl_search_techniques, max_runtime_minutes, num_folds, verbosity, do_backup, do_auto_bin, bin_threshold, input_col, target_col, pad_seqs, augment_data, dataset_robustness, num_final_epochs, yaml_params, num_generations, population_size, run_interpretation, interpret_params, run_design, design_params)
    584 
    585     elif task == 'multiclass_classification':

/home/jovyan/main_classes/wrapper.py in run_binaryclass(data_folder, data_file, sequence_type, model_folder, output_folder, automl_search_techniques, max_runtime_minutes, num_folds, verbosity, do_backup, do_auto_bin, bin_threshold, input_col, target_col, pad_seqs, augment_data, dataset_robustness, num_final_epochs, yaml_params, num_generations, population_size, run_interpretation, interpret_params, run_design, design_params)
    102 
    103         dsc = DeepSwarmClassification(data_path, model_folder + run_folder, output_folder + run_folder, max_runtime=max_runtime_minutes, num_folds=num_folds, sequence_type=sequence_type, do_auto_bin=do_auto_bin, bin_threshold=bin_threshold, verbosity=verbosity, yaml_params=yaml_params, num_final_epochs=num_final_epochs, input_col=input_col, target_col=target_col, pad_seqs=pad_seqs, augment_data=augment_data, multiclass=False, dataset_robustness=dataset_robustness, run_interpretation = run_interpretation, interpret_params = interpret_params, run_design = run_design, design_params = design_params)
--> 104         dsc.run_system()
    105 
    106         # create backup folder

/home/jovyan/main_classes/generic_deepswarm.py in run_system(self)
    445 
    446             # run deepswarm to find best model
--> 447             backend, deepswarm, topology, base_model  = self.find_best_architecture(X, y,)
    448 
    449             seed = 7

/home/jovyan/main_classes/generic_deepswarm.py in find_best_architecture(self, X, y)
    308         # Save the DeepSwarm optimal topology with reinitialized weights
    309         topology = base_model
--> 310         reset_weights(topology)
    311 
    312         topology.save(self.model_folder + 'deepswarm_topology.h5')

/home/jovyan/main_classes/generic_deepswarm.py in reset_weights(model)
    100     for layer in model.layers:
    101         if hasattr(layer, 'kernel_initializer'):
--> 102             layer.kernel.initializer.run(session=session)
    103 
    104 def fit_final_model(topology_path, num_epochs, compile_model, X, y):

/opt/miniconda/envs/automl_py37/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in run(self, feed_dict, session)
   2448         none, the default session will be used.
   2449     """
-> 2450     _run_using_default_session(self, feed_dict, self.graph, session)
   2451 
   2452 _gradient_registry = registry.Registry("gradient")

/opt/miniconda/envs/automl_py37/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in _run_using_default_session(operation, feed_dict, graph, session)
   5211   else:
   5212     if session.graph is not graph:
-> 5213       raise ValueError("Cannot use the given session to execute operation: "
   5214                        "the operation's graph is different from the session's "
   5215                        "graph.")

ValueError: Cannot use the given session to execute operation: the operation's graph is different from the session's graph.

lucaskbobadilla commented 10 months ago

OK, restarting the kernel took care of the error. But now I am stuck at the AutoKeras step (DeepSwarm was very fast):

#################################################################################################
#######################               RUNNING BINARY CLASSIFICATION            ##################
#################################################################################################

#################################################################################################
##############################            RUNNING DEEPSWARM           ###########################
#################################################################################################
Conducting architecture search now...
Testing scrambled control now...
Fitting final model now...
#################################################################################################
##############################            RUNNING AUTOKERAS           ###########################
#################################################################################################
Conducting architecture search now...

Any ideas?

jackievaleri commented 10 months ago

Hi Lucas, AutoKeras can be quite slow compared to DeepSwarm, depending on the resources available to you. Figure S2 in our paper shows the runtime differences we observed.

Can I ask what your max_runtime_minutes is set to? I sometimes raise it to 180 and let AutoKeras run overnight. Also, is your dataset very large? For huge datasets (e.g., >100K sequences), it can make sense to find an optimal architecture using a subset of the data and then train the identified architecture on all sequences. Lastly, you may need to adjust Docker's --shm-size flag as described in our installation guide (bullet point under step 5 in option 1).
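One way to draw the search subset for the subset-then-retrain approach is a stratified random sample, so the architecture search sees the same class balance as the full data. This is a minimal stdlib sketch, assuming binary-classification data already loaded as (sequence, label) pairs; `stratified_subsample` is a hypothetical helper, not part of BioAutoMATED, and the 20% fraction is an arbitrary example value.

```python
import random
from collections import defaultdict


def stratified_subsample(rows, fraction, seed=7):
    """Hypothetical helper: return a random subset of (sequence, label) rows
    that keeps `fraction` of each label group (at least one row per label)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Group rows by label so each class is sampled separately.
    by_label = defaultdict(list)
    for seq, label in rows:
        by_label[label].append((seq, label))
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    subset = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))
        subset.extend(rng.sample(group, k))
    rng.shuffle(subset)  # mix classes back together
    return subset


# Toy data: 80 negatives and 20 positives.
rows = [("ACGT" * 5, 0)] * 80 + [("TTGCA" * 4, 1)] * 20
search_rows = stratified_subsample(rows, fraction=0.2)
print(len(search_rows))                        # 20
print(sum(label for _, label in search_rows))  # 4
```

The subset could be written to its own file and used as `data_file` for the architecture search, with the chosen architecture then retrained on the complete dataset.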

lucaskbobadilla commented 10 months ago

That's only with the example 01 Jupyter notebook, which is why I find it odd that it runs so slowly. My own dataset will have around 70,000 sequences. I am running it as a pod in Kubernetes with more than enough resources.

Also, is there a specific reason to use TensorFlow 1.13? I am trying to rebuild the image with CUDA, but TensorFlow 1.13 only works with CUDA 10, for which NVIDIA no longer provides an image. Any ideas on how to get BioAutoMATED to use a GPU?
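The TensorFlow/CUDA constraint described here can be checked before building an image. This is a small sketch, assuming the base image is an `nvidia/cuda:<major>.<minor>...` tag; `TESTED_CUDA` is a deliberately partial, hand-copied subset of TensorFlow's published tested GPU build configurations, and `check_base_image` is a hypothetical helper, not part of BioAutoMATED.

```python
# Partial table: TensorFlow version -> CUDA major.minor it was tested/built with.
# See TensorFlow's "tested build configurations" table for the full list.
TESTED_CUDA = {
    "1.13": "10.0",
    "1.15": "10.0",
    "2.12": "11.8",
}


def cuda_from_image_tag(image):
    """Extract CUDA major.minor from an nvidia/cuda image reference such as
    'nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04' (assumes that tag layout)."""
    tag = image.rsplit(":", 1)[-1]
    parts = tag.split("-", 1)[0].split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"no CUDA version found in image tag: {image!r}")
    return f"{parts[0]}.{parts[1]}"


def check_base_image(tf_version, image):
    """Return (ok, required_cuda, found_cuda); ok requires an exact
    major.minor match, which is a conservative rule."""
    required = TESTED_CUDA[tf_version]
    found = cuda_from_image_tag(image)
    return found == required, required, found


print(check_base_image("1.13", "nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04"))
# (False, '10.0', '11.7')
```

The exact-match rule is intentionally strict: TensorFlow 1.x GPU wheels link against a specific CUDA runtime version, so a newer CUDA base image does not satisfy them.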

jackievaleri commented 10 months ago

Hi Lucas, AutoKeras should probably finish within a few hours on a dataset of that size. Could you let me know whether you see the same problem with TPOT, or is it just AutoKeras?

We used TensorFlow 1 because we started the project back in 2019, and at the time it made the most sense for balancing package dependencies. If you want to try creating a Dockerfile that works with TensorFlow 2, we'd love that and would definitely encourage a PR!