maikherbig / AIDeveloper

GUI-based software for training, evaluating and applying deep neural nets for image classification
BSD 2-Clause "Simplified" License

Support for EfficientNetV2L and ConvNeXtXLarge #56

Open · opened by jonathancolledge 6 months ago

jonathancolledge commented 6 months ago

Hi, I tried adding this to the model zoo (model_zoo.py):

```python
# Note: these imports are assumed to already be available in model_zoo.py.
from tensorflow import keras
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model


def EfficientNetV2L(in_dim1, in_dim2, channels, out_dim):
    inputs = Input(shape=(in_dim1, in_dim2, channels))
    model_ = keras.applications.efficientnet_v2.EfficientNetV2L(
        include_top=True, weights=None, input_tensor=inputs, classes=out_dim)
    # layers = model_.layers
    # layers[0]._name = "inputTensor"
    # layers[-1]._name = "outputTensor"
    predictions = model_(inputs)
    model = Model(inputs, predictions)
    return model


def ConvNeXtXLarge(in_dim1, in_dim2, channels, out_dim):
    inputs = Input(shape=(in_dim1, in_dim2, channels))
    model_ = keras.applications.convnext.ConvNeXtXLarge(
        include_top=True, weights=None, input_tensor=inputs, classes=out_dim)
    # layers = model_.layers
    # layers[0]._name = "inputTensor"
    # layers[-1]._name = "outputTensor"
    predictions = model_(inputs)
    model = Model(inputs, predictions)
    return model
```

But when I try to use them, it says the module is not found in keras.applications.

Can you help please?

maikherbig commented 6 months ago

The reason is that the installed version of keras.applications does not (yet) contain those models. You can actually run Python code within AIDeveloper to 'look under the hood': [screenshot]. Let me try to update this module (it could take a bit). In the meantime you could try some of the models that are already implemented (in my experience, the EfficientNet B0 version is already overkill for many applications).
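For example, from that built-in Python console you could quickly check whether the installed Keras even exposes those architectures. This is just a rough sketch assuming a standard tf.keras install (the version thresholds in the comments are approximate):

```python
import tensorflow as tf

# Check whether the requested architectures exist in the installed keras.applications.
print(tf.__version__)
print(hasattr(tf.keras.applications, "EfficientNetV2L"))  # expected False on older TF (roughly < 2.8)
print(hasattr(tf.keras.applications, "ConvNeXtXLarge"))   # expected False on older TF (roughly < 2.10)
```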

jonathancolledge commented 6 months ago

Thank you, yes, I had hoped to be running EfficientNet B0 right now, but I get an error. The reason I was hoping to eke out that little bit more is that I'm working with medical images, and on the current dataset most previous attempts (not by me) with other CNNs such as Inception haven't been quite good enough. I was going to train on the MURA dataset, then train the top layers on bone ages. Here are the errors I get with EfficientNet B0; I don't know where it gets the dimensions of 726 from, as the images are all 512 x 512 already.

```
WARNING:tensorflow:AutoGraph is not available in this environment: functions lack code information. This is typical of some environments like the interactive Python shell. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.
Nr. of CPUs detected: 12
Nr. of GPUs detected: 2
List of device(s):
------------------------
Device 0: /device:CPU:0
Device type: CPU
Device description:
------------------------
Device 1: /device:GPU:0
Device type: GPU
Device description: device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
------------------------
Device 2: /device:GPU:1
Device type: GPU
Device description: device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:02:00.0, compute capability: 8.6
------------------------
AIDeveloper Version: 0.4.8
model_zoo.py Version: 0.1.4_dev1_JDC
Adjusted GPU options for Multi-GPU usage. Set memeory fraction to 0.7
WARNING:tensorflow:From tensorflow\python\keras\layers\normalization\batch_normalization.py:532: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating: Colocations handled automatically by placer.
tensorflow\python\keras\utils\generic_utils.py:494: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
No data on RAM. I have to load
channel_list: ['image']
I'm loading all images (from disk)
Final size:(65799, 726, 726),(65799,)
channel_list: ['image']
I'm loading all images (from disk)
Final size:(44619, 726, 726),(44619,)
channel_list: ['image']
I'm loading all images (from disk)
Final size:(1667, 726, 726),(1667,)
channel_list: ['image']
I'm loading all images (from disk)
Final size:(1530, 726, 726),(1530,)
QLayout::addChildLayout: layout "horizontalLayout_5_pop" already has a parent
Adjusted GPU options for Multi-GPU usage. Set memeory fraction to 0.7
WARNING:tensorflow:From tensorflow\python\keras\backend.py:6463: StrategyBase.configure (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating: use update_config_proto instead.
Adjusting the model for Multi-GPU
Getting dictionary for class_weight
Recompiling...
Recompiled parallel model to adjust learning rate, loss, optimizer
Length of DATA (in RAM) = 4
Removed data from self.ram. For further training sessions, data has to be reloaded.
Loaded data from RAM
Loaded data from RAM
Exporting is turned off
Current dim. of validation set = (192, 726, 726, 1)
Change dim. (pixels x pixels) of validation set to = 512
Loaded data from RAM
Loaded data from RAM
Time to load data (from .rtdc or RAM) and crop=0.25774940000007973
Time to perform affine augmentation =0.3217817999999397
Time to augment contrast=0.0821442000000161
Time to perform average blurring=0.05661359999999149
Time to augment brightness=0.2767681000000266
Time to apply normalization=0.1664451999999983
X_batch.shape (192, 512, 512, 1)
tensorflow\python\data\ops\dataset_ops.py:453: UserWarning: To make it possible to preserve tf.data options across serialization boundaries, their implementation has moved to be part of the TensorFlow graph. As a consequence, the options value is in general no longer known at graph construction time. Invoking this method in graph mode retains the legacy behavior of the original implementation, but note that the returned value might not reflect the actual value of the options.
WARNING:tensorflow:From tensorflow\python\distribute\input_lib.py:801: DistributedIteratorV1.initialize (from tensorflow.python.distribute.input_lib) is deprecated and will be removed in a future version.
Instructions for updating: Use the iterator's initializer property instead.
tensorflow\python\keras\backend.py:495: UserWarning: tf.keras.backend.learning_phase_scope is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the training argument of the __call__ method of your layer or model.
WARNING:tensorflow:From tensorflow\python\keras\distribute\distributed_training_utils_v1.py:343: StrategyBase.unwrap (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating: use experimental_local_results instead.
Traceback (most recent call last):
  File "tensorflow\python\client\session.py", line 1380, in _do_call
  File "tensorflow\python\client\session.py", line 1362, in _run_fn
  File "tensorflow\python\client\session.py", line 1403, in _extend_graph
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node training/Adam/NcclAllReduce}} with these attrs: [reduction="sum", shared_name="c0", T=DT_FLOAT, num_devices=2]
Registered devices: [CPU, GPU]
Registered kernels:
	 [[training/Adam/NcclAllReduce]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\jonat\Downloads\AIDeveloper_0.4.0_GPU_Windows\aid_backbone.py", line 203, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\jonat\Downloads\AIDeveloper_0.4.0_GPU_Windows\aid_backbone.py", line 6172, in action_fit_model_worker
    history = model_keras_p.fit(X_batch, Y_batch,
  File "tensorflow\python\keras\engine\training_v1.py", line 793, in fit
  File "tensorflow\python\keras\engine\training_distributed_v1.py", line 669, in fit
  File "tensorflow\python\keras\engine\training_arrays_v1.py", line 181, in model_iteration
  File "tensorflow\python\keras\engine\training_arrays_v1.py", line 550, in _make_execution_function
  File "tensorflow\python\keras\distribute\distributed_training_utils_v1.py", line 819, in _make_execution_function
  File "tensorflow\python\keras\distribute\distributed_training_utils_v1.py", line 919, in _make_execution_function_with_cloning
  File "tensorflow\python\keras\distribute\distributed_training_utils_v1.py", line 947, in _make_graph_execution_function
  File "tensorflow\python\keras\distribute\distributed_training_utils_v1.py", line 400, in init_restore_or_wait_for_variables
  File "tensorflow\python\keras\backend.py", line 1197, in _initialize_variables
  File "tensorflow\python\client\session.py", line 970, in run
  File "tensorflow\python\client\session.py", line 1193, in _run
  File "tensorflow\python\client\session.py", line 1373, in _do_run
  File "tensorflow\python\client\session.py", line 1399, in _do_call
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node training/Adam/NcclAllReduce (defined at tensorflow\python\keras\optimizer_v2\utils.py:148) with these attrs: [reduction="sum", shared_name="c0", T=DT_FLOAT, num_devices=2]
Registered devices: [CPU, GPU]
Registered kernels:
	 [[training/Adam/NcclAllReduce]]

Errors may have originated from an input operation.
Input Source operations connected to node training/Adam/NcclAllReduce:
 In[0] training/Adam/split:

Operation defined at: (most recent call last)
>>>   File "C:\Users\jonat\Downloads\AIDeveloper_0.4.0_GPU_Windows\aid_backbone.py", line 203, in run
>>>     result = self.fn(*self.args, **self.kwargs)
>>>
>>>   File "C:\Users\jonat\Downloads\AIDeveloper_0.4.0_GPU_Windows\aid_backbone.py", line 6172, in action_fit_model_worker
>>>     history = model_keras_p.fit(X_batch, Y_batch,
>>>
>>>   File "tensorflow\python\keras\engine\training_v1.py", line 793, in fit
>>>
>>>   File "tensorflow\python\keras\engine\training_distributed_v1.py", line 669, in fit
>>>
>>>   File "tensorflow\python\keras\engine\training_arrays_v1.py", line 181, in model_iteration
>>>
>>>   File "tensorflow\python\keras\engine\training_arrays_v1.py", line 550, in _make_execution_function
>>>
>>>   File "tensorflow\python\keras\distribute\distributed_training_utils_v1.py", line 819, in _make_execution_function
>>>
>>>   File "tensorflow\python\keras\distribute\distributed_training_utils_v1.py", line 919, in _make_execution_function_with_cloning
>>>
>>>   File "tensorflow\python\keras\distribute\distributed_training_utils_v1.py", line 940, in _make_graph_execution_function
>>>
>>>   File "tensorflow\python\keras\optimizer_v2\utils.py", line 148, in _all_reduce_sum_fn
>>>
```

maikherbig commented 6 months ago

For me, training the model "efficientnet B0" works: [screenshot]. Could it be that you are trying to use two GPUs in parallel? If that is the case, please try "Single-GPU".
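The "No OpKernel was registered to support Op 'NcclAllReduce'" error in your log fits that picture: the NCCL all-reduce used for multi-GPU training is, as far as I know, not available in Windows builds of TensorFlow. If switching the GUI option to "Single-GPU" were not enough, a rough workaround sketch (plain CUDA/TensorFlow, nothing AIDeveloper-specific) would be to hide the second GPU before TensorFlow initializes:

```python
import os

# Workaround sketch: expose only the first GPU to TensorFlow.
# This must be set before TensorFlow is imported/initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))  # should now list a single GPU
```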

Regarding the 726 pixels: training images are enlarged a bit (size = np.sqrt(512**2 + 512**2)). This ensures that you have sufficient pixels to allow rotation.
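In other words, the enlarged size is roughly the diagonal of the requested 512 x 512 crop. A quick sanity check (the exact rounding/margin AIDeveloper applies may differ slightly, since your log reports 726):

```python
import numpy as np

# Diagonal of a 512 x 512 crop: enlarging to about this size guarantees the crop
# still fits inside the image after an arbitrary rotation.
diag = np.sqrt(512**2 + 512**2)
print(diag)                # ~724.1
print(int(np.ceil(diag)))  # 725 (the log reports 726, presumably with a small extra margin)
```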

jonathancolledge commented 6 months ago

Thanks, single GPU works! Of course, I'm still interested in EfficientNetV2L and ConvNeXtXLarge if you are ever able to update the module. Best wishes, Jonathan