Model not training - GitHub Issues

DmitriyALoza commented 1 year ago

Whenever I try to train the model I get an error that says this:

WARNING:tensorflow:Model was constructed with shape (None, 512, 512, 3) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512, 512, 3), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'"), but it was called on an input with incompatible shape (None, None, None).
Traceback (most recent call last):
File "C:\Users\Documents\OrganoID-master\OrganoID.py", line 33, in <module>
    program.RunProgram(args)
File "C:\Users\Documents\OrganoID-master\CommandLine\Train.py", line 99, in RunProgram
    TrainModel(model, parserArgs.learningRate, parserArgs.patience, parserArgs.epochs,
File "C:\Users\Documents\OrganoID-master\Core\Model.py", line 82, in TrainModel
    model.fit(x=ImageGenerator(trainingData, batchSize, model),
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "C:\Users\AppData\Local\Temp__autograph_generated_filetbjvh430.py", line 15, in tftrainfunction
    retval = ag.converted_call(ag__.ld(step_function), (ag.ld(self), ag.ld(iterator)), None, fscope)
ValueError: in user code:

File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1249, in train_function  *
    return step_function(self, iterator)
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1233, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1222, in run_step  **
    outputs = model.train_step(data)
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1023, in train_step
    y_pred = self(x, training=True)
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\input_spec.py", line 250, in assert_input_compatibility
    raise ValueError(

ValueError: Exception encountered when calling layer 'model' (type Functional).

Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=3. Full shape received: (None, None, None)

Call arguments received by layer 'model' (type Functional):
  • inputs=tf.Tensor(shape=(None, None, None), dtype=uint8)
  • training=True
  • mask=None

I have used the preexisting Augment.py script and also my own script to augment the images. Both augmenters use an image-resize function that resizes the images to (1984, 1984, 3) and also to (512, 512, 3). I do not understand how to fix this error, and any help would be greatly appreciated!

schmoogol commented 1 year ago

Try changing line 17 of Model.py from:
    inputs = tf.keras.layers.Input((imageSize[0], imageSize[1], 3))
to:
    inputs = tf.keras.layers.Input((imageSize[0], imageSize[1], 1))

That seems to allow the training to run for me, although I have yet to determine whether it generates a valid model.
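The traceback reports a rank-3 uint8 batch, shape (None, None, None), where conv2d expects rank 4 (batch, height, width, channels). That is consistent with single-channel images arriving without a channel axis, although the thread does not confirm how the images are loaded. As an alternative to editing the Input layer, a minimal NumPy sketch of adding the missing axis to the data could look like this. The helper name add_channel_axis is hypothetical and is not part of OrganoID.

```python
import numpy as np


def add_channel_axis(batch, channels=1):
    """Return a rank-4 (batch, height, width, channels) array.

    Hypothetical helper, not part of OrganoID. A rank-3 batch is assumed
    to hold single-channel images of shape (batch, height, width).
    """
    batch = np.asarray(batch)
    if batch.ndim == 4:
        return batch  # already has a channel axis
    if batch.ndim != 3:
        raise ValueError(f"expected a rank-3 or rank-4 batch, got shape {batch.shape}")
    expanded = batch[..., np.newaxis]  # (N, H, W) -> (N, H, W, 1)
    # Copy the single channel `channels` times along the last axis.
    return np.repeat(expanded, channels, axis=-1)


gray_batch = np.zeros((2, 512, 512), dtype=np.uint8)
single = add_channel_axis(gray_batch)
rgb_like = add_channel_axis(gray_batch, channels=3)
print(single.shape)    # (2, 512, 512, 1)
print(rgb_like.shape)  # (2, 512, 512, 3)
```

With channels=1 the result matches an Input of (imageSize[0], imageSize[1], 1); with channels=3 it matches the original three-channel Input. Whether either choice yields a useful model depends on what the pretrained weights expect.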

DmitriyALoza commented 1 year ago

So I tried your approach and it did produce a model, but one much smaller than the "OptimizedModel": it creates a folder containing a saved_model.pb of 711KB. It also does not segment the images anymore. I would love to know if there is anything that I am missing when training the model. I also get an error when tracking:

File "____\Tracking.py", line 64, in Track
    m2 = np.zeros(np.max(mapping[:,0])+1)
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I'm not sure if it's:
1) An issue with the way that I trained the model (following the advice you gave me earlier by changing line 17 of Model.py)
2) Not using the right trained model (the sizes of the models are different, and I'm using the saved_model.pb which is also located in the "TrainableModel" folder)
3) Any other issue

Any help would be great!
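The IndexError quoted above is what NumPy raises when a 1-dimensional array is indexed with two indices, as in mapping[:,0]. One way this can happen is an empty result: np.array([]) has shape (0,), not (0, 2). The sketch below reproduces the error and shows one possible guard. It assumes that mapping is meant to hold pairs and may be empty when nothing was segmented, which the thread does not confirm, and the helper as_pairs is hypothetical.

```python
import numpy as np


def as_pairs(mapping):
    """Coerce `mapping` to shape (n, 2), including the empty case.

    Hypothetical helper: assumes `mapping` holds pairs, which is a guess
    about Tracking.py rather than something the thread confirms.
    """
    mapping = np.asarray(mapping)
    if mapping.size == 0:
        return np.empty((0, 2), dtype=int)
    return mapping.reshape(-1, 2)


empty = np.array([])  # shape (0,), so it is 1-dimensional
try:
    empty[:, 0]  # two indices on a 1-D array
except IndexError as error:
    print("IndexError:", error)

pairs = as_pairs(empty)
print(pairs.shape)  # (0, 2)

# np.max raises on an empty array, so the empty case still needs its own branch.
size = int(np.max(pairs[:, 0])) + 1 if len(pairs) else 0
print(size)  # 0
```

If the cause is an empty detection result, the guard only hides the symptom; the underlying problem would still be a model that segments nothing.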

schmoogol commented 1 year ago

It sounds like you aren't using a proper model file. Make sure you include '--lite' in the train command to generate the correct .tflite file for running the identification. Once generated, use that file instead of 'OptimizedModel' in the run command. Make sure you include the .tflite extension in the command (e.g. newmodel.tflite), or just remove that extension from the file if you want.

Djul0 commented 1 year ago

Changing line 17 seems to solve the training error, but I still have the same issue as @WinterMedved. I've tried @schmoogol's solution with the addition of "--lite". The training is very fast (around 20 epochs), and at the end these messages are shown:

2023-11-08 11:20:49.590241: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2023-11-08 11:20:49.590277: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2023-11-08 11:20:49.590677: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /var/folders/1s/nwdhlk99679c3wlc3tb25d880000gn/T/tmp12457uj7
2023-11-08 11:20:49.594018: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2023-11-08 11:20:49.594025: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /var/folders/1s/nwdhlk99679c3wlc3tb25d880000gn/T/tmp12457uj7
2023-11-08 11:20:49.600012: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2023-11-08 11:20:49.602312: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2023-11-08 11:20:49.706858: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /var/folders/1s/nwdhlk99679c3wlc3tb25d880000gn/T/tmp12457uj7
2023-11-08 11:20:49.735001: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 144325 microseconds.
2023-11-08 11:20:49.773980: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.

When I use the newmodel.tflite it doesn't seem to change anything, and it does not segment the images anymore. I'm pretty sure it's a training problem. Here is the command I used to start the training:

python OrganoID.py train /path/to/trainimg/images /path/to/new/model MODELNAME -M TrainableModel --lite

Am I missing something?

schmoogol commented 1 year ago

Have you tried carrying out the training with the dataset linked in the readme to rule out any problems with your dataset? Presumably you have included the correct paths to your training images and the trainable model and have not used the command above verbatim?

Djul0 commented 1 year ago

Don't worry, I did not use the command above verbatim :)

Your suggestion was correct: I tried the linked dataset and it's working, so I knew the problem was in my image format. And indeed, my images were in ".tif" format. I converted them to ".png" and it's working! Thank you so much for your help @schmoogol !!!
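Since the fix here was an image-format change, a quick check of which file extensions a training folder actually contains can help rule this out before training. The sketch below uses only the Python standard library. The function names are hypothetical, and the idea that only ".png" files are usable for training is inferred from this thread rather than confirmed against the OrganoID code.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count files under `folder` by lower-cased extension."""
    files = (p for p in Path(folder).rglob("*") if p.is_file())
    return Counter(p.suffix.lower() for p in files)


def non_png_files(folder):
    """List files under `folder` whose extension is not .png."""
    files = (p for p in Path(folder).rglob("*") if p.is_file())
    return sorted(p for p in files if p.suffix.lower() != ".png")


# Demo on a throwaway folder with empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.tif", "b.TIF", "c.png"):
        (Path(folder) / name).touch()
    print(count_extensions(folder))
    print([p.name for p in non_png_files(folder)])
```

Pointing count_extensions at the real training folder would show at a glance whether any ".tif" files remain to be converted.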

jono-m / OrganoID

Model not training #6