Open odinsbane opened 8 months ago
Actually, this simple example did finally finish with TensorFlow; it took 9 minutes. If I hide the GPU with export CUDA_VISIBLE_DEVICES="" then it finishes in 5 seconds.
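For anyone reproducing this comparison from Python instead of the shell, here is a minimal sketch. It assumes the variable is set before TensorFlow is first imported in the process; `hide_gpus` and `timed` are hypothetical helper names, not part of TensorFlow or Keras.

```python
import os
import time


def hide_gpus():
    """Hide all CUDA devices from frameworks imported after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


hide_gpus()
# Import tensorflow / keras only after this point, then for example:
# _, seconds = timed(model.fit, x, y, epochs=1)
```

Running the same script once with and once without the `hide_gpus()` call should give the CPU and GPU timings side by side.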
If I try to train just the MobileNet, then TensorFlow seems to work. The problem only shows up when I try the transfer learning and use another output.
I was able to run the code successfully using TensorFlow within a few seconds. Here is the Gist for reference: https://gist.github.com/sachinprasadhs/d8667509dd1ad6d22d88336eab821a0f
The notebook you provided says the model has no trainable weights. When I ran it, the last layer was trainable. Does that make a difference?
In your code, you have set trainable=False and there is no layer with the name "train_me", so it would have 0 trainable parameters.
if layer.name == "train_me":
    print("training")
else:
    layer.trainable = False
Total params: 4,277,441 (16.32 MB)
Trainable params: 0 (0.00 B)
Non-trainable params: 4,277,441 (16.32 MB)
I give the layer I add the name train_me, so it has trainable weights. I am not sure how that got removed from the example.
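To make the mismatch discussed above easier to catch, here is a hedged sketch of the freeze-by-name loop that also reports which layers stayed trainable. `StubLayer` is a stand-in used only so the snippet runs without Keras, and `freeze_except` is a hypothetical helper name; neither comes from the original example.

```python
from dataclasses import dataclass


@dataclass
class StubLayer:
    """Stand-in for a Keras layer: only `name` and `trainable` matter here."""
    name: str
    trainable: bool = True


def freeze_except(layers, keep_name):
    """Freeze every layer except the one called `keep_name`.

    Returns the names left trainable, so an empty list is easy to spot.
    """
    kept = []
    for layer in layers:
        if layer.name == keep_name:
            layer.trainable = True
            kept.append(layer.name)
        else:
            layer.trainable = False
    return kept


layers = [StubLayer("conv_1"), StubLayer("conv_2"), StubLayer("train_me")]
kept = freeze_except(layers, "train_me")
if not kept:
    # This is the situation in the Gist: no layer carries the expected name.
    raise ValueError("no layer named 'train_me'; nothing would be trainable")
print(kept)
```

With a real model, one would pass `model.layers` instead of the stub list, since Keras layers expose the same `name` and `trainable` attributes.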
I checked again: even when there are no trainable weights, this takes a long time on my computer with a TensorFlow backend. It is the same on both WSL2 and Linux for me.
Is the notebook you're creating using the GPU?
A little more debugging info: if I use Keras 2.15, which was installed with TensorFlow, then it works as expected. The main difference I see is errors from ptxas:
Unsupported .version 7.8; current version is '7.5'
ptxas fatal : Ptx assembly aborted due to errors
It seems like my CUDA/TensorFlow drivers might not be correct for Keras 3. I installed them using the recommended instructions from the Keras website, e.g. pip install tensorflow[and-cuda] then pip install --upgrade keras.
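One reading of the ptxas error above is that the assembler being picked up only understands PTX 7.5, while the generated code targets 7.8. The sketch below pulls the two versions out of the message and checks which `ptxas`, if any, is on `PATH`. The helper names are hypothetical, and the `PATH` lookup is only a hint, since TensorFlow may locate `ptxas` elsewhere.

```python
import re
import shutil
import subprocess

# Error text copied from the log above.
PTX_ERROR = "Unsupported .version 7.8; current version is '7.5'"


def parse_ptx_versions(message):
    """Return (requested, supported) PTX versions from a ptxas error, or None."""
    match = re.search(
        r"Unsupported \.version (\d+\.\d+); current version is '(\d+\.\d+)'",
        message,
    )
    return (match.group(1), match.group(2)) if match else None


def ptxas_version_text():
    """Return the output of `ptxas --version`, or None if ptxas is not on PATH."""
    path = shutil.which("ptxas")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


print(parse_ptx_versions(PTX_ERROR))
print(ptxas_version_text())
```

If the reported `ptxas` comes from an older system-wide CUDA toolkit than the one installed by pip, that would be consistent with the version mismatch in the error.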
When I train a model built with a keras.applications app using a TensorFlow backend, it never finishes a batch. When I use PyTorch as the backend, it trains fine.
Here is a working example:
If I run this with a TensorFlow backend, then it never finishes. If I run it with a PyTorch backend, then it finishes very quickly, in less than 1 second. I haven't seen the TensorFlow version finish yet.
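For reference, switching backends between runs in Keras 3 is done through the `KERAS_BACKEND` environment variable, which has to be set before `keras` is first imported. The sketch below is a minimal illustration; `select_backend` is a hypothetical helper, and the tuple lists only the backends relevant to this thread (other values may exist).

```python
import os

# Backends relevant here; this list is an assumption, not exhaustive.
KNOWN_BACKENDS = ("tensorflow", "torch", "jax")


def select_backend(name):
    """Set KERAS_BACKEND; call this before the first `import keras`."""
    if name not in KNOWN_BACKENDS:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {KNOWN_BACKENDS}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("torch")
# import keras  # import only after the backend has been chosen
```

Note that the PyTorch backend is spelled `torch`, so passing `"pytorch"` raises the `ValueError` above.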
This is a warning I get from tensorflow: