Open anisayari opened 3 years ago
More information is needed. Could you provide a small reproducible example with a dummy dataset? Which function call results in this infinite loop? The only component that relies on the Java GC is the HDFS driver (if you are using the Java-based HDFS driver). Otherwise, I am not sure which GC is emitting these log messages.
It may be caused by the tf.keras.layers.experimental.preprocessing.Discretization layer. Replace it with sklearn.preprocessing.KBinsDiscretizer, applied outside of the model, and the training will run much quicker.
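A minimal sketch of binning a feature outside the model might look like the following. The feature name `ages`, the random data, the number of bins, and the `uniform` strategy are all assumptions for illustration, not taken from the original training code; the bin settings should be chosen to match whatever the Discretization layer was configured with.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical numeric feature column; replace with the real training data.
rng = np.random.default_rng(0)
ages = rng.uniform(18, 90, size=(1000, 1))

# Fit the bin edges once, before training, instead of inside the Keras model.
# n_bins and strategy are illustrative choices.
discretizer = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
age_bins = discretizer.fit_transform(ages).astype(np.int32)

# age_bins now holds one integer bin index per row, in the range [0, n_bins).
# Feed these indices to the model (for example into an Embedding layer)
# in place of the raw values, and drop the Discretization layer.
print(age_bins[:5].ravel())
```

The same fitted `discretizer` should be reused with `transform` on validation and inference data so the bin edges stay consistent.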
After the first epoch, my training gets stuck in what looks like an infinite GC loop. I left it running for 18 hours and it is still going, while the whole training should finish in under 4 hours.
I don't understand why, and I cannot find any resources online. It has been happening since I started using a Petastorm distributed dataset for TensorFlow.
I really do not know what else to try. Any suggestions, please?
Thank you