cyberbeat opened this issue 3 years ago
That's very curious. I'm assuming the number of examples per iteration is at least twice the number of examples per minibatch? Also, could you please share the line of code you're using to configure the logging frequency, and the unexpected log lines you're seeing? It might be that something related is being logged rather than what you've specifically configured.
These are the logged lines (2 iterations):

```
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.TimeIterationListener - Remaining time : 2mn - End expected : Tue Feb 09 21:55:39 CET 2021
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.PerformanceListener - ETL: 0 ms; iteration 360204; iteration time: 62 ms; samples/sec: 2064,516; batches/sec: 16,129; GC: [G1 Young Generation: 0 (0ms)], [G1 Old Generation: 0 (0ms)];
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.TimeIterationListener - Remaining time : 2mn - End expected : Tue Feb 09 21:55:39 CET 2021
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.PerformanceListener - ETL: 0 ms; iteration 360205; iteration time: 61 ms; samples/sec: 2098,361; batches/sec: 16,393; GC: [G1 Young Generation: 0 (0ms)], [G1 Old Generation: 0 (0ms)];
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.TimeIterationListener - Remaining time : 2mn - End expected : Tue Feb 09 21:55:39 CET 2021
```
Currently configured with:

```java
.withTrainingProgressLoggingFrequency(TrainingAmount.minibatches(300))
.withTrainingPerformanceLoggingFrequency(TrainingAmount.minibatches(300))
.withMinibatchSize(3000)
```
In my understanding, there are about 16 batches per second, so with a frequency of 300 minibatches a log row should appear roughly every 18 seconds, but instead several appear per second.
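(Using the batches/sec figures from the log above, ≈16.1–16.4 batches/s: 300 minibatches ÷ ~16.4 batches/s ≈ 18 s between performance log lines.)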
I checked the code: the conversion from "minibatches" to "iterations" is just an identity function, and the check against the frequency is straightforward, so there's no obvious bug; I also ran a test and confirmed that the logging behaves as expected. Is it possible that you might be configuring the neural network but not storing the result? E.g. if you were to write code like this:
```java
NeuralNetwork myModel = ...;
myModel.withTrainingProgressLoggingFrequency(TrainingAmount.minibatches(300));
```
The new variant of `myModel` with the modified logging frequency is just discarded (because it's not stored back in `myModel`), and subsequent use of `myModel` would use the original version, without the every-300-minibatches logging.
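For illustration, a minimal sketch of the corrected pattern, assuming the same configurator methods as in your snippet (the `...` stands in for however the network is actually constructed):

```java
// Sketch only: the with*() configurators return a modified copy,
// so the result must be stored back into the variable that is used later.
NeuralNetwork myModel = ...;
myModel = myModel
    .withTrainingProgressLoggingFrequency(TrainingAmount.minibatches(300))
    .withTrainingPerformanceLoggingFrequency(TrainingAmount.minibatches(300))
    .withMinibatchSize(3000);
```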
Of course, if this were the case, I'd expect the result to be the default behavior of no logging, not logging every iteration. If this isn't the case, is there any chance you could add a breakpoint inside `org.deeplearning4j.optimize.listeners.PerformanceListener`'s constructor and report the value of the `frequency` parameter? It should be 300.
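If attaching a debugger is inconvenient, here's a rough sketch of sanity-checking the frequency with plain DL4J (outside of Dagli); `net` is assumed to be whatever `MultiLayerNetwork` or `ComputationGraph` you have at hand:

```java
import org.deeplearning4j.optimize.listeners.PerformanceListener;

// Sketch: attach the listener directly with the expected frequency of 300.
// If this logs every 300 iterations while the Dagli-configured run logs every
// iteration, the frequency value is being lost somewhere in the Dagli wiring.
net.setListeners(new PerformanceListener(300));
```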
No matter what I set as the logging frequency in `NeuralNetwork` (for example `TrainingAmount.minibatches(1)` or `TrainingAmount.minibatches(300)`), every iteration is logged. I have to admit that I got new hardware and compiled against the latest dagli combined with a dl4j-SNAPSHOT to get it running with CUDA 11.2, so I am not sure if this is reproducible.