linkedin / dagli

Framework for defining machine learning models, including feature generation and transformations, as directed acyclic graphs (DAGs).
BSD 2-Clause "Simplified" License

Logging frequency has no effect on DL4J logging intervals #7

Open · cyberbeat opened this issue 3 years ago

cyberbeat commented 3 years ago

No matter what I set as the logging frequency in "NeuralNetwork" (for example TrainingAmount.minibatches(1) or TrainingAmount.minibatches(300)), every iteration is logged. I should mention that I got new hardware and compiled against the latest dagli combined with a dl4j SNAPSHOT to get it running with CUDA 11.2, so I am not sure whether this is reproducible.

jeffpasternack commented 3 years ago

That's very curious. I'm assuming the number of examples per iteration is at least twice the number of examples per minibatch? Also, could you please share the line of code you're using to configure the logging frequency, and the unexpected log lines you're seeing? It may be that something related is being logged rather than what's specifically being configured.

cyberbeat commented 3 years ago

These are the logged lines (2 iterations):

[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.TimeIterationListener - Remaining time : 2mn - End expected : Tue Feb 09 21:55:39 CET 2021
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.PerformanceListener - ETL: 0 ms; iteration 360204; iteration time: 62 ms; samples/sec: 2064,516; batches/sec: 16,129; GC: [G1 Young Generation: 0 (0ms)], [G1 Old Generation: 0 (0ms)];
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.TimeIterationListener - Remaining time : 2mn - End expected : Tue Feb 09 21:55:39 CET 2021
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.PerformanceListener - ETL: 0 ms; iteration 360205; iteration time: 61 ms; samples/sec: 2098,361; batches/sec: 16,393; GC: [G1 Young Generation: 0 (0ms)], [G1 Old Generation: 0 (0ms)];
[pool-1-thread-27] INFO org.deeplearning4j.optimize.listeners.TimeIterationListener - Remaining time : 2mn - End expected : Tue Feb 09 21:55:39 CET 2021

Currently configured with:

.withTrainingProgressLoggingFrequency(TrainingAmount.minibatches(300))
.withTrainingPerformanceLoggingFrequency(TrainingAmount.minibatches(300))
.withMinibatchSize(3000)

In my understanding, training runs at about 16 batches per second, so with a frequency of 300 minibatches the log rows should appear roughly every 18 seconds (300 / 16 ≈ 18.75 s), but multiple appear per second.

jeffpasternack commented 3 years ago

I checked the code: the conversion from "minibatches" to "iterations" is just an identity function, and the check against the frequency is straightforward, so there's no obvious bug. I also ran a test and confirmed that the logging behaves as expected.

Is it possible that you're configuring the neural network but not storing the result? E.g., if you were to write code like this:

NeuralNetwork myModel = ...;
myModel.withTrainingProgressLoggingFrequency(TrainingAmount.minibatches(300));

the new variant of myModel with the modified logging frequency is simply discarded (because it's not stored back in myModel), and subsequent use of myModel would use the original version, without the every-300-minibatches logging.
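To make the contrast concrete, here's a minimal self-contained sketch of that pitfall; the Config class and its names are stand-ins for illustration, not dagli's actual API:

// Hypothetical stand-in for an immutable configuration object:
// each withX method returns a modified copy, leaving the receiver unchanged.
final class Config {
    final int loggingFrequency; // in minibatches

    Config(int loggingFrequency) {
        this.loggingFrequency = loggingFrequency;
    }

    // Returns a modified copy; 'this' is untouched.
    Config withLoggingFrequency(int minibatches) {
        return new Config(minibatches);
    }
}

public final class ImmutableConfigDemo {
    public static void main(String[] args) {
        Config model = new Config(1);

        // Pitfall: the returned copy is discarded; 'model' still logs every minibatch.
        model.withLoggingFrequency(300);
        System.out.println(model.loggingFrequency); // prints 1

        // Correct: store the returned copy back into the variable (or chain and assign once).
        model = model.withLoggingFrequency(300);
        System.out.println(model.loggingFrequency); // prints 300
    }
}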

Of course, if that were the case, I'd expect the result to be the default behavior of no logging, not logging every iteration. If that isn't the case, is there any chance you could set a breakpoint inside org.deeplearning4j.optimize.listeners.PerformanceListener's constructor and report the value of the frequency parameter? It should be 300.
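For context, the gate such a listener applies is essentially a modulo test against the configured frequency. A simplified sketch of that shape (not DL4J's actual source, just an illustration of the check): with frequency == 300 it would fire once per 300 iterations, while frequency == 1 would fire on every iteration, matching the behavior reported above.

// Simplified sketch of a frequency-gated training listener; names are
// illustrative and do not mirror DL4J's implementation.
final class FrequencyGatedListener {
    private final int frequency; // log once every 'frequency' iterations (minibatches)
    private long iterationCount = 0;

    FrequencyGatedListener(int frequency) {
        this.frequency = frequency;
    }

    void iterationDone() {
        iterationCount++;
        if (iterationCount % frequency == 0) {
            System.out.println("iteration " + iterationCount + ": logging performance stats");
        }
    }
}

public final class FrequencyGateDemo {
    public static void main(String[] args) {
        FrequencyGatedListener listener = new FrequencyGatedListener(300);
        for (int i = 0; i < 900; i++) {
            listener.iterationDone(); // logs at iterations 300, 600, 900
        }
    }
}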