Open vfdev-5 opened 4 years ago
@vfdev-5 is this solved? What is the error? I got this while running:
$ python test.py
- Start from 0 iteration
2021-02-16 17:33:38,589 trainer INFO: Engine run starting with max_epochs=2.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 --> stop at 50
2021-02-16 17:33:38,591 trainer INFO: Terminate signaled. Engine will stop after current iteration is finished.
2021-02-16 17:33:38,591 trainer INFO: Epoch[1] Complete. Time taken: 00:00:00
2021-02-16 17:33:38,591 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 50 iteration | 1 epoch
-- Do something else
- Continue from 50 iteration
2021-02-16 17:33:38,591 trainer INFO: Engine run resuming from iteration 50, epoch 1 until 2 epochs
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 --> stop at 100
2021-02-16 17:33:38,593 trainer INFO: Terminate signaled. Engine will stop after current iteration is finished.
2021-02-16 17:33:38,593 trainer INFO: Epoch[2] Complete. Time taken: 00:00:00
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
-- Do something else
- Continue from 100 iteration
2021-02-16 17:33:38,593 trainer INFO: Engine run resuming from iteration 100, epoch 2 until 2 epochs
2021-02-16 17:33:38,593 trainer INFO: Engine run complete. Time taken: 00:00:00
- Ended on 100 iteration | 2 epoch
@sparkingdark no explicit error is raised here, but the epoch value is wrong. Here is a snippet with a more explicit epoch check:
from ignite.engine import Engine, Events
from ignite.utils import setup_logger
stop_iter = 2
epoch_length = 15
max_epochs = 5
trainer = Engine(lambda e, b: print(b, end=" "))
trainer.logger = setup_logger("trainer")
state = trainer.state
@trainer.on(Events.ITERATION_COMPLETED(every=stop_iter))
def stop():
    # Interrupt the run every `stop_iter` iterations.
    print("--> stop at {}".format(trainer.state.iteration))
    trainer.terminate()
data = list(range(epoch_length))
print("- Start from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
print("- Continue from {} iteration".format(state.iteration))
state = trainer.run(data, max_epochs=max_epochs, epoch_length=epoch_length)
print("- Ended on {} iteration | {} epoch".format(state.iteration, state.epoch))
print("-- Do something else")
assert state.epoch == 1, state.epoch
Also, note that we do not continue iterating over the data but restart from the first samples, which is wrong as well.
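To make the expected resume behaviour concrete, here is a minimal sketch in plain Python. It does not use Ignite's API: `resume_offset` and `remaining_batches` are hypothetical helpers, and the sketch assumes the data is a finite sequence of exactly `epoch_length` samples that is iterated in the same order on every pass.

```python
from itertools import islice


def resume_offset(iteration, epoch_length):
    """Number of samples the current epoch has already consumed."""
    return iteration % epoch_length


def remaining_batches(data, iteration, epoch_length):
    """Hypothetical helper: skip what the interrupted epoch already saw."""
    offset = resume_offset(iteration, epoch_length)
    return list(islice(data, offset, None))


if __name__ == "__main__":
    data = list(range(15))
    # After stopping at iteration 2, a resumed run should start at sample 2,
    # not at sample 0 again.
    print(remaining_batches(data, iteration=2, epoch_length=15))
```

With the values from the snippet (`stop_iter = 2`, `epoch_length = 15`), a run interrupted at iteration 2 would continue with samples 2 to 14 instead of replaying samples 0 and 1.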
Okay, so we need a fix that can resume from the current value. Am I correct?
Well, this is a bit complicated to fix as is. I think this will be done with the Engine refactor that I initiated some time ago...
Okay, so should I try to solve it or look into other issues, @vfdev-5?
I'd suggest looking at other "help wanted" issues: https://github.com/pytorch/ignite/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22
Bug description
Below code shows the error:
The issue is that the iteration and epoch counters stop being consistent with each other, which is a bug.
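The relationship that should hold between the two counters can be sketched as follows. This is plain Python, not Ignite code: `expected_epoch` and `is_consistent` are hypothetical helpers, and the sketch assumes the epoch counter is incremented when an epoch starts, so that epoch `k` covers iterations `(k - 1) * epoch_length + 1` through `k * epoch_length`.

```python
def expected_epoch(iteration, epoch_length):
    """Epoch that `iteration` belongs to (0 before any iteration has run)."""
    # Integer ceiling division, avoids floating point.
    return (iteration + epoch_length - 1) // epoch_length


def is_consistent(iteration, epoch, epoch_length):
    """True if the two counters agree under the assumption above."""
    return epoch == expected_epoch(iteration, epoch_length)


if __name__ == "__main__":
    # Three interrupted runs of 2 iterations each, with epoch_length = 15,
    # should still be inside the first epoch.
    print(is_consistent(iteration=6, epoch=1, epoch_length=15))
    # An epoch value of 3 at the same iteration would signal the bug.
    print(is_consistent(iteration=6, epoch=3, epoch_length=15))
```

A check like this could sit next to the `assert state.epoch == 1` in a reproduction script to flag the point at which the counters diverge.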