Open alxlampe opened 4 years ago
@alxlampe thanks for reporting. I think the issue is related to https://github.com/pytorch/ignite/issues/1259 and
if you set max_epochs=None
between runs you can restart the engine:
engine = Engine(_run_iteration)
# run one epoch
engine.run(itertools.count(start=0), max_epochs=engine.state.epoch + 1)
# epoch length is None after run:
print(engine.state)
# running another epoch throws the error:
engine.state.max_epochs = None
engine.run(itertools.count(start=0), max_epochs=engine.state.epoch + 1)
I agree that the issue we see here is a bug and we should handle it in a user friendly way.
I was wondering why you have to restart the engine like that ?
Is it possible to use engine.terminate_epoch
and execute "some calculations outside the engines framework" in a handler attached to TERMINATE_SINGLE_EPOCH
event ?
@vfdev-5 Thanks for your fast reply and your proposal.
First to answer your questions.
I was wondering why you have to restart the engine like that ?
I am using nested engines. One engine drives the execution of multiple engines (tasks). In my case, data
that is passed to my process function is a list of tasks.
Some details:
Events.EPOCH_COMPLETED
. TaskSummary
object when epoch is completed.TaskSummary
objects and merges them to a new result when epoch is completed.In the optimal case, Engine
would provide me some interface to split the run
method into subcommands, e.g.
intialize_epoch
runs everything until epoch would be executed (excluding Events.EPOCH_STARTED
)run_single_epoch
runs a single epoch, fires all events in between and including Events.EPOCH_STARTED
and Events.EPOCH_COMPLETED
.
max_epochs
is reached, engine runs to the end (firing Events.COMPLETED
etc.)run_epochs
as an alternative for run_single_epoch
: runs all epochs and breaks if terminate_epoch
is called but as above, only runs to the end if max_epochs
is reached.I was already thinking about creating a new issue to ask if it is possible to separate the event handler mechanics from the engine and build the engine on top it. That would allow to create fully customized engines, e.g. engines that only require an iteration loop but no epoch loop. But I still have to dive deeper into the code to check if this makes any sense and if so, would create an issue with an initial proposal :)
Is it possible to use engine.terminate_epoch and execute "some calculations outside the engines framework" in a handler attached to TERMINATE_SINGLE_EPOCH event ?
In my case, I don't see the benefit of using TERMINATE_SINGLE_EPOCH
instead of EPOCH_COMPLETED
, because it doesn't matter, if engine.terminate_epoch
was called or if the epoch ends without this call. I just recently switched to a newer version of ignite and the event wasn't present before. Perhaps I am missing something.
Coming back to your proposal to set engine.state.max_epochs = None
before calling engine.run
again. It's working but resets the epoch counter which I wanted to use for counting how often the task was executed.
When I set engine.state.max_epochs = None
I have to use max_epochs=1
for each call of engine.run
otherwise for each run, the engine runs 1 epoch at the first call, 2 epochs at the second call, 3 epochs at the third cann etc.
See example below:
engine = Engine(_run_iteration)
engine.state.persistent_across_runs = 0
# run one epoch
engine.run(itertools.count(start=0), max_epochs=engine.state.epoch + 1)
# outputs "1"
print(engine.state.persistent_across_runs)
# run another epoch
engine.state.max_epochs = None
engine.run(itertools.count(start=0), max_epochs=engine.state.epoch + 1)
# outputs "3" since engine runs 2 epochs in the second run
print(engine.state.persistent_across_runs)
Since I am subclassing Engine
I can simply override the _is_done
method to only be taken into account if epoch_length
is set.
@alxlampe thanks for the details !
Yes, that's true that engine.state.max_epochs = None
is to restart the Engine from the begining and not continue. Currently, if we would like to continue, epoch_length
is sort of implicitly required. It is a bug as in your case user can stop the execution during the first epoch of a run with epoch_length=None
and epoch_length
is not set...
Let me think what can be done for that. Maybe, a solution is to set unknown epoch_length
on engine.terminate_epoch()
or engine.terminate()
...
I was already thinking about creating a new issue to ask if it is possible to separate the event handler mechanics from the engine and build the engine on top it. That would allow to create fully customized engines, e.g. engines that only require an iteration loop but no epoch loop. But I still have to dive deeper into the code to check if this makes any sense and if so, would create an issue with an initial proposal :)
Yes, this is something to explore
In the optimal case, Engine would provide me some interface to split the run method into subcommands ...
A refactor like that may also make sense... Let's see what is more interesting: this point or above one.
@fco-dv could you please take a look at this issue and propose a way to fix it along the way with https://github.com/pytorch/ignite/pull/1333 which actually shouldn't introduce state.restart
as it was discussed in the corresponding issue.
@fco-dv any thoughts/updates on this ?
🐛 Bug description
I am using an engine such that it runs a previously undefined number of iterations with an infinite iterator. Epoch will be terminated in the process function with
engine.terminate()
.I am always running the engine for only one epoch, because I need to do some calculations outside the engines framework and then continue the engine to run one further epoch with
engine.run(itertools.counter(start=0), max_epochs=engine.state.epoch + 1)
.I get an error in
engine._is_done
becauseengine.state.epoch_length
isNone
. Is this the intended behavior? Do I have to setepoch_length
explicitly?Here is a minimum example to reproduce the error:
Thanks in advance!
Environment
conda
,pip
, source): pip