read-only scenario opening

gidden commented 2 years ago

I regularly run into issues around scenarios being locked when they are purely used for data reading. This can be true any time a scenario is needed for cloning, for example. In my use cases, this locking is never related to actual read safety, but rather because some process that had read it failed for some reason (ran out of memory, was killed externally, etc.). These occurrences happen daily on HPG901 in my experience.

It sounds like other users are also running into this issue regularly, so would suggest making this a high priority.

The simplest way around this is to place a read-only status on scenario instances. These would not lock the scenario, but also would not allow changes. They would therefore be robust against unexpected system exits.

cc @adrivinca @khaeru @LauWien @meksor @danielhuppmann

khaeru commented 2 years ago

I would suggest this is a strict duplicate of #30, as any fix there will also allow this.

gidden commented 2 years ago

I would argue

ixmp.Scenario(..., read_only=True)

is different than

try: 
    ixmp.Scenario()
except error:
    ixmp.unlock_scenario()
    ixmp.Scenario()

but am happy for the issue to be closed as you see fit @khaeru !

khaeru commented 2 years ago

ixmp.Scenario(..., read_only=True)

To summarize the PRs linked from https://github.com/iiasa/ixmp/issues/30#issuecomment-1044195080

Instantiating a Python Scenario/TimeSeries object on a JDBCBackend requires instantiating a corresponding Java object instance—regardless of whether it's a new Scenario, accessing existing data only, or modifying data.
The routines that instantiate the Java object are in ixmp_source.
Those routines are the cause of the problem described #30 and here: a. They leave the underlying database in an “unclean” state after an unexpected exit. b. They do not offer a way to recover from that state, either automatically or manually.

So we could as suggested store a new, read_only state variable on the Python Scenario object, and use that to e.g. throw an exception if the user tries to s.add_par()—totally doable.¹ But this would not provide a workaround for the described problem as hoped: because, if someone has triggered 3(a), then (1) and (2) must still happen regardless of whether read_only=True or read_only=False. We unavoidably run into 3(b).

¹ See also “Versioning and locking” in the (old) ECE GitHub wiki (this page was not migrated when the ECE MediaWiki was set up). Different notions of state (locked, unlocked; checked out, checked out for "timeseries only" editing; now read_only) are already quite jumbled, and adding new ones worsens the problem.

LauWien commented 2 years ago

See also “Versioning and locking” in the (old) ECE GitHub wiki (this page was not migrated when the ECE MediaWiki was set up).

This is now migrated, please see here :)

meksor commented 2 years ago

It seems to me that the underlying problem here is that scenarios are erroneously locked. Using a context manager for locking scenarios would probably mitigate most of these wrong locks, what do you think?

Edit: Nevermind, I just tested context manager behaviour and it seems not even SIGTERM results in the context manager exiting correctly, my bad.

danielhuppmann commented 2 years ago

If the context manager is part of the process on a local machine that looses connection to the database, it cannot change the "lock state" or initiate a roll-back for a run in the database.

Alternatives:

connect to the database via a client-server-API, where the server-side process unlocks when the connection is lost (this is currently not possible for MESSAGE-type data, but will be supported by the new ixmp-base package)
have a nightly job that unlocks all runs that have been locked for more than x hours (this would require an "unlock"-feature in the ixmp-legacy Java implementation - maybe that exists..)

meksor commented 2 years ago

That's true, I guess we would like to know what the most common problem here is...

Loosing connection to the database
Getting shut down externally (if so, what interrupt signal is being sent by the OS? On SIGTERM maybe atexit suffices)
Out of Memory Errors or similar

If we were to settle on a database system that problem could probably be solved without building custom client/server infrastructure by using temporary tables for example, ensuring atomic operations (on top of atomic transactions in the database itself).

khaeru commented 2 years ago

This is now migrated, please see here :)

Thanks! In the old/GitHub wiki, I replaced the migrated content with a link to the new location, so there is no confusion.

ensuring atomic operations

Agreed, this is what was described on the linked page (final line).

LauWien commented 2 years ago

Thanks! In the old/GitHub wiki, I replaced the migrated content with a link to the new location, so there is no confusion.

Thank you! Did the same here and here.

iiasa / ixmp

read-only scenario opening #437