insitro / redun

Yet another redundant workflow engine
https://insitro.github.io/redun/
Apache License 2.0

please document required setup for scheduling tasks from python code #53

Open hamr opened 2 years ago

hamr commented 2 years ago

The documentation helpfully provides examples of how to instantiate a Scheduler and run tasks, e.g.

from redun import Scheduler

scheduler = Scheduler()
result = scheduler.run(main())  # main is a redun task defined elsewhere

However, that does not appear to take advantage of caching (tasks run every time), so it's not quite analogous to running something like

from redun.cli import RedunClient

client = RedunClient()
client.execute(["redun", "run", "tasks.py", "main"])

I'm guessing that's because the scheduler object isn't making use of the database. I expect it's relevant that I see this message when calling scheduler.run():

INFO     redun:__init__.py:1199 Upgrading db from version -1.0 to 3.1...

How do you set up a scheduler object so that it behaves more like calling redun on the command line? Or do you recommend using RedunClient instead? And could you please add that to the docs?

mattrasmus commented 2 years ago

Hi @hamr, thanks for posting this question. It does appear we haven't fully documented this case; we'll add that. By default, Scheduler() uses an in-memory database, so the cache does not persist between executions (which is what you are seeing). In the meantime, here is how you configure the Scheduler to use a persistent database (e.g. sqlite):

from redun import Scheduler
from redun.config import Config

scheduler = Scheduler(config=Config({
    "backend": {
        "db_uri": "sqlite:///redun.db",
    }
}))
scheduler.load()  # Auto-creates the redun.db file as needed and starts a db connection.
result = scheduler.run(main())  # main is your top-level redun task.

In our own code, where we embed a redun Scheduler inside a larger Python application, we instantiate the Scheduler as shown above. RedunClient() is really only used in tests.
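To see why an in-memory database loses the cache between executions, the following sketch uses only Python's standard sqlite3 module. It is an illustration, not redun's own code: it assumes the in-memory default behaves like SQLite's ":memory:" database, and the table name `cache`, the file name `example.db`, and the helper `count_after_reconnect` are hypothetical.

```python
import os
import sqlite3
import tempfile


def count_after_reconnect(db_path: str) -> int:
    """Write one row, close the connection, reopen it, and count surviving rows."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT, value TEXT)")
    conn.execute("INSERT INTO cache VALUES ('task_hash', 'result')")
    conn.commit()
    conn.close()

    # A second connection stands in for a second run of the program.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT, value TEXT)")
    rows = conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
    conn.close()
    return rows


# In-memory: each connection gets a fresh, empty database.
in_memory_rows = count_after_reconnect(":memory:")

# File-backed: the row written by the first connection is still there.
with tempfile.TemporaryDirectory() as tmp_dir:
    on_disk_rows = count_after_reconnect(os.path.join(tmp_dir, "example.db"))

print(in_memory_rows, on_disk_rows)
```

The in-memory count is 0 and the on-disk count is 1, which mirrors the behavior described in this thread: without a persistent db_uri, each run starts from an empty database and every task executes again.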

Let me know if that helps.

hamr commented 2 years ago

Excellent, thank you!