lifelong-learning-systems / tella

Framework for Training & Evaluating Lifelong Learning Agents (TELLA)
MIT License

Add way to load curriculum from file #57

Open coreylowman opened 2 years ago

coreylowman commented 2 years ago

This is semi-related to the discussion in #38. The CLI will need a way to load a curriculum from a file. Options are:

  1. Dependency injection approach by dynamically importing an unknown Python file
  2. Constructing a curriculum via configuration in YAML
  3. Have a registry of curriculums that can be referred to by string name (a la gym environments)
  4. ???
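For reference, option 1 could look roughly like the following. This is a minimal sketch using the standard library's importlib, not tella's implementation; the function name `load_curriculum_class` and the default class name `Curriculum` are assumptions made for illustration.

```python
import importlib.util
from pathlib import Path


def load_curriculum_class(path, class_name="Curriculum"):
    """Import the Python file at `path` and return its attribute `class_name`.

    Both the function name and the default class name are hypothetical.
    """
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import a module from {path}")
    module = importlib.util.module_from_spec(spec)
    # exec_module runs the file's top-level code, so only load trusted files.
    spec.loader.exec_module(module)
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ImportError(f"{path} does not define {class_name!r}") from None
```

The CLI would then take a file path (and possibly a class name) as arguments and pass them to this helper.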
coreylowman commented 2 years ago

@cash: @gkvallabha has had issues with loading curriculums from YAML files, so you should touch base with him.

coreylowman commented 2 years ago

I think the only use for this would be passing a path to the curriculum on the command line (i.e. the current CLI), right?

We may want to punt on this issue and the CLI plan in favor of telling people to import their agent, import the curriculum, and call run_experiment in tella.

cash commented 2 years ago

We can have a registry of current curricula and then specify the key on the command line. I don't see a reason to abandon the CLI yet.
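A rough sketch of the registry idea, in the spirit of how gym environments are looked up by string id. Every name here (`CURRICULUM_REGISTRY`, `register_curriculum`, `ExampleCurriculum`, the `--curriculum` flag) is an assumption for illustration and not an existing tella API.

```python
import argparse

# Hypothetical registry mapping a string key to a curriculum class.
CURRICULUM_REGISTRY = {}


def register_curriculum(name):
    """Class decorator that stores a curriculum class under `name`."""
    def decorator(cls):
        if name in CURRICULUM_REGISTRY:
            raise ValueError(f"curriculum {name!r} is already registered")
        CURRICULUM_REGISTRY[name] = cls
        return cls
    return decorator


@register_curriculum("ExampleCurriculum-v0")
class ExampleCurriculum:
    """Placeholder standing in for a real curriculum."""


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Limiting choices to registered keys makes argparse reject typos.
    parser.add_argument(
        "--curriculum", required=True, choices=sorted(CURRICULUM_REGISTRY)
    )
    return parser.parse_args(argv)
```

With this shape, the CLI resolves the class with `CURRICULUM_REGISTRY[args.curriculum]`, and third-party curricula only need to be imported once so that the decorator runs.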

gkvallabha commented 2 years ago

When we did this with TEF (in L2M Phase 1), we used a data-driven approach (a JSON file for the curriculum). This quickly ran into limitations, e.g.

My takeaway was that a data-driven approach is not scalable and potentially hard to debug/understand.

I understand the security concern of doing dynamic imports, though realistically, users are going to be running code they got from a GitHub repo either way. It seems to me that a good alternative is to ask users to set up a short runner script in Python and invoke it (slightly more work for users, but on the flip side, it allows an explicit specification of each "experiment").
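To make the runner-script idea concrete, here is a sketch of what such a script might contain. The signature of run_experiment is not given in this thread, so the version below is a stand-in with an assumed signature; `MyAgent` and `MyCurriculum` are likewise placeholders defined inline so that the example is self-contained.

```python
# run_my_experiment.py: hypothetical runner script.
# In a real script the agent and curriculum would be imported from the user's
# own modules and run_experiment from tella. All three are stand-ins here.


class MyAgent:
    """Stand-in agent that only counts the tasks it is shown."""

    def __init__(self):
        self.tasks_seen = 0

    def consume(self, task):
        self.tasks_seen += 1


class MyCurriculum:
    """Stand-in curriculum: a fixed sequence of task names."""

    def tasks(self):
        return ["task-a", "task-b", "task-a"]


def run_experiment(agent_cls, curriculum_cls):
    """Stand-in for the framework entry point (assumed signature)."""
    agent, curriculum = agent_cls(), curriculum_cls()
    for task in curriculum.tasks():
        agent.consume(task)
    return agent


if __name__ == "__main__":
    run_experiment(MyAgent, MyCurriculum)
```

Each experiment is then one small file whose imports state exactly which agent and curriculum were used.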

Additional point re JSON specification

cash commented 2 years ago

I need more experience with our curriculums to have an opinion here.

I'm more concerned about an undocumented, implicit configuration schema that makes validation and creation difficult than about security issues with importing an arbitrary Python module.

Early in development it can make sense to have the flexibility of a full scripting language for configuration. If after a while there are a smallish number of primitives in the configuration, it can be really useful to codify them as a schema and separate out the data from the code. I don't know if that is the case here.
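As an illustration of what codifying primitives as a schema could mean, here is a sketch with a single made-up primitive. The `TaskBlock` fields and the `parse_curriculum` helper are assumptions; the thread does not say which primitives a real curriculum schema would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBlock:
    """Hypothetical schema primitive: run `task` for `episodes` episodes."""

    task: str
    episodes: int

    def __post_init__(self):
        if not isinstance(self.task, str) or not self.task:
            raise ValueError("task must be a non-empty string")
        if not isinstance(self.episodes, int) or self.episodes <= 0:
            raise ValueError("episodes must be a positive integer")


def parse_curriculum(config):
    """Validate a list of plain dicts, e.g. as loaded from YAML or JSON."""
    blocks = []
    for i, entry in enumerate(config):
        unknown = set(entry) - {"task", "episodes"}
        if unknown:
            raise ValueError(f"entry {i}: unknown keys {sorted(unknown)}")
        try:
            blocks.append(TaskBlock(**entry))
        except TypeError as err:  # raised when a required key is missing
            raise ValueError(f"entry {i}: {err}") from None
    return blocks
```

The schema is explicit and documented in one place, and malformed configuration fails at load time instead of partway through an experiment.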

gkvallabha commented 2 years ago

We don't quite know the full range of (lifelong) curricula. This is a pretty novel area, so we are feeling our way through this space ... I don't think the performers have a good idea either at present.

Another possibility (other than separating data from code): the curriculum designer can use some specified APIs (e.g., subclass an abstract class and provide implementations) as building blocks ... it isn't as strongly constrained as a data schema but can still provide some way to ensure the curriculum is put together in a reasonable way (e.g., like specifying a BNF).
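A sketch of the abstract-class approach, using Python's abc module. The class name `CurriculumBase`, the `blocks` method, and the (task name, learning allowed) pair format are all assumptions for illustration, not tella's actual interface.

```python
from abc import ABC, abstractmethod


class CurriculumBase(ABC):
    """Hypothetical base class that curriculum designers would subclass."""

    @abstractmethod
    def blocks(self):
        """Return an iterable of (task_name, is_learning_allowed) pairs."""

    def validate(self):
        """Structural check that every subclass inherits."""
        blocks = list(self.blocks())
        if not blocks:
            raise ValueError("a curriculum needs at least one block")
        for name, learning in blocks:
            if not isinstance(name, str) or not isinstance(learning, bool):
                raise TypeError(f"malformed block: {(name, learning)!r}")
        return blocks


class TrainThenEval(CurriculumBase):
    """Example subclass: learn on one task, then evaluate on it."""

    def blocks(self):
        yield ("task-a", True)   # learning block
        yield ("task-a", False)  # evaluation block
```

The designer keeps the full flexibility of Python inside `blocks`, while the base class enforces the overall shape of what comes out.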