Open coreylowman opened 2 years ago
@cash, @gkvallabha has had issues with loading curriculums from YAML files; you should touch base with him.
I think the only thing using this would be if we wanted to pass in a path to the curriculum via command line (i.e. the current CLI), right?
We may want to punt on this issue & the cli plan in favor of telling people to import their agent, import the curriculum, and call run_experiment in tella.
We can have a registry of current curricula and then can specify the key on the command line. I don't see a reason to abandon the CLI yet.
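A minimal sketch of the registry idea, for illustration only. Every name here (`CURRICULUM_REGISTRY`, `register_curriculum`, the `--curriculum` flag, and the toy curricula represented as lists of task names) is hypothetical and not taken from tella:

```python
import argparse
from typing import Callable, Dict, List

# Hypothetical registry: maps a short key to a factory that builds a curriculum.
# A curriculum is represented here as a plain list of task names (an assumption).
CURRICULUM_REGISTRY: Dict[str, Callable[[], List[str]]] = {}


def register_curriculum(name: str):
    """Decorator that records a curriculum factory under the given key."""
    def decorator(factory: Callable[[], List[str]]):
        if name in CURRICULUM_REGISTRY:
            raise ValueError(f"curriculum {name!r} is already registered")
        CURRICULUM_REGISTRY[name] = factory
        return factory
    return decorator


@register_curriculum("simple")
def simple_curriculum() -> List[str]:
    return ["task_a", "task_b"]


@register_curriculum("repeated")
def repeated_curriculum() -> List[str]:
    return ["task_a", "task_b", "task_a"]


def load_from_args(argv: List[str]) -> List[str]:
    """Parse the command line and build the curriculum named by its key."""
    parser = argparse.ArgumentParser(description="Run an experiment (sketch).")
    # Restricting choices to registered keys means no dynamic import is needed.
    parser.add_argument("--curriculum", choices=sorted(CURRICULUM_REGISTRY), required=True)
    args = parser.parse_args(argv)
    return CURRICULUM_REGISTRY[args.curriculum]()
```

With this shape, `load_from_args(["--curriculum", "simple"])` builds the curriculum registered under `simple`, and an unknown key is rejected by argparse before any code is loaded.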
When we did this with TEF (in L2M Phase 1), we used a data-driven approach (a JSON file for the curriculum). This quickly ran into limitations, e.g.
My takeaway was that a data-driven approach is not scalable and potentially hard to debug/understand.
I understand the security concern of doing dynamic imports, though realistically, users are going to be running code they got from a GitHub repo either way. It seems to me that a good alternative is to ask users to set up a short runner script in Python and invoke it (slightly more work for users, but on the flip side, it allows an explicit specification of each "experiment").
Additional point re JSON specification: for the learnkit approach to data-driven curricula, see the JSON files in particular. Note that it specified the task name ($learnkit:sample_classification_tasks.NumberData) so that the task could be loaded and verified as a valid task, and its parameters could be checked, which in turn involved dynamic loading. This could be avoided by having a master list of tasks somewhere, but this doesn't scale well.
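To make the dynamic-loading step concrete, here is a rough sketch of how a task string of that shape might be resolved and its parameters checked. The interpretation of the string as `$<package>:<module>.<attribute>` is an assumption based only on the example above, and `resolve_task` and `check_params` are hypothetical helpers, not learnkit's actual code:

```python
import importlib
import inspect
from typing import Any, Dict


def resolve_task(spec: str) -> Any:
    """Resolve a '$package:module.Attr' string to the object it names.

    The string format is assumed, not taken from learnkit's documentation.
    """
    if not spec.startswith("$") or ":" not in spec:
        raise ValueError(f"not a task spec: {spec!r}")
    package, _, dotted = spec[1:].partition(":")
    module_path, _, attr = dotted.rpartition(".")
    if not package or not attr:
        raise ValueError(f"incomplete task spec: {spec!r}")
    module_name = f"{package}.{module_path}" if module_path else package
    # This is the dynamic import that raises the security concern.
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attr)
    except AttributeError as err:
        raise ValueError(f"{module_name} has no attribute {attr!r}") from err


def check_params(task_cls: Any, params: Dict[str, Any]) -> None:
    """Raise TypeError if params do not match the task's constructor signature."""
    inspect.signature(task_cls).bind(**params)
```

The check only confirms that the parameter names fit the constructor; it says nothing about whether the values are sensible, which is part of why a pure data file is hard to validate without loading code.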
@cash I need more experience with our curriculums to have an opinion here.
I'm more concerned about having an undocumented implicit schema for configuration making validation and creation difficult than security issues with importing an arbitrary python module.
Early in development it can make sense to have the flexibility of a full scripting language for configuration. If after a while there are a smallish number of primitives in the configuration, it can be really useful to codify them as a schema and separate out the data from the code. I don't know if that is the case here.
We don't quite know the full range of (lifelong) curricula. This is a pretty novel area, so we are feeling our way through this space ... I don't think the performers have a good idea either at present.
Another possibility (other than separating data from code): the curriculum designer can use some specified APIs (e.g., subclass an abstract class and provide implementations) as building blocks. It isn't as strongly constrained as a data schema, but it can still provide some way to ensure the curriculum is put together in a reasonable way (e.g., like specifying a BNF).
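A small sketch of what the abstract-class option could look like. The class and method names (`AbstractCurriculum`, `blocks`, `validate`) are made up for illustration and are not tella's API; tasks are again stood in for by strings:

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class AbstractCurriculum(ABC):
    """Hypothetical base class: subclasses supply the blocks, the base checks them."""

    @abstractmethod
    def blocks(self) -> Iterable[List[str]]:
        """Yield blocks, each a list of task names (an assumed representation)."""

    def validate(self) -> None:
        """Structural checks that a data schema would otherwise have to encode."""
        blocks = list(self.blocks())
        if not blocks:
            raise ValueError("a curriculum needs at least one block")
        for index, block in enumerate(blocks):
            if not block:
                raise ValueError(f"block {index} has no tasks")


class TwoBlockCurriculum(AbstractCurriculum):
    """Example curriculum written in code rather than in a data file."""

    def blocks(self) -> Iterable[List[str]]:
        yield ["task_a"]
        yield ["task_b", "task_a"]
```

Here the abstract method plays the role of the grammar: a subclass that does not implement `blocks` cannot be instantiated, and `validate` can grow as the set of constraints becomes clearer.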
This is semi related to the discussion in #38. The CLI will need a way to load in a curriculum from a file. Options are: