Open chrishokamp opened 9 years ago
We seem to be moving towards using representation generators for everything. The disadvantage is that the user must then call create_contexts after generating their representations. With the parser approach they could go directly from data --> context objects.
Going directly from data to context objects is possible only if we don't need any additional representations. But we can create parsers to handle that scenario as well.
We can imagine a use case where a user just wants to run a feature extractor on a dataset and get the features dumped back out. What is the simplest way for them to specify this in the config?
It can be specified in "datasets" in the same way as representation generators are specified now.
The main thing is then to handle that in the code as well: if parsers go directly to context objects, no representation generators should be applied to the parser output, and create_contexts should not be called.
Yeah, I think run_experiment really only handles one use case right now. It may be easier to create more scripts like 'extract_features' instead of trying to handle every possible use case inside one script.
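A rough sketch of what that branch could look like, just to make the idea concrete. The function name `prepare_contexts`, the `produces_contexts` flag, and the `generate` method on generators are all assumptions for illustration, not the current marmot API; only `create_contexts` is a name from this thread, and it is passed in here so the sketch stays self-contained.

```python
def prepare_contexts(parser_output, produces_contexts,
                     representation_generators, create_contexts):
    """Hypothetical dispatch between the two parser styles discussed above.

    produces_contexts is an assumed flag (e.g. read from the config) that
    says whether the parser already returned context objects.
    """
    if produces_contexts:
        # Parser went directly from data to context objects:
        # no representation generators, no create_contexts call.
        return parser_output

    data = parser_output
    for generator in representation_generators:
        # Assumed interface: each generator takes the data and returns
        # the data with its representation added or transformed.
        data = generator.generate(data)
    return create_contexts(data)
```

The flag could equally be a property of the parser itself; the point is only that the decision is made in one place.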
Right now, some of our parsers return context objects, and some of them return the filenames of implicitly whitespace-tokenized files. All parsers should take filenames as input and return lists of lists as output.
It should be the responsibility of the representation generator to (1) generate the representation and (2) persist or not persist the representation. The only job of a parser is to read some file format and return an object of the form {'key': [[seq1_item1, seq1_item2, ...]]}
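As a minimal sketch of that contract, assuming a file with one whitespace-tokenized sequence per line: the name `parse_whitespace_tokenized` and the `key` argument are hypothetical, not existing marmot functions.

```python
import io


def parse_whitespace_tokenized(filename, key):
    """Hypothetical parser: filename in, {key: list of token lists} out.

    It only reads and tokenizes; it does not build context objects and
    does not persist anything.
    """
    with io.open(filename, encoding='utf-8') as f:
        # Blank lines become empty lists so that line numbers stay
        # aligned with any parallel files (e.g. source, target, tags).
        sequences = [line.split() for line in f]
    return {key: sequences}
```

A call like `parse_whitespace_tokenized('some_file.txt', 'target')` would then hand a plain dict to whichever representation generators come next.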