Closed XapaJIaMnu closed 1 year ago
How did you generate that json? The schema is described in the FilterPipeline class, and that says List[str]. I don't understand how you'd get a list in a list for files:
class FilterPipeline(BaseModel):
    version: Literal[1]
    files: List[str]
    filters: List[FilterStep]
The only place files is populated is this bit:
def make_pipeline(name, filters=[]):
    columns = list_datasets(DATA_PATH)[name]
    return FilterPipeline(
        version=1,
        files=[file.name
            for _, file in
            sorted(columns.items(), key=lambda pair: pair[0])
        ],
        filters=filters
    )
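To make the point concrete, here is a minimal sketch of just the list comprehension from make_pipeline, run on its own. The `columns` dict is a hypothetical stand-in for what `list_datasets(DATA_PATH)[name]` returns; it assumes a mapping from language code to a path-like object with a `.name` attribute, which the thread does not confirm.

```python
from pathlib import Path

# Hypothetical stand-in for list_datasets(DATA_PATH)[name]:
# assumed here to map a language code to a path-like object.
columns = {
    "zh": Path("data/train-parts/ELRC-3056-wikipedia_health-v1.en-zh.zh.gz"),
    "en": Path("data/train-parts/ELRC-3056-wikipedia_health-v1.en-zh.en.gz"),
}

# Same comprehension as in make_pipeline: sort by key, keep only the file name.
files = [
    file.name
    for _, file in sorted(columns.items(), key=lambda pair: pair[0])
]

print(files)
# ['ELRC-3056-wikipedia_health-v1.en-zh.en.gz', 'ELRC-3056-wikipedia_health-v1.en-zh.zh.gz']
```

Under that assumption the result is a flat list of strings, which matches List[str], so the extra level of nesting would have to come from somewhere else.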
I just exported it via export json on the gui
Oooh you're right! That's a UI bug. Should be fixed now.
Example generated json:
When doing
./run.py filters.yaml -b data/train-parts/
I get:
The issue is that the datasets are packed in a doubly nested array, but only a single level of unpacking is done.
[['ELRC-3056-wikipedia_health-v1.en-zh.en.gz', 'ELRC-3056-wikipedia_health-v1.en-zh.zh.gz']]
That's fairly easy to fix, but I'm not sure what the desired behavior is, hence this bug report.
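For illustration only, one possible behavior on the loading side would be to tolerate a single extra level of nesting in files. The sketch below is not code from the project: `flatten_files` is a hypothetical helper, and whether the loader should accept nested input at all, rather than reject it as not matching List[str], is exactly the open question.

```python
from typing import List, Sequence, Union


def flatten_files(files: Sequence[Union[str, Sequence[str]]]) -> List[str]:
    """Hypothetical helper: flatten one extra level of nesting in `files`."""
    flat: List[str] = []
    for entry in files:
        if isinstance(entry, str):
            # Already a plain file name, keep as-is.
            flat.append(entry)
        else:
            # A nested list of file names, unpack it one level.
            flat.extend(entry)
    return flat


nested = [['ELRC-3056-wikipedia_health-v1.en-zh.en.gz',
           'ELRC-3056-wikipedia_health-v1.en-zh.zh.gz']]

print(flatten_files(nested))
# ['ELRC-3056-wikipedia_health-v1.en-zh.en.gz', 'ELRC-3056-wikipedia_health-v1.en-zh.zh.gz']
```

An already flat list passes through unchanged, so a helper like this would not affect correctly exported files.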