A potential solution is to use the replacements argument of slugify (we would override the slugify_file_name method in ExtractMetadata for the dcase task) to replace - with negative_. See https://github.com/un33k/python-slugify
For example:
slugify(str(Path(relative_path).stem), replacements=[["-", "negative_"]])
We should also add a sanity check to the ExtractMetadata run function to make sure that all the slugs are unique, e.g. something like:
assert len(process_metadata["relpath"].unique()) == len(process_metadata["slug"].unique())
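The assertion above only tells us that a collision exists, not which files are involved. A minimal sketch of a more informative check, using only the standard library, could group relative paths by slug and list the offenders. The helper name find_slug_collisions and the sample rows are hypothetical and not part of the pipeline.

```python
from collections import defaultdict


def find_slug_collisions(rows):
    """Return {slug: sorted relpaths} for every slug shared by several relpaths.

    `rows` is any iterable of (relpath, slug) pairs, for example what
    zip(process_metadata["relpath"], process_metadata["slug"]) would yield.
    """
    by_slug = defaultdict(set)
    for relpath, slug in rows:
        by_slug[slug].add(relpath)
    # Keep only the slugs that more than one distinct relpath maps to.
    return {slug: sorted(paths) for slug, paths in by_slug.items() if len(paths) > 1}


# Hypothetical sample rows, made up for illustration.
rows = [
    ("test_1_ebr_-6_nec_4_poly_1.wav", "test-1-ebr-6-nec-4-poly-1"),
    ("test_1_ebr_6_nec_4_poly_1.wav", "test-1-ebr-6-nec-4-poly-1"),
    ("test_2_ebr_0_nec_1_poly_0.wav", "test-2-ebr-0-nec-1-poly-0"),
]
collisions = find_slug_collisions(rows)
```

In the run function this could back an assertion such as `assert not collisions, collisions`, so that a failure message names the colliding files.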
In the luigi pipeline metadata dataframe we slugify the relative path of the filename. This is broken for filenames that contain a dash character. For example, test_1_ebr_-6_nec_4_poly_1.wav in dcase. This slugifies to test-1-ebr-6-nec-4-poly-1, which is the same as what test_1_ebr_6_nec_4_poly_1.wav slugifies to.
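The collision can be reproduced without the pipeline. The sketch below uses simple_slug, a simplified stand-in that is an assumption for illustration and not python-slugify's actual implementation: it lowercases the text and collapses every run of non-alphanumeric characters into a single dash. The helper slug_with_sign is likewise hypothetical; it rewrites the dash before slugifying so that the sign is preserved.

```python
import re
from pathlib import Path


def simple_slug(text):
    """Simplified stand-in for slugify (assumption, not the real library):
    lowercase, then collapse runs of non-alphanumerics into one dash."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def slug_with_sign(relative_path):
    """Rewrite "-" as "negative_" before slugifying so the sign is kept."""
    stem = Path(relative_path).stem
    return simple_slug(stem.replace("-", "negative_"))


a = "test_1_ebr_-6_nec_4_poly_1.wav"
b = "test_1_ebr_6_nec_4_poly_1.wav"

# Naive slugs: the "_-" run collapses to one dash, so both files collide.
naive_a = simple_slug(Path(a).stem)
naive_b = simple_slug(Path(b).stem)

# Sign-preserving slugs: the two files stay distinct.
fixed_a = slug_with_sign(a)
fixed_b = slug_with_sign(b)
```

With this stand-in, the naive slugs are identical while the sign-preserving ones differ. Behaviour with the real slugify and its replacements argument should be verified separately.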