Closed mxkrn closed 10 months ago
@CharlesHolbrow As mentioned in a comment response, I would agree that this isn't parallelizable unless the additional file matching is built. I don't think that's worth our time right now.
This job is supposed to be reusable but, similar to source separation, I doubt we'll be re-using it very often. I imagine whenever we want to ingest new glucose-karaoke splits we'll want to re-use this job. I guess the main thing that's missing for it to be fully automatic is that the process for generating the stems_dict.json is currently done offline. This could be done online; it would just require a bit more engineering.
Yeah, I think it's fine to compute the stems dictionary offline for now if it helps things go quicker.
No Dockerfile is needed, because we're just running this locally.
I'm moving job-specific .gitignore lines into job directories. This means that the job package directories are portable: we can copy them into a different repository or sub-dir in the future, and the .gitignored files will still be ignored.
This PR contains the new stem classifier job, which is used to classify glucose-karaoke files into the stem groups defined by source separation.
ClassifyAudioStem is a DoFn that accepts a ReadableFile and identifies which StemGroup it belongs to.
tests/test_transforms.py
copy_file first checks whether a file with that suffix exists. If it does, it takes the latest stem enumeration, increments it, and writes a file with the new incremented suffix.
Example: other -> other-1 -> other-2
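The suffix increment described for copy_file can be illustrated with a minimal sketch. This is not the PR's implementation: `next_stem_path`, the local-filesystem layout, and the `.wav` extension are all assumptions made for the example.

```python
import re
from pathlib import Path


def next_stem_path(track_dir: Path, stem: str, ext: str = ".wav") -> Path:
    """Return the next free path for `stem` inside `track_dir`.

    Hypothetical helper: the first file is written as `<stem><ext>`,
    later ones as `<stem>-1<ext>`, `<stem>-2<ext>`, and so on.
    """
    if not track_dir.is_dir():
        # Nothing written for this track yet, so the plain name is free.
        return track_dir / f"{stem}{ext}"

    # Match `<stem><ext>` (enumeration 0) or `<stem>-<n><ext>`.
    pattern = re.compile(rf"^{re.escape(stem)}(?:-(\d+))?{re.escape(ext)}$")
    enumerations = []
    for path in track_dir.iterdir():
        match = pattern.match(path.name)
        if match:
            enumerations.append(int(match.group(1)) if match.group(1) else 0)

    if not enumerations:
        return track_dir / f"{stem}{ext}"
    # Take the latest enumeration and increment it.
    return track_dir / f"{stem}-{max(enumerations) + 1}{ext}"
```

Comparing the enumerations as integers rather than as strings keeps the ordering correct once a track has ten or more files in the same stem group.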
TODO
SkipCompleted which checks if a track directory exists. The original SkipCompleted doesn't work because we're dynamically updating the suffix based on the classified stem group. Since we're also incrementing based on a file existing, one option we have left is to check the existence of the track directories. The files for the tracks that do not yet exist are put up for classification. The only assumption we're making here is that when a track directory exists, it's complete.
? and * are problematic for parsing the filenames. We need to strip these out of the track name before writing.
Dockerfile
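The directory-based skip check and the track-name cleanup from the items above might look roughly like the sketch below. It is plain Python on a local filesystem rather than a Beam transform, and `sanitize_track_name`, `pending_tracks`, and the one-directory-per-track layout are hypothetical names and assumptions, not code from this PR.

```python
import re
from pathlib import Path
from typing import Iterable, List


def sanitize_track_name(name: str) -> str:
    """Remove `?` and `*` from a track name (both are glob wildcard characters)."""
    return re.sub(r"[?*]", "", name)


def pending_tracks(track_names: Iterable[str], output_root: Path) -> List[str]:
    """Return the tracks that still need classification.

    Assumption: each track is written to `output_root/<sanitized name>`,
    and an existing track directory means the track is complete.
    """
    return [
        name
        for name in track_names
        if not (output_root / sanitize_track_name(name)).is_dir()
    ]
```

Because the check only looks at the directory, a track whose directory was created but only partially filled would be skipped, which is the same assumption stated in the item above.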