klay-music / klay-beam

Our Apache Beam Transforms and Pipelines

Job stem classifier #48

Closed mxkrn closed 10 months ago

mxkrn commented 10 months ago

This PR contains the new stem classifier job, which classifies glucose-karaoke files into the stem groups defined by source separation.


TODO

mxkrn commented 10 months ago

@CharlesHolbrow As mentioned in a comment response, I would agree that this isn't parallelizable unless the additional file matching is built. I don't think that's worth our time right now.

This job is supposed to be reusable but, similar to source separation, I doubt we'll be reusing it very often. I imagine whenever we want to ingest new glucose-karaoke splits we'll want to reuse this job. I guess the main thing that's missing for it to be fully automatic is that the stems_dict.json is currently generated offline. This could be done online; it would just require a bit more engineering.
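For illustration only, here is a minimal sketch of how an offline-generated stems dictionary could drive classification. It is not the code in this PR: the layout of stems_dict.json (stem group mapped to a list of filename keywords), the group names, and the helpers `load_stems_dict` and `classify_stem` are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical layout for stems_dict.json: stem group -> filename keywords.
# The real file produced offline may be structured differently.
EXAMPLE_STEMS_DICT = {
    "bass": ["bass"],
    "drums": ["drum", "kick", "snare"],
    "vocals": ["vocal", "vox"],
}


def load_stems_dict(path):
    """Read a stems dictionary that was generated offline as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def classify_stem(filename, stems_dict, default="other"):
    """Return the first stem group whose keyword occurs in the file name.

    Matching is a case-insensitive substring test on the name without its
    extension; files that match no keyword fall back to `default`.
    """
    name = Path(filename).stem.lower()
    for group, keywords in stems_dict.items():
        if any(keyword in name for keyword in keywords):
            return group
    return default


if __name__ == "__main__":
    print(classify_stem("song01_Kick Drum.wav", EXAMPLE_STEMS_DICT))  # drums
    print(classify_stem("song01_Piano.wav", EXAMPLE_STEMS_DICT))      # other
```

Under these assumptions, classifying one file depends only on that file's name and the dictionary, so the offline step is limited to producing the JSON.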

CharlesHolbrow commented 10 months ago

Yeah, I think it's fine to compute the stems dictionary offline for now if it helps things go quicker.

CharlesHolbrow commented 10 months ago

No Dockerfile is needed, because we're just running this locally.

I'm moving job-specific .gitignore lines into job directories. This makes the job package directories portable: we can copy them into a different repository or sub-dir in the future, and the .gitignored files will still be ignored.