git clone https://github.com/timendum/ciccio.git
pip install -r requirements.txt
Put the audio you want to keep (e.g. speech parts) in data\ok
Put the rest of the audio (e.g. ads or music parts) in data\ko
Then train the model:
python main.py train
The output will be the model in the data\svmSM folder.
You can split a file with:
python main.py split <source.mp3>
The program will produce many source_n.mp3 files in the same folder as the original mp3.
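To illustrate the naming scheme, here is a minimal sketch of how the output paths could be derived from the source path. The helper `segment_paths` is hypothetical, and numbering from 1 is an assumption; the actual start index used by main.py is not stated here.

```python
from pathlib import Path


def segment_paths(source, count):
    """Return hypothetical output paths <stem>_1.mp3 ... <stem>_<count>.mp3.

    Assumption: segments are numbered from 1 and written next to the source.
    """
    src = Path(source)
    # with_name keeps the parent folder and swaps only the file name
    return [src.with_name(f"{src.stem}_{i}{src.suffix}") for i in range(1, count + 1)]


if __name__ == "__main__":
    for path in segment_paths("shows/episode.mp3", 3):
        print(path)
```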
The logic and details are in the podcast.py file.
It downloads an mp3 for a specific show, splits it, and then produces an XML file for the podcast.
To allow processing on smaller machines, the input file is split into smaller chunks, and each chunk is parsed and analyzed separately.
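The chunking idea can be sketched as follows. This is only an illustration of computing chunk boundaries over a file's duration, not the code in podcast.py: the function name `chunk_bounds` and the 600-second default are assumptions.

```python
def chunk_bounds(total_seconds, chunk_seconds=600.0):
    """Yield (start, end) pairs, in seconds, that cover the whole file in order.

    Assumption: fixed-size chunks, with a shorter final chunk if needed.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        yield start, end
        start = end


if __name__ == "__main__":
    # A 25-minute file processed in 10-minute chunks
    for start, end in chunk_bounds(1500, 600):
        print(f"analyze {start:.0f}s to {end:.0f}s")
```

Only one chunk needs to be held in memory at a time, which is what keeps the memory footprint small on modest hardware.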
Set the BASE_URL environment variable to output full paths for the mp3s.
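As a rough sketch of what this means, the snippet below prefixes a file name with BASE_URL when it is set. The helper `mp3_url` is hypothetical, and returning the bare file name when the variable is unset is an assumption; podcast.py may join the parts differently.

```python
import os


def mp3_url(filename, base_url=None):
    """Prefix filename with BASE_URL when set; otherwise return it unchanged.

    Assumption: BASE_URL is a plain prefix joined to the file name with one slash.
    """
    if base_url is None:
        base_url = os.environ.get("BASE_URL", "")
    if not base_url:
        return filename
    # Normalize slashes so the result has exactly one separator
    return base_url.rstrip("/") + "/" + filename.lstrip("/")


if __name__ == "__main__":
    print(mp3_url("show_1.mp3", "https://example.com/podcasts/"))
```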
This module is based on pyAudioAnalysis by Theodoros Giannakopoulos, released under the Apache License.
I only removed unused parts, simplified others and automated a little bit more.