worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License
6.41k stars 1.43k forks source link

Split big files #75

Closed thesunlover closed 9 years ago

thesunlover commented 9 years ago

added support for huge files check the long_test.py and play with minutes & processes number to fit in the RAM available

Edit: Procedure of the process:

  1. Splits the large audio file to fingerprintable smaller pieces
  2. Fingerprints them one by one.
  3. Saves them as a single audio in the DataBase.
worldveil commented 9 years ago

So to be clear, this splits up large files over a certain length (ie, 3 minutes), and fingerprints each as a separate "song", yes?

thesunlover commented 9 years ago

updated the description of the PR

worldveil commented 9 years ago

Also the binary files (large mp3s) are too large and I'd rather not increase the size of the repo with those.

Can you instead give a public link (could even download with urllib2 in example.py) to some copyright-free music?

thesunlover commented 9 years ago

ok, will do it later in the evening Edit: got rid of the mp3 files and created a modification in the test-file so it generates a new file with the content of the existing mp3 files. modified the test so that it uses the generated file (58minutes file is generated)

thesunlover commented 9 years ago

removed the non-copyright files and added auto-creation of the long file with "ffmpeg concat" added treeremove of the completed long file please review once again

thesunlover commented 9 years ago

do we need to test with other formats like ogg & flv?

worldveil commented 9 years ago

If pydub handles them, I think it should be fine.

Will review this sometime this weekend I hope.

thesunlover commented 9 years ago

I have not thought about using 1 minute limit and splitting the existing files.

If I set the maximum audio length for straight fingerprinting to be 1 minute and use all the available CPU cores it might be faster to fingerprint even short songs. - full CPU usage and less amount of memory usage at the same time.

What do you say, should i test that?

thesunlover commented 9 years ago

in the last 2 commits i moved a few arguments to be properties of Dejavu. Should I move them in the config file?

worldveil commented 9 years ago

will review this week - apologies for the delay!

worldveil commented 9 years ago

OK, I'm pulling this for review right now.

First things first, we need to remove the large binary files from the git history. You removed them from current version, but they are still in history (.git folder is 89MB).

Second, yes the arguments for fingerprinting and splitting large files should be in the config. And we should document those options in README.md as well.

Many thanks for your patience in getting back to you!

thesunlover commented 9 years ago

I think it would be easier for me to start a new branch, copy the changes in there close this pull request and reopen it with the new branch

is that ok?

worldveil commented 9 years ago

sounds fine to me.

worldveil commented 9 years ago

were you able to create a new PR with this? would gladly merge.

thesunlover commented 9 years ago

Hello, worldveil I hope I have enough time in the evening to do it at home

thesunlover commented 9 years ago

reposted the PR in here: https://github.com/worldveil/dejavu/pull/87

thesunlover commented 7 years ago

@worldveil would you please review the new PR #87 and give advise how to properly calculate the offset_seconds for the parts that follow the first one ?

NathanielCustom commented 5 years ago

@thesunlover I left a comment in PR #87 on how I got offset_seconds working.