worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License
6.36k stars 1.43k forks source link

Fingerprinting a folder with ~10000 tracks consuming 45GB of RAM and it's growing. #261

Open bboymega opened 3 years ago

bboymega commented 3 years ago

Fingerprinting a folder with ~10000 tracks consuming 45GB of RAM and it's growing. It seems that the memory is leaked even when a song has completed recognition.

danijel1124 commented 6 months ago

Have the same issue. Using the given docker project with pgsql. To avoid the error occurring, I made the following adjustment to my code. import os import argparse import datetime from dejavu import Dejavu

# load config from a JSON file (or anything outputting a python dictionary) config = { "database": { "host": "db", "user": "postgres", "password": "password", "database": "dejavu" }, "database_type": "postgres" }

def write_fingerprinted_file(directory, files): """Creates a file with the current date and a list of files.""" with open(os.path.join(directory, 'fingerprinted.txt'), 'w') as f: f.write(f"Fingerprinted on: {datetime.datetime.now()}\n") f.write("Files:\n") for file in files: f.write(f"{file}\n")

if __name__ == '__main__': parser = argparse.ArgumentParser(description="Fingerprint audio files in a directory and its subdirectories.") parser.add_argument("directory", help="The path to the directory containing audio files.") args = parser.parse_args()

` # Supported audio file extensions: mp3, m4a, wav, etc. supported_extensions = [".mp3", ".m4a", ".wav"]

# create a Dejavu instance
djv = Dejavu(config)`

# Navigate through the main directory and each subdirectory for root, dirs, files in os.walk(args.directory): if root != args.directory: # To avoid fingerprinting the parent directory itself print(f"Fingerprinting directory: {root}") djv.fingerprint_directory(root, supported_extensions) write_fingerprinted_file(root, files) the original code comes from the docker example file. I made the first adjustment like this and the RAM usage went towards infinity. `import json import argparse from dejavu import Dejavu

load config from a JSON file (or anything outputting a python dictionary)

config = { "database": { "host": "db", "user": "postgres", "password": "password", "database": "dejavu" }, "database_type": "postgres" }

if name == 'main': parser = argparse.ArgumentParser(description="Fingerprint audio files in a directory.") parser.add_argument("directory", help="The path to the directory containing audio files.") args = parser.parse_args()

# Supported audio file extensions: mp3, m4a, wav, etc.
supported_extensions = [".mp3", ".m4a", ".wav"]

# create a Dejavu instance
djv = Dejavu(config)

# Fingerprint all the supported audio files in the directory
djv.fingerprint_directory(args.directory, supported_extensions)`

With the adjustment above it works somehow, but of course it's still not optimal. The error still occurs, but the memory usage is limited due to the folder structure. This works for my case, but doesn't have to work for other cases.