mimbres / neural-audio-fp

https://mimbres.github.io/neural-audio-fp
MIT License

Speed of generating fingerprints from custom source #23

Closed · mimbres closed this issue 2 years ago

mimbres commented 2 years ago

Hi, it might be related to this, but I'm trying to generate fingerprints from a custom source using the pretrained model you shared here: https://github.com/mimbres/neural-audio-fp/issues/10#issuecomment-878335408, and I was wondering if you could tell me the expected time for generating a fingerprint from a single query. It took 1629 seconds to generate fingerprints for 2 queries of 1-minute length (even though there are 3 wav files in the source directory; I'm investigating why that is as well). From the CLI output:

2/2 [==============================] - 1629s 47ms/step

I'm using a 40-CPU server with an RTX 3090.

Also, can you help me understand the shape of the resulting DB? I understand that the shape is n_items x d, and that n_items is the number of audios x batch size, but I don't see what this batch size means, and therefore I don't understand the resulting DB shape.

Thanks in advance!

Originally posted by @guillemcortes in https://github.com/mimbres/neural-audio-fp/issues/8#issuecomment-1015670628

mimbres commented 2 years ago

@guillemcortes

  1. Speed: NO, it's weird to see 1629 s (~= 27 min) for the 2 x 1-min queries. I can't remember the exact elapsed time, but it should be processed in <1 s.

    • Did you add the --skip_dummy flag?
      python run.py generate CHECKPOINT_NAME CHECKPOINT_INDEX -c CONFIG_NAME --source SOURCE_ROOT_DIR --output FP_OUTPUT_DIR --skip_dummy

      Here, --skip_dummy means that we skip generating fingerprints for the 100K dummy songs.

    • BTW, we need to extract fingerprints for the 100K dummy songs along with the custom source only once, for the first experiment. In that case, it may take as long as 27 minutes, as you observed.
      python run.py generate CHECKPOINT_NAME CHECKPOINT_INDEX -c CONFIG_NAME --source SOURCE_ROOT_DIR --output FP_OUTPUT_DIR
    • I will update this answer after reproducing it in my environment tonight.
  2. Shape of the resulting DB:

    • As for n_items x d, n_items means the number of fingerprints. By default, we extract one for each 1-s segment with a 0.5-s hop. Given 1 min x 2 songs (120 s total) for the custom queries, n_items should be 238 = 2 x (60 x 2 - 1); see the quick check below. The shape is stored in your logs/emb/xxxx/xxx/query_shape.npy.
    • By default, we use TS_BATCH_SZ : 125. This can be a problem in your case with only 2 songs (238 segments): since 238 % 125 = 113, the last 113 segments will be dropped (#21). Setting TS_BATCH_SZ to a divisor of your total segment count, such as 119, can be a temporary workaround.
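
A quick check of the arithmetic above (a minimal sketch; n_segments is a hypothetical helper, not a function from this repo), assuming 1-s windows with a 0.5-s hop:

>>> def n_segments(duration_s, win_s=1.0, hop_s=0.5):
...     # count of overlapping windows that fit in the audio
...     return int((duration_s - win_s) / hop_s) + 1
...
>>> 2 * n_segments(60)  # 2 songs x 119 segments each
238
>>> 238 % 125  # segments dropped with the default TS_BATCH_SZ of 125
113
>>> 238 % 119  # a divisor of the total segment count drops nothing
0
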
mimbres commented 2 years ago

ToDo:

mimbres commented 2 years ago

Now (7647aec) the output filenames are custom_source.mm and custom_source_shape.npy, because they can be used for both custom DB and query generation.
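
For reference, the .mm file is a raw memory map, so it can be read back using the saved shape file. A minimal sketch (the float32 dtype here is an assumption, not something confirmed in this thread):

>>> import numpy as np
>>> shape = tuple(np.load('custom_source_shape.npy'))  # e.g. (238, 128)
>>> fp = np.memmap('custom_source.mm', dtype='float32', mode='r', shape=shape)  # dtype assumed
>>> fp.shape
(238, 128)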

mimbres commented 2 years ago

db28b6b resolves #21.

>>> np.load('custom_db_shape.npy')
array([238, 128])

guillemcortes commented 2 years ago

Hi, I know you closed this issue; I just wanted to give an update. Yes, the results I showed you were with the --skip_dummy flag. I tried generating the fingerprints using the CPU only, and it's fast now: around 12 s for 3 queries of 1 minute with TS_BATCH_SZ set to 119. I still have to investigate why it's so slow on the GPU (and why it only computes fingerprints for 2 of the 3 available audio files), but for the moment I will stick to the CPU. Thanks!

mimbres commented 2 years ago

@guillemcortes I don't quite understand why it is so slow on your GPU. Have you ever tried training with the default config? 1 epoch (10K songs) usually takes around 20 min. If it takes much longer than that, I think it relates to a problem with the environment installation.

guillemcortes commented 2 years ago

Ok! I will try training with the default config and let you know!

guillemcortes commented 2 years ago

Hi, I tried reinstalling your Docker version, and now training from scratch with the default config (python run.py train test2 --max_epoch=10 -c default) takes around 16 min per epoch. Sorry for the noise; I must have had my Docker image corrupted somehow. Thanks!