@guillemcortes
Speed: NO, it's weird to see 1629s (~27 min) for the 2 x 1 min queries. I can't remember the exact elapsed time, but it should be processed in under 1s.
The --skip_dummy flag?
python run.py generate CHECKPOINT_NAME CHECKPOINT_INDEX -c CONFIG_NAME --source SOURCE_ROOT_DIR --output FP_OUTPUT_DIR --skip_dummy
Here, --skip_dummy means that we skip generating fingerprints for the 100K dummy songs.
Without --skip_dummy, fingerprints for the dummy songs are generated as well:
python run.py generate CHECKPOINT_NAME CHECKPOINT_INDEX -c CONFIG_NAME --source SOURCE_ROOT_DIR --output FP_OUTPUT_DIR
Shape of the resulting DB: n_items x d, where n_items is the number of fingerprints. By default, we extract one fingerprint for each 1s segment with a 0.5s hop. Given 1 min x 2 songs (120s total) for the custom queries, n_items should be 238 = 2 x (60 x 2 - 1). The shape is stored in your logs/emb/xxxx/xxx/query_shape.npy.
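As a sanity check, the expected count can be reproduced from the segmentation parameters. A minimal sketch (the helper below is hypothetical, not part of the repository):

def expected_n_items(duration_s, n_songs, seg_len_s=1.0, hop_s=0.5):
    # Count seg_len_s windows, spaced hop_s apart, that fit inside each audio.
    per_song = int((duration_s - seg_len_s) / hop_s) + 1
    return n_songs * per_song

print(expected_n_items(60, 2))  # 238 = 2 x (60 x 2 - 1)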
TS_BATCH_SZ: the default is 125. This can be a problem in your case with only 2 songs (238 segments). Since 238 % 125 = 113, the last 113 segments will be dropped (#21). Setting TS_BATCH_SZ to a number that evenly divides your total segment count (e.g., 119 or 238) can be a temporary workaround; see the sketch below.
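A minimal sketch of this drop-remainder behavior (illustrative only, not the repository's actual input pipeline):

import tensorflow as tf

# With 238 segments and a batch size of 125, only one full batch survives;
# the trailing partial batch of 113 segments is silently dropped.
ds = tf.data.Dataset.range(238).batch(125, drop_remainder=True)
print(sum(int(b.shape[0]) for b in ds))  # 125

# A batch size that evenly divides the segment count loses nothing:
print(238 % 119)  # 0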
I reproduced @guillemcortes's experiment using a 2 x 1 min custom source. The resulting shape was 125 x 128 (113 segments were dropped with TS_BATCH_SZ=125), as expected. However, I also encountered a spurious error message, which can safely be ignored:
python run.py generate CHECKPOINT_NAME --source ../neural-audio-fp-dataset/music/others/custom_source --skip_dummy
Arugment 'checkpoint_index' was not specified.
Searching for the latest checkpoint...
---Restored from ./logs/checkpoint/exp_mini640_tau005/ckpt-101---
Data source: dict_keys(['custom_db']) unseen_icassp
=== Generating fingerprint from 'custom_db' bsz=125, 125 items, d=128 ===
1/1 [==============================] - 6s 6s/step
=== Succesfully stored fingerprint to ./logs/emb//exp_mini640_tau005/101/ ===
[ERROR MESSAGE AFTER THIS]
File "run.py", line 111, in generate
generate_fingerprint(cfg, checkpoint_name, checkpoint_index, source, output, skip_dummy)
File "neural-audio-fp-dev/model/generate.py", line 190, in generate_fingerprint
if sz_check['db'] != sz_check['query']:
KeyError: 'db'
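The traceback suggests that the final shape check assumes both a 'db' and a 'query' entry exist in sz_check, which is not the case when only a custom source is fingerprinted with --skip_dummy. A guarded comparison along these lines would avoid the KeyError (a hypothetical sketch, not the actual fix):

# Hypothetical guard inside generate_fingerprint():
sz_check = {'query': (238, 128)}  # no 'db' entry when --skip_dummy is used
if 'db' in sz_check and 'query' in sz_check:
    if sz_check['db'] != sz_check['query']:
        raise ValueError('db and query fingerprint dimensions differ.')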
Check the resulting fingerprint shape:
>>> import numpy as np
>>> np.load('custom_db_shape.npy')
array([125, 128])
ToDo: unify the output filenames as custom_source*.*. Now (7647aec) the output filenames are custom_source.mm and custom_source_shape.npy, because they can be used for both custom DB and query generation.
db28b6b resolves #21.
>>> np.load('custom_db_shape.npy')
array([238, 128])
Hi, I know you closed this issue; I just wanted to post an update. Yes, the results I showed you were obtained with the --skip_dummy flag. I tried generating the fingerprints using CPU only, and it's fast now: around 12s for 3 one-minute queries with TS_BATCH_SZ=119. I still have to investigate why it's so slow on the GPU (and why it only computes fingerprints for 2 of the 3 available audios), but for the moment I will stick to the CPU. Thanks!
@guillemcortes I don't quite understand why it is so slow on your GPU. Have you ever tried training with the default config? One epoch (10K songs) usually takes around 20 min. If it takes much longer, the problem is likely with the environment installation.
Ok! I will try training with the default config and let you know!
Hi, I tried reinstalling your Docker image, and training from scratch with the default config (python run.py train test2 --max_epoch=10 -c default) now takes around 16 min per epoch. Sorry for the noise; my Docker image must have been corrupted somehow. Thanks!
Hi, it might be related to this, but I'm trying to generate fingerprints from a custom source using the pretrained model you shared here: https://github.com/mimbres/neural-audio-fp/issues/10#issuecomment-878335408, and I was wondering if you could tell me the expected time for generating the fingerprint of a single query. It took 1629 seconds to generate the fingerprints for 2 one-minute queries (even though there are 3 wav files in the source directory; I'm looking into why that is as well).
From the CLI output: 2/2 [==============================] - 1629s 47ms/step
I'm using a 40-CPU server with an RTX 3090.
Also, can you help me understand the shape of the resulting DB? I understand that the shape is n_items x d, and that n_items is #num_audios x batch_size. I don't see what this batch size means, and therefore I don't understand the resulting DB shape. Thanks in advance!
Originally posted by @guillemcortes in https://github.com/mimbres/neural-audio-fp/issues/8#issuecomment-1015670628