Open sammlapp opened 8 months ago
maybe instead of profiling automatically during .train(), we can implement results_dictionary = CNN.profile(samples)
so that the user can decide when to profile and can easily add it to a notebook/script
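A rough sketch of the kind of dictionary such a call might return. It uses plain Python callables as stand-ins for preprocessing steps; `profile_steps` and the step names are hypothetical and not part of opensoundscape:

```python
from time import perf_counter


def profile_steps(steps, sample):
    """Run `sample` through each (name, function) step in order.

    Returns the final output and a {step name: seconds} dictionary.
    Hypothetical helper for illustration only, not an opensoundscape API.
    """
    runtimes = {}
    for name, fn in steps:
        t0 = perf_counter()
        sample = fn(sample)
        runtimes[name] = perf_counter() - t0
    return sample, runtimes


# toy stand-ins for preprocessing actions
steps = [
    ("scale", lambda x: [v * 2 for v in x]),
    ("clip", lambda x: [min(v, 5) for v in x]),
]
output, results_dictionary = profile_steps(steps, [1, 2, 3])
print(results_dictionary)
```

Returning the timings as a plain dictionary keeps the result easy to print or turn into a DataFrame in a notebook.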
here's an example of some profiling
import numpy as np
from opensoundscape.sample import AudioSample
# profile preprocessing of one sample, check the amount of time taken by each preprocessing step
m.preprocessor.pipeline.overlay.set(overlay_prob=1)
s = AudioSample.from_series(train_df.iloc[0])
s.labels = s.labels.astype(bool)
m.preprocessor.pipeline.overlay.overlay_df = (
    train_df.sample(3000).astype(bool).reset_index()
)
s = m.preprocessor.forward(s, profile=True)
s.runtime
# preprocess a bunch of samples in batches with a dataloader, num_workers>1
dl = m._init_train_dataloader(
    train_df.sample(3000).astype(bool),
    batch_size=32,
    num_workers=16,
    raise_errors=True,
)
ds = dl.dataset.dataset
from time import time as timer
from tqdm.autonotebook import tqdm
t0 = timer()
batch_times = []
for i, batch in enumerate(tqdm(dl)):
    if i >= 40:
        break
    batch_times.append(timer() - t0)
    t0 = timer()
print(
    f"batch loading time: mean {np.mean(batch_times):.02f} max {np.max(batch_times):.02f}"
)
# dataset, but no batching or parallelization with dataloader
t0 = timer()
prep_times = []
d = ds.sample(n=100)
d.label_df = d.label_df.astype(bool)
for s in tqdm(d):
    prep_times.append(timer() - t0)
    t0 = timer()
print(
    f"sample loading time: mean {np.mean(prep_times):.02f} max {np.max(prep_times):.02f}"
)
could also profile the forward and backward pass speed of the network
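For the forward/backward timing, something like the following might work. This is a minimal sketch in plain PyTorch with a toy network and random data; `time_forward_backward` is a hypothetical helper, and how it would hook into the CNN class is left open:

```python
from time import perf_counter

import torch
from torch import nn


def time_forward_backward(model, inputs, targets, loss_fn, n_iters=5):
    """Return mean forward and backward time in seconds over n_iters passes.

    Hypothetical helper for illustration. The forward time includes the
    loss computation.
    """
    use_cuda = inputs.is_cuda
    forward_times, backward_times = [], []
    for _ in range(n_iters):
        model.zero_grad()
        t0 = perf_counter()
        loss = loss_fn(model(inputs), targets)
        if use_cuda:
            # GPU kernels run asynchronously; wait before reading the clock
            torch.cuda.synchronize()
        t1 = perf_counter()
        loss.backward()
        if use_cuda:
            torch.cuda.synchronize()
        t2 = perf_counter()
        forward_times.append(t1 - t0)
        backward_times.append(t2 - t1)
    return {
        "forward": sum(forward_times) / n_iters,
        "backward": sum(backward_times) / n_iters,
    }


# toy stand-in network and random batch (assumptions, not real training data)
toy_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(32, 16)
y = torch.randn(32, 2)
timing = time_forward_backward(toy_model, x, y, nn.MSELoss())
print(timing)
```

The first iteration is often slower because of one-time setup, so discarding it or raising `n_iters` may give steadier numbers.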
CNN or Preprocessor could offer a way to profile the speed of