tae-jun / sample-cnn

A TensorFlow implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms"
MIT License

Filter visualization for sample-cnn #4

Open Jamesswiz opened 6 years ago

Jamesswiz commented 6 years ago

Hey! Thanks for sharing the implementation. I was wondering if you could also share the gradient ascent filter visualization method, compatible with your Keras implementation, or maybe point me to some links.

Looking forward to a response!

tae-jun commented 6 years ago

Hi! Thanks for your interest 😄 The filter visualization code is based on this blog.

Actually, I'm not the first author of the paper (SampleCNN); that is Jongpil Lee. He kindly gave me the code and said it's OK to share. I tested the code before, and it works like a charm.

However, the code needs a few modifications to work with this repository's implementation. The CNN's input size is currently fixed; change it to None so that the network can run on variable-length signals.
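Why setting the input length to None is enough: a convolution's weights depend only on its kernel size and channel counts, not on the length of the signal, so the same trained filters apply to a 729-sample input or a 59049-sample one. In general, only layers that flatten the whole time axis into a fixed-size dense input tie a model to one length. The NumPy sketch below illustrates this with a hypothetical `conv1d_valid` helper (not part of the repository) and random stand-in weights; it is a toy, not the Keras layer itself.

```python
import numpy as np

def conv1d_valid(x, w, stride=1):
    """Toy 1-D convolution (cross-correlation, 'valid' padding).

    x: (length, in_channels); w: (kernel_size, in_channels, out_channels).
    Returns (out_length, out_channels). No parameter depends on `length`.
    """
    k = w.shape[0]
    out_len = (x.shape[0] - k) // stride + 1
    return np.stack([
        np.tensordot(x[i * stride:i * stride + k], w, axes=([0, 1], [0, 1]))
        for i in range(out_len)
    ])

rng = np.random.default_rng(0)
# One hypothetical set of "trained" weights: kernel size 3, 1 input -> 128 output channels.
w = rng.standard_normal((3, 1, 128))

# The same weights run on two different input lengths.
short = conv1d_valid(rng.standard_normal((729, 1)), w, stride=3)
long_ = conv1d_valid(rng.standard_normal((59049, 1)), w, stride=3)
print(short.shape, long_.shape)  # (243, 128) (19683, 128)
```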

This is the code I received. Good luck!


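import os
import time

import numpy as np
import matplotlib
matplotlib.use('Agg')  # render figures to files without needing a display
import matplotlib.pyplot as plt
import librosa
# Keras 2.x with the TF1-style graph backend (K.gradients, K.function, K.learning_phase)
from keras import backend as K

# length (in samples) of the signal generated for each filter
sample_length = 729 #512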

# step size for gradient ascent
step = 1. #3. 1.
step_num = 18 #1000 18
conv_dense = 'conv' # conv or dense
norm_param_list = [1e-9]
layer_list = ['activation_1','activation_2','activation_3','activation_4','activation_5','activation_6']
nb_filters_list = [128,128,128,256,256,256]

fftsize = 729

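# load model
# `model` is assumed to be the SampleCNN Keras model from this repository,
# built with its input length changed to None (input shape (None, 1))
weight_path = ''  # path to the trained weights (not given in the original)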
model.load_weights(weight_path)
model.summary()
print('model loaded!!!')

# save path
save_path = ''  # output directory for the spectrum images, with trailing '/' (not given in the original)

# get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers[1:]])

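# this is the placeholder for the input signal: (None, None, 1) once the input
# length is set to None (it was (None, 59049, 1) with the fixed input size)
input_img = model.input

def normalize(x, norm_param):
    # utility function to normalize a tensor by its root-mean-square
    # (its L2 norm divided by sqrt(number of elements)); norm_param avoids division by zero
    return x / (K.sqrt(K.mean(K.square(x))) + norm_param) # -5?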

plt.figure()

# norm_param for loop
for norm_param in norm_param_list:
    # all layers for loop
    for layer_idx, layer_name in enumerate(layer_list):
        print(layer_idx, layer_name)

        # save name
        save_name = '%s_norm%s_filters.png' % (layer_name, str(norm_param))
        print(save_name)

        if os.path.isfile(save_path + save_name):
            print('already calculated:', save_name)
            continue

        nb_filters = nb_filters_list[layer_idx]
        # number of frequency bins is fftsize // 2 + 1 (365 for fftsize = 729); each filter's
        # spectrum is repeated so the image width (nb_filters * repetition) is comparable to it
        repetition = int((fftsize // 2 + 1) / nb_filters)
        print('repetition:' + str(repetition))

        save_path_wav = ''  # output directory for the per-filter waveform plots (not given in the original)

        fftzed = np.zeros((nb_filters, fftsize // 2 + 1))
        for filter_index in range(0, nb_filters):

            # scan through every filter of this layer
            print('Processing filter %d' % filter_index)
            start_time = time.time()

            # we build a loss function that maximizes the activation
            # of the nth filter of the layer considered
            layer_output = layer_dict[layer_name].output
            if conv_dense == 'conv':
                loss = K.mean(layer_output[:,:,filter_index])
            elif conv_dense == 'dense':
                loss = K.mean(layer_output[:,filter_index])

            # we compute the gradient of the input picture wrt this loss
            grads = K.gradients(loss, input_img)[0]

            # normalization trick: we normalize the gradient
            grads = normalize(grads,norm_param)

            # this function returns the loss and grads given the input signal
            iterate = K.function([input_img, K.learning_phase()], [loss, grads])

            # we start from low-amplitude uniform noise around zero
            input_img_data = np.random.random((1, sample_length, 1))
            input_img_data = (input_img_data - 0.5) * 0.03 #1.8

            # we run gradient ascent for step_num steps
            for i in range(step_num):
                loss_value, grads_value = iterate([input_img_data, 1]) # learning phase: 1 = training, 0 = test
                input_img_data += grads_value * step

                print('Current loss value:', loss_value)
                if loss_value <= 0.:
                    # some filters get stuck to 0, we can skip them
                    break

            end_time = time.time()
            print('Filter %d processed in %ds' % (filter_index, end_time - start_time))

            print(np.squeeze(input_img_data[0]).shape)
            sample = np.squeeze(input_img_data[0])

            # erase DC
            sample = sample - np.mean(sample)

            # save wav figure
            save_name_wav = '%s_filter%d_norm%s.png' % (layer_name, filter_index, str(norm_param))

            if not os.path.exists(os.path.dirname(save_path_wav+save_name_wav)):
                os.makedirs(os.path.dirname(save_path_wav+save_name_wav))

            plt.clf()
            plt.plot(sample)
            plt.axis('off')
            plt.savefig(save_path_wav+save_name_wav)

            # log-compressed power spectrum; with hop = window = fftsize the STFT has a single frame
            S = librosa.stft(sample, n_fft=fftsize, hop_length=fftsize, win_length=fftsize)
            X = np.square(np.absolute(S))
            log_S = np.log10(1+10*X)
            log_S = np.squeeze(log_S.astype(np.float32))
            print(log_S.shape)
            #log_S = np.mean(log_S,axis=1)
            print(log_S.shape)
            fftzed[filter_index] = log_S
            print(fftzed.shape,repetition)

        argmaxed = np.argmax(fftzed,axis=1)
        sort_idx = np.argsort(argmaxed)
        sorted_fft = fftzed[sort_idx,:]

        sorted_fft = np.repeat(sorted_fft,repetition,axis=0)
        print(sorted_fft.shape)

        if not os.path.exists(os.path.dirname(save_path+save_name)):
            os.makedirs(os.path.dirname(save_path+save_name))

        # save figure
        plt.clf()
        plt.imshow(sorted_fft.T)
        plt.gca().invert_yaxis()
        plt.axis('off')
        plt.savefig(save_path+save_name)

print('save done!!!')
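The core of the script above is gradient ascent on the input: start from small random noise, repeatedly add the RMS-normalized gradient of one filter's mean activation, then summarize each resulting signal by its log-compressed power spectrum and sort the filters by peak frequency. Below is a minimal NumPy sketch of that loop under simplifying assumptions: a single hypothetical strided conv layer with ReLU and random weights stands in for the trained SampleCNN, the gradient is written out by hand instead of using `K.gradients`, and an unwindowed `np.fft.rfft` of the whole signal stands in for the script's single-frame `librosa.stft` (which also applies a Hann window and centre padding). It shows the mechanics only, not the pictures the real network would produce.

```python
import numpy as np

rng = np.random.default_rng(0)
sample_length, kernel, stride, n_filters = 729, 27, 3, 8
# Hypothetical stand-in for one trained layer: random conv weights, shape (kernel, n_filters).
weights = rng.standard_normal((kernel, n_filters))

def frames(x):
    """Stack the strided windows of x into shape (n_windows, kernel)."""
    n = (len(x) - kernel) // stride + 1
    return np.stack([x[i * stride:i * stride + kernel] for i in range(n)])

def loss_and_grad(x, f):
    """Mean ReLU activation of filter f, and its gradient with respect to the input x."""
    z = frames(x) @ weights[:, f]          # pre-activation, one value per window
    loss = np.maximum(z, 0.0).mean()
    grad = np.zeros_like(x)
    for i in np.flatnonzero(z > 0):        # ReLU passes gradient only where z > 0
        grad[i * stride:i * stride + kernel] += weights[:, f] / len(z)
    return loss, grad

def normalize(g, eps=1e-9):
    # same trick as the script: divide the gradient by its root-mean-square
    return g / (np.sqrt(np.mean(g ** 2)) + eps)

def visualize_filter(f, steps=18, step=1.0):
    x = (rng.random(sample_length) - 0.5) * 0.03   # low-amplitude noise start
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(x, f)
        losses.append(loss)
        if loss <= 0.0:                   # dead filter: nothing to climb
            break
        x += normalize(grad) * step
    return x - x.mean(), losses           # remove DC, as in the script

signals, histories = zip(*(visualize_filter(f) for f in range(n_filters)))

# One spectrum per filter, log-compressed as log10(1 + 10 * |S|^2).
spectra = np.array([np.log10(1 + 10 * np.abs(np.fft.rfft(s)) ** 2) for s in signals])
order = np.argsort(np.argmax(spectra, axis=1))    # sort filters by peak frequency bin
sorted_spectra = spectra[order]
repetition = spectra.shape[1] // n_filters        # widen each filter's row, as in the script
image = np.repeat(sorted_spectra, repetition, axis=0)
print(spectra.shape)                              # (8, 365)
```

Transposing `image` and plotting it with the y-axis inverted, as the script does with `plt.imshow` and `invert_yaxis`, puts low frequencies at the bottom and the filters left to right in order of their peak frequency.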
tae-jun commented 6 years ago

Also, here is the implementation of the first author: https://github.com/jongpillee/sampleCNN

We are also working on an extension of the filter visualization 😄

Jamesswiz commented 6 years ago

Hey! Thanks again for the quick response. I will try the implementation you shared. I am also working on CNN filter visualization with raw speech as input, so gradient ascent seems like a good place to start.

Maybe it would be better to add this code as a .txt file attachment.