zhr1201 / deep-clustering

A TensorFlow implementation of "Deep Clustering: Discriminative Embeddings for Segmentation and Separation"
135 stars 70 forks

I get a very high loss #19

Closed sTarAnna closed 5 years ago

sTarAnna commented 5 years ago

Thanks for your great work. I rewrote your program to run with TensorFlow 1.12 and Python 3. I only use 2 BLSTM layers, not 4. At the beginning of the training stage, the training loss is about 3000k; after 66k steps, it is about 1000k. I want to know whether that is correct or not.

DC model:

```python
with tf.variable_scope('BLSTM1') as scope:
    layer1_fw = tf.nn.rnn_cell.LSTMCell(self.n_hidden)
    layer1_bw = tf.nn.rnn_cell.LSTMCell(self.n_hidden)

    # dropout
    layer1_fw_dropout = tf.nn.rnn_cell.DropoutWrapper(layer1_fw, self.p_keep_fw)
    layer1_bw_dropout = tf.nn.rnn_cell.DropoutWrapper(layer1_bw, self.p_keep_fw)

    # layer 1 outputs
    layer1_outputs, _ = tf.nn.bidirectional_dynamic_rnn(
        layer1_fw_dropout, layer1_bw_dropout, x,
        sequence_length=[FRAMES_PER_SAMPLE] * self.batch_size,
        dtype=tf.float32)

    # concatenate the forward and backward outputs
    layer1_output = tf.concat(layer1_outputs, 2)
    # end of layer 1

# second BLSTM layer
with tf.variable_scope('BLSTM2') as scope:
    layer2_fw = tf.nn.rnn_cell.LSTMCell(self.n_hidden)
    layer2_bw = tf.nn.rnn_cell.LSTMCell(self.n_hidden)

    # dropout
    layer2_fw_dropout = tf.nn.rnn_cell.DropoutWrapper(layer2_fw, self.p_keep_fw)
    layer2_bw_dropout = tf.nn.rnn_cell.DropoutWrapper(layer2_bw, self.p_keep_fw)

    # layer 2 outputs
    layer2_outputs, _ = tf.nn.bidirectional_dynamic_rnn(
        layer2_fw_dropout, layer2_bw_dropout, layer1_output,
        sequence_length=[FRAMES_PER_SAMPLE] * self.batch_size,
        dtype=tf.float32)

    # concatenate the forward and backward outputs
    layer2_output = tf.concat(layer2_outputs, 2)
    # end of layer 2

# feedforward embedding layer
with tf.variable_scope('feedfoward') as scope:
    blstm_output = tf.reshape(layer2_output, [-1, self.n_hidden * 2])
    emb_out = tf.matmul(blstm_output, self.weights['out']) + self.biases['out']
    # tanh activation
    emb_out = tf.nn.tanh(emb_out)
    reshaped_emb = tf.reshape(emb_out, [-1, NEFF, EMBBEDDING_D])
    # L2-normalize the embeddings along the last axis
    normalized_emb = tf.nn.l2_normalize(reshaped_emb, 2)
return normalized_emb
```
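At inference time, deep clustering turns embeddings like these into separation masks by clustering each time-frequency bin. A minimal NumPy sketch of that step (a toy k-means, not the repo's audio_test.py code; the function name, shapes, and defaults here are my assumptions):

```python
import numpy as np

def masks_from_embeddings(emb, n_sources=2, n_iter=20, seed=0):
    """Cluster TF-bin embeddings with a tiny k-means and return one-hot
    binary masks, as deep clustering does at inference time (sketch only).

    emb: (T, F, K) array of L2-normalized embeddings.
    returns: (T, F, n_sources) binary masks, one per estimated source.
    """
    T, F, K = emb.shape
    X = emb.reshape(-1, K)
    rng = np.random.default_rng(seed)
    # initialize centers from random bins
    centers = X[rng.choice(len(X), n_sources, replace=False)]
    for _ in range(n_iter):
        # squared distance from every bin to every center
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for c in range(n_sources):
            pts = X[assign == c]
            if len(pts):
                centers[c] = pts.mean(0)
    masks = np.eye(n_sources)[assign]  # one-hot per bin
    return masks.reshape(T, F, n_sources)
```

Each mask is then applied to the mixture spectrogram and inverted back to a waveform; a real pipeline would use a library k-means (e.g. scikit-learn) rather than this toy loop.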
njusq commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py?

sTarAnna commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py?

I rewrote my model and it now has 4 BLSTM layers. I found the loss value is lower and converges faster. After training I will use audio_test to see whether it works. Thanks for your reply!

zhr1201 commented 5 years ago

Thanks for your reply! Btw @njusq I'm also from NJU, hello fellow alum!

On 04/18/2019 14:48, njusq wrote:

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py?


sTarAnna commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py?

I have a problem: the separated speech files (.wav) can't play. Could you please tell me how you solved this? Thank you a lot!

JayYang-Fdu commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py?

I have a problem: the separated speech files (.wav) can't play. Could you please tell me how you solved this? Thank you a lot!

You can change the player.

njusq commented 5 years ago

Thanks for your reply! Btw @njusq I'm also from NJU, hello fellow alum!

On 04/18/2019 14:48, njusq wrote:

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py?


Ahhh, so excited! It seems you are also in Acoustics! My advisor asked me to learn the relevant material by reproducing this project first. Starting from zero as a beginner feels really hard TAT, so I came to GitHub to look for existing code... Now I'm using this framework and changing the loss computation to turn it into DANet, and I've run into many problems. May I ask you for advice?

njusq commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py?

I have a problem: the separated speech files (.wav) can't play. Could you please tell me how you solved this? Thank you a lot!

I once encountered the same problem when opening the file with Windows Media Player. I guess something related to the file format causes it. Maybe you can try opening the file with Adobe Audition or under your Linux system.
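One common cause is that the file was written as 32-bit float WAV, which some players (notably older Windows Media Player versions) cannot decode. A minimal sketch of re-encoding to plain 16-bit PCM using only the Python standard library (`write_pcm16_wav` is a hypothetical helper, not part of this repo):

```python
import struct
import wave

def write_pcm16_wav(path, samples, sample_rate=8000):
    """Write float samples in [-1, 1] as a standard 16-bit PCM mono WAV,
    a format virtually every player can open.

    Hypothetical helper: clips out-of-range samples and scales to int16.
    """
    pcm = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 2 bytes = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

If the separated output is a NumPy float array, passing it through something like this before saving sidesteps the player compatibility issue entirely.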

sTarAnna commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py? I have a problem: the separated speech files (.wav) can't play. Could you please tell me how you solved this? Thank you a lot!

I once encountered the same problem when opening the file with Windows Media Player. I guess something related to the file format causes it. Maybe you can try opening the file with Adobe Audition or under your Linux system.

Yes, that solved it. Great, thank you.

sTarAnna commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py? I have a problem: the separated speech files (.wav) can't play. Could you please tell me how you solved this? Thank you a lot!

You can change the player.

Yeah, I opened it under Linux and it plays fine. Thanks a lot!

sTarAnna commented 5 years ago

I ran the program (4 BiLSTM) with TensorFlow 1.12 and Python 3.6, too. The mixture can be separated successfully, although the loss value seems high. Have you checked it with audio_test.py? I have a problem: the separated speech files (.wav) can't play. Could you please tell me how you solved this? Thank you a lot! You can change the player. Yeah, I opened it under Linux and it plays fine. Thanks a lot! Hi, could I add you on WeChat? I feel like I'm not going to graduate... I thought about changing the neural network before, but I don't know why I get a matrix dimension mismatch, and Baidu is no help... I don't know how to solve this! Thanks for your reply!

What network do you want to switch to? 0 0

zhr1201 commented 5 years ago

@njusq Sure, sure! Are you in Prof. Jing Lu's group?

zhr1201 commented 5 years ago

@Yangjie55 My WeChat is haoran7123, let's keep in touch.

zhr1201 commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O

sTarAnna commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O

What's up, buddy? 0 0

JayYang-Fdu commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O

What's up, buddy? 0 0

Haha, suddenly I'm "buddy"! I want to switch to a CRNN, but whenever I write it, I get a dimension mismatch in the element-wise matrix multiplication when computing the loss, which confuses me... Have you implemented a CRNN?

JayYang-Fdu commented 5 years ago

@njusq Sure, sure! Are you in Prof. Jing Lu's group?

I really envy you guys. My undergrad thesis is already making me go bald haha... I don't know how I'll manage in grad school haha. I'm just too weak, it's rough...

JayYang-Fdu commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O

What's up, buddy? 0 0

Also, about the formulas and parameters in the paper: honestly, I read it again and still don't understand what they mean... awkward... Do you have any contact info?

sTarAnna commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O

What's up, buddy? 0 0

Haha, suddenly I'm "buddy"! I want to switch to a CRNN, but whenever I write it, I get a dimension mismatch in the element-wise matrix multiplication when computing the loss, which confuses me... Have you implemented a CRNN?

I'm also doing my thesis 0 0. I didn't implement a CRNN; I implemented a GCN for DC following the 2018 GCDC paper.

JayYang-Fdu commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O

What's up, buddy? 0 0

Haha, suddenly I'm "buddy"! I want to switch to a CRNN, but whenever I write it, I get a dimension mismatch in the element-wise matrix multiplication when computing the loss, which confuses me... Have you implemented a CRNN?

I'm also doing my thesis 0 0. I didn't implement a CRNN; I implemented a GCN for DC following the 2018 GCDC paper.

You're really impressive. I feel like I can't understand a lot of things... sigh... I'm anxious. Can I ask you a few questions: 1. It seems zhr didn't add a feedforward layer after the network, but the original paper seems to have one (maybe I misread). Also, I don't know the difference between the LayerNormBasicLSTMCell and BasicLSTMCell units he used. 2. [image] This should be the loss computation in the program, but I don't understand the D^(-1/2) part... 3. There's also this sentence in the paper: "and the embedding can be considered a permutation- and cardinality-independent encoding of the network's estimate of the signal partition." So what exactly is this embedding... I still don't understand it. 4. And V = fθ(x) ∈ R^(N×K) in the paper, I don't understand this part either. Please advise, thanks!

JayYang-Fdu commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O

What's up, buddy? 0 0

Haha, suddenly I'm "buddy"! I want to switch to a CRNN, but whenever I write it, I get a dimension mismatch in the element-wise matrix multiplication when computing the loss, which confuses me... Have you implemented a CRNN?

I'm also doing my thesis 0 0. I didn't implement a CRNN; I implemented a GCN for DC following the 2018 GCDC paper.

Could I borrow your model code? (PS: I'm being a bit shameless hahaha) I've never even heard of the method you implemented... Respect!!

njusq commented 5 years ago

Could I borrow your model code? (PS: I'm being a bit shameless hahaha) I've never even heard of the method you implemented... Respect!!

@Yangjie55 I'm also still learning; let's chat... WeChat (remove the # signs): s#q#i#a#n#9#6#0#7#2#4

sTarAnna commented 5 years ago

@sTarAnna O_O O_o o_o o_O O_O What's up, buddy? 0 0 Haha, suddenly I'm "buddy"! I want to switch to a CRNN, but whenever I write it, I get a dimension mismatch in the element-wise matrix multiplication when computing the loss, which confuses me... Have you implemented a CRNN? I'm also doing my thesis 0 0. I didn't implement a CRNN; I implemented a GCN for DC following the 2018 GCDC paper.

You're really impressive. I feel like I can't understand a lot of things... sigh... I'm anxious. Can I ask you a few questions: 1. It seems zhr didn't add a feedforward layer after the network, but the original paper seems to have one (maybe I misread). Also, I don't know the difference between the LayerNormBasicLSTMCell and BasicLSTMCell units he used. 2. [image] This should be the loss computation in the program, but I don't understand the D^(-1/2) part... 3. There's also this sentence in the paper: "and the embedding can be considered a permutation- and cardinality-independent encoding of the network's estimate of the signal partition." So what exactly is this embedding... I still don't understand it. 4. And V = fθ(x) ∈ R^(N×K) in the paper, I don't understand this part either. Please advise, thanks!

1. Yes, it is added:

```python
# one layer of embedding output with tanh activation function
out_concate = tf.reshape(state_concate4, [-1, self.n_hidden * 2])
emb_out = tf.matmul(out_concate, self.weights['out']) + self.biases['out']
emb_out = tf.nn.tanh(emb_out)
reshaped_emb = tf.reshape(emb_out, [-1, NEFF, EMBBEDDING_D])
# normalization before output
normalized_emb = tf.nn.l2_normalize(reshaped_emb, 2)
return normalized_emb
```

2. You're probably not reading the published version 0 0; the formula in the published version doesn't look like that. 3. DC achieves permutation independence through the affinity matrix. To quote GCDC: the network makes embedding vectors from the same source parallel, and otherwise orthogonal. In the matrix, bins dominated by the same source get large values and bins from different sources get small values. Building the affinity matrix solves the permutation problem (another usable approach is PIT). "This implies that this objective function encourages the mapped embedding vectors to become parallel if they are dominated by the same source and become orthogonal otherwise." 4. V ∈ R^(N×K) just means V is an N×K matrix. 5. GCDC: https://ieeexplore.ieee.org/document/8461746
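The objective in question is the deep clustering loss ||VV^T - YY^T||_F^2 from the original paper, which can be expanded so the N×N affinity matrices are never formed. A NumPy sketch (shapes are assumptions: V is the N×K unit-norm embedding matrix, Y the N×C one-hot source-membership matrix; the D^(-1/2) bin weighting from the later formulation is omitted here):

```python
import numpy as np

def dc_loss(V, Y):
    """Deep clustering objective ||V V^T - Y Y^T||_F^2, expanded as
    ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2 so only small
    K x K, K x C, and C x C matrices are materialized.

    V: (N, K) unit-norm embeddings, Y: (N, C) one-hot source labels.
    """
    return (np.sum((V.T @ V) ** 2)
            - 2 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))
```

When embeddings of same-source bins are parallel and different-source bins orthogonal, VV^T matches YY^T and the loss reaches zero; the large raw values discussed in this thread come from the loss not being normalized by the number of bins.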

JayYang-Fdu commented 5 years ago

...Isn't that the official version?... What I read is the one zhr posted on GitHub... You're really impressive... already getting ahead of me in undergrad... wry smile... Mind adding each other on WeChat so you can lead me into speech processing?.. 😁

sTarAnna commented 5 years ago

...Isn't that the official version?... What I read is the one zhr posted on GitHub... You're really impressive... already getting ahead of me in undergrad... wry smile... Mind adding each other on WeChat so you can lead me into speech processing?.. 😁

1. This is the officially published DC paper: https://ieeexplore.ieee.org/document/7471631 Its architecture uses 2 BLSTM layers.
2. At https://www.isca-speech.org/archive/Interspeech_2016/ search for "Single-channel multi-speaker separation using deep clustering"; that paper explores 4 BLSTM layers and wider networks 0 0. It also discusses improvements to k-means and end-to-end training.
3. I'm also pretty lost on the signal-processing part 0 0

JayYang-Fdu commented 5 years ago

...Isn't that the official version?... What I read is the one zhr posted on GitHub... You're really impressive... already getting ahead of me in undergrad... wry smile... Mind adding each other on WeChat so you can lead me into speech processing?.. 😁

1. This is the officially published DC paper: https://ieeexplore.ieee.org/document/7471631 Its architecture uses 2 BLSTM layers.
2. At https://www.isca-speech.org/archive/Interspeech_2016/ search for "Single-channel multi-speaker separation using deep clustering"; that paper explores 4 BLSTM layers and wider networks 0 0. It also discusses improvements to k-means and end-to-end training.
3. I'm also pretty lost on the signal-processing part 0 0

Got it, thanks buddy! I'll go read those papers first. Thanks again!!!