locuslab / TCN

Sequence modeling benchmarks and temporal convolutional networks
https://github.com/locuslab/TCN
MIT License

MNIST classification problem #52

Open lonelygoatherd opened 4 years ago

lonelygoatherd commented 4 years ago

Hi, I have a simple question about the MNIST classification example. Images in MNIST are treated as sequences by flattening them to 1D, and each image then gets a probability distribution. But the images have no relation to each other, so what is the meaning of the TCN here if each sequence is processed separately? It looks like a fully connected layer. Did I misunderstand the processing procedure?

jerrybai1995 commented 4 years ago

Oh, we flatten each image to 1D. For example, a 28x28 image is converted to a 784x1 sequence (i.e., length 784). So each "time step" to the TCN is essentially a single pixel, not an image.
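
For concreteness, a minimal sketch of that flattening step (shapes follow the description above; the variable names here are made up, and the model itself would be the `TemporalConvNet` from this repo's `tcn.py`):

```python
import torch

# A batch of MNIST images: (batch, 1, 28, 28)
images = torch.randn(32, 1, 28, 28)

# Flatten each 28x28 image into a length-784 sequence of single pixels.
# The TCN (like nn.Conv1d) expects (batch, channels, length), so here
# channels=1: each "time step" carries exactly one pixel value.
seq = images.view(32, 1, 784)  # (batch=32, channels=1, length=784)

# seq can then be fed to the TCN; for classification one would typically
# read off the features at the last time step, e.g. tcn(seq)[:, :, -1].
```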

lonelygoatherd commented 4 years ago

Thanks so much for your reply. You said "each 'time step' to the TCN is essentially a single pixel", but the flattened 1D data is processed in parallel with a shared kernel, since within the same layer the output of one step is not fed to the others, is that right? And the output of each "time step" is a scalar that is used only as the next layer's input, i.e. a hidden layer's input is the previous layer's output and has no direct relation to the original input X? Furthermore, what is the purpose of permuting the input tensor several times at the beginning? And why do we need different partition strategies? When we have three partitioned subsets that are not linked by edges, how do we process them? Sorry for so many questions. These details have confused me for a long time, I can't find an explanation online, and the code is really hard for me right now. I'd appreciate your help.


jerrybai1995 commented 4 years ago
  1. "[...] the hidden layer's input is the previous layer's output and has no direct relation with the origin input X" --> Correct. That is how any convolutional networks operate.

  2. "[...] permitting the input tensor" ---> Do you mean shuffling the data? That is just to ensure that the training data are sufficiently mixed and prevent the model from overfitting too early. You can also turn it off.

  3. What do you mean by partition strategies? Do you mean dilations?
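
To illustrate point 1, a minimal sketch of two stacked dilated causal convolutions (layer sizes are made up; the repo's `TemporalBlock` adds weight norm and residual connections on top of this):

```python
import torch
import torch.nn as nn

# The same kernel is shared across all time steps of a layer, and layer 2
# only ever sees layer 1's output h1, never the raw input x -- exactly as
# in any stacked convolutional network.
conv1 = nn.Conv1d(1, 16, kernel_size=3, padding=2, dilation=1)
conv2 = nn.Conv1d(16, 16, kernel_size=3, padding=4, dilation=2)

x = torch.randn(8, 1, 784)             # (batch, channels, length)
h1 = torch.relu(conv1(x)[:, :, :-2])   # chop the extra right padding -> causal
h2 = torch.relu(conv2(h1)[:, :, :-4])  # input is h1, not x; shape stays (8, 16, 784)
```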

lonelygoatherd commented 4 years ago
  1. So this is essentially different from LSTMs, whose input combines the hidden state with the input at the next time step.

  2. I mean the permute operations applied to the input tensor x of shape (N, C, T, V, M) in the code, as below.

  3. I mean the partition strategies in the paper, including uni-labeling, distance partitioning, and spatial configuration partitioning. For example, the last strategy divides the neighbour nodes into three subsets; how do we then process the neighbour nodes when they fall in different subsets? When we implement an aggregation operation on a center node, we use three kernels corresponding to the three subsets respectively, is that correct? And the purpose is that the neighbour nodes no longer all share the same parameters, because we now have three kernels, which helps enhance the representation ability. Is that so? However, the three subsets can contain different numbers of nodes, e.g., node i has four neighbours divided into subsets of sizes (1, 1, 2), while the six neighbours of node j are divided into (1, 2, 3). Then what size are the three kernels?
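
For what it's worth, here is a minimal sketch of my current understanding of the spatial-configuration case (this is from the ST-GCN paper, not this repo; all shapes and names below are assumptions): each of the three subsets gets its own 1x1 kernel over channels, so the kernel size never depends on how many neighbours land in a subset; the per-subset adjacency masks absorb the variable neighbour counts.

```python
import torch
import torch.nn as nn

C_in, C_out, V = 3, 64, 25        # channels in/out, number of joints (assumed)
A = torch.rand(3, V, V)           # A[k]: (normalized) adjacency mask of subset k
                                  # (root / centripetal / centrifugal)
W = nn.ModuleList(nn.Conv2d(C_in, C_out, kernel_size=1) for _ in range(3))

x = torch.randn(8, C_in, 50, V)   # (batch, channels, frames T, joints V)
# Aggregate: sum_k (W_k x) A_k -- one shared 1x1 kernel per subset, applied
# to every node; differing subset sizes only ever show up inside A[k].
out = sum(torch.einsum('nctv,vw->nctw', W[k](x), A[k]) for k in range(3))
```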
