shahsohil / DCC

This repository contains the source code and data for reproducing the results of the Deep Continuous Clustering paper.
MIT License

Gradients of U with respect to F (feature map) #16

Closed LemonPi closed 5 years ago

LemonPi commented 5 years ago

Hey @shahsohil, can you clarify how I could use the DCC output (the Z or U representatives) to get the gradient of some future loss function L(Z) with respect to my feature transform parameters F(X|theta)? (I'm using Z in the diagram to match the notation in the paper, but I'm actually using U.)

My current understanding of the data flow is summarized by the flowchart below. The dashed arrows are routes where the gradient can backpropagate. The green boxes hold parameters that require gradients in the PyTorch sense. The red dashed line for dF/dX means the gradient theoretically exists but the current implementation does not allow for it. Gradient with respect to the feature transform means with respect to the parameters of the feature transform (d/dF means d/dtheta).

[Flowchart: Data flow - Page 3]

After DCC I have representatives U that I then use in some later steps of the pipeline. I can get a gradient with respect to U, but from the flowchart above there doesn't seem to be any way of propagating it back to F. The whole point of the pipeline is to learn the parameters of F, so the current architecture doesn't seem to work for this. One way to address it would be to bring the later processes that use U inside the DCC loop as terms in the cost function. Do you have any ideas (and is my interpretation of the data flow wrong)?
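For what it's worth, here is a minimal, self-contained sketch of the problem as I understand it (the names `encoder`, `X`, `Y`, `U` are illustrative, not the repository's actual API). Because U is a free leaf tensor optimized directly, a loss on U produces `U.grad` but nothing ever reaches theta:

```python
# Hypothetical sketch: a downstream loss on the detached representatives U
# cannot backpropagate into the encoder parameters theta.
import torch

torch.manual_seed(0)

X = torch.randn(8, 10)                      # toy input batch
encoder = torch.nn.Linear(10, 2)            # stand-in for F(X | theta)
Y = encoder(X)                              # encoder outputs, depend on theta
U = torch.nn.Parameter(Y.detach().clone())  # representatives: a leaf tensor

downstream_loss = U.pow(2).sum()            # some later loss L(U)
downstream_loss.backward()

print(U.grad is not None)                   # True: dL/dU exists
print(encoder.weight.grad)                  # None: nothing flowed to theta
```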

shahsohil commented 5 years ago

Data flow: there should be no path between Y and Z. The Z's are updated only using the DCC loss.

Currently there is no direct path between U and F. But the 'Y' updates are based on the U's, so U indirectly influences F, and U can be updated using any number of loss functions. If you want to work directly with F, you can instead utilise the representatives 'Y', which are the outputs of the encoder.
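To make the suggestion concrete, here is a minimal sketch (again with illustrative names, not the repository's API) of evaluating the downstream loss on Y = encoder(X) rather than on the detached U, with a DCC-style data term coupling Y to U; with this arrangement autograd does reach theta:

```python
# Hypothetical sketch: a downstream loss on the encoder outputs Y,
# plus a data term tying Y to the representatives U, lets gradients
# flow into the encoder parameters theta.
import torch

X = torch.randn(8, 10)
encoder = torch.nn.Linear(10, 2)            # stand-in for F(X | theta)
U = torch.randn(8, 2, requires_grad=True)   # representatives (leaf tensor)

Y = encoder(X)                              # differentiable w.r.t. theta
dcc_coupling = (Y - U).pow(2).sum()         # DCC-style term tying Y to U
downstream_loss = Y.pow(2).sum()            # later loss, now a function of Y

(dcc_coupling + downstream_loss).backward()
print(encoder.weight.grad is not None)      # True: gradients reach theta
```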