microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Branching and CloneFunction #2593

Open csolorio opened 6 years ago

csolorio commented 6 years ago

Hi,

I'm trying to train a model that has several branches, with only one branch active during each minibatch. How can I achieve this? Let me explain the model a bit:

            |-- branch1_1 --|                 |-- branch1_2 -- output1
    Input --|-- branch2_1 --|-- common_path --|-- branch2_2 -- output2
            |-- branch3_1 --|                 |-- branch3_2 -- output3

Depending on the input data, I would use only one branch path, keeping the rest fixed and only the corresponding output active. Could I achieve this by using the clone and freeze function on the branches that are inactive during an update? Is there a tutorial that could help me?

ebarsoumMS commented 6 years ago

Hi @csolorio

Is the active branch always the same, or does it change from minibatch (MB) to minibatch?

csolorio commented 6 years ago

The branch would be randomly selected between minibatches. Is that possible?

ebarsoumMS commented 6 years ago

Yes, it is possible, but a little complicated. The idea is to have two versions of each branch: one with frozen parameters and one without. You then have three networks, each with two branches frozen and one not frozen. Call the three networks z1, z2 and z3.

Now you need to create three trainers, one for each network, and switch between trainers at random after each MB.
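The switching scheme can be sketched without any framework. The following is a toy illustration in plain Python, not CNTK code: every layer is a single scalar weight, the gradients are written by hand, and the names `make_params`, `train_step` and `train` are hypothetical. Each call to `train_step` plays the role of one trainer: it updates the selected branch and the shared `common_path` weight, and leaves the other two branches untouched.

```python
import random

def make_params():
    # Hypothetical toy model: each layer is one scalar weight.
    # w1[i] ~ branch{i+1}_1, wc ~ common_path, w2[i] ~ branch{i+1}_2.
    return {"w1": [0.5, 0.5, 0.5], "wc": 0.5, "w2": [0.5, 0.5, 0.5]}

def train_step(params, branch, x, target, lr=0.1):
    """One SGD step on the network that routes through `branch` only."""
    w1, wc, w2 = params["w1"][branch], params["wc"], params["w2"][branch]
    # Forward pass: input -> branch_1 -> common_path -> branch_2 -> output.
    h1 = w1 * x
    h2 = wc * h1
    y = w2 * h2
    err = y - target
    # Hand-derived gradients of the squared-error loss 0.5 * err**2.
    g_w2 = err * h2
    g_wc = err * w2 * h1
    g_w1 = err * w2 * wc * x
    # Only the active branch and the shared path are updated.
    params["w1"][branch] -= lr * g_w1
    params["wc"] -= lr * g_wc
    params["w2"][branch] -= lr * g_w2
    return 0.5 * err ** 2

def train(params, data_per_branch, steps, seed=0):
    """Pick a branch at random for every minibatch, as suggested above."""
    rng = random.Random(seed)
    for _ in range(steps):
        branch = rng.randrange(3)
        x, target = data_per_branch[branch]
        train_step(params, branch, x, target)
    return params
```

In CNTK itself, each `train_step` would correspond to calling the trainer built for z1, z2 or z3 on that minibatch; the toy version only shows which parameters move.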

Another idea, which avoids cloning and multiple trainers, is to add three extra float inputs to your network. These inputs are always set to zero, except for one that is set to 1.0, and you use them as coefficients for the three networks in your loss function.
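A minimal sketch of the coefficient idea, again in plain Python rather than CNTK, with hypothetical helper names (`one_hot`, `masked_loss`, `masked_grads`). The total loss is the sum of the per-branch losses weighted by the selector inputs, so the gradient reaching branch i is scaled by its coefficient and is zero for the inactive branches.

```python
def one_hot(active, n=3):
    # Selector inputs: 1.0 for the active branch, 0.0 for the others.
    return [1.0 if i == active else 0.0 for i in range(n)]

def masked_loss(branch_losses, coeffs):
    # total = c1 * loss1 + c2 * loss2 + c3 * loss3
    return sum(c * l for c, l in zip(coeffs, branch_losses))

def masked_grads(branch_grads, coeffs):
    # d(total)/d(theta_i) = c_i * d(loss_i)/d(theta_i),
    # so inactive branches (c_i == 0) receive a zero gradient.
    return [c * g for c, g in zip(coeffs, branch_grads)]

# Example: branch 2 (index 1) is active for this minibatch.
coeffs = one_hot(1)
total = masked_loss([0.7, 1.2, 0.3], coeffs)
grads = masked_grads([0.4, -0.9, 0.2], coeffs)
```

Note that with this approach all three branches are still evaluated in the forward pass; only their contribution to the loss is switched off. The shared common_path still receives the gradient of the active branch.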

csolorio commented 6 years ago

Ok, great ideas! I'll try them. Can one learner be used in the 3 trainers or would separate learner instances be required?