Question about the optimal percentage of units to gate in XdG

Thanks for your presentation! In the XdG method, the percentage of hidden units to gate has to be chosen. As mentioned in the paper, the optimal percentage depends on the network's size and architecture and on the number of tasks the network is trained on. I'm interested in which of these is the case: is this value calculated before training, from a known relationship between the network's size and architecture and the number of tasks? Or is that relationship not known concretely, so that the optimal value is determined empirically after training on several tasks?
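To make the trade-off behind this question concrete, here is a minimal sketch of how per-task gating masks might be drawn. It assumes each task silences a fixed, uniformly random subset of hidden units. The function names (`make_task_masks`, `mean_pairwise_overlap`, `expected_overlap`) and the sampling scheme are illustrative assumptions, not the authors' code. If a fraction p of N units is gated, each task keeps (1 - p)N active units. Because the masks are drawn independently, two tasks are expected to share a fraction (1 - p)^2 of the units. Gating more therefore reduces overlap (interference) but also reduces per-task capacity, which is one reason the best p plausibly depends on N and on the number of tasks.

```python
import random


def make_task_masks(n_units, n_tasks, gate_fraction, seed=0):
    """Draw one fixed binary mask per task: 1 = unit active, 0 = unit gated.

    Sketch assumption: each task gates a uniformly random subset of
    round(gate_fraction * n_units) hidden units, independently of other tasks.
    """
    if not 0.0 <= gate_fraction <= 1.0:
        raise ValueError("gate_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_gated = round(gate_fraction * n_units)
    masks = []
    for _ in range(n_tasks):
        gated = set(rng.sample(range(n_units), n_gated))
        masks.append([0 if i in gated else 1 for i in range(n_units)])
    return masks


def mean_pairwise_overlap(masks):
    """Average fraction of units active in both tasks, over all task pairs."""
    n_units = len(masks[0])
    total, pairs = 0.0, 0
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            shared = sum(a & b for a, b in zip(masks[i], masks[j]))
            total += shared / n_units
            pairs += 1
    return total / pairs


def expected_overlap(gate_fraction):
    """Each unit is active in one task with probability 1 - p, so it is active
    in two independently drawn tasks with probability (1 - p) ** 2."""
    return (1.0 - gate_fraction) ** 2


if __name__ == "__main__":
    n_units, n_tasks = 2000, 10
    for p in (0.5, 0.8, 0.95):
        masks = make_task_masks(n_units, n_tasks, p)
        print(f"p={p}: active units per task={sum(masks[0])}, "
              f"empirical overlap={mean_pairwise_overlap(masks):.4f}, "
              f"expected overlap={expected_overlap(p):.4f}")
```

Under these assumptions the formula only describes the overlap. It does not say which p minimizes forgetting for a given architecture. That would still have to come from a theory not shown here or from an empirical sweep over candidate values.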

uchicago-computation-workshop / nicolas_masse

Question about the optimal percentage of units to gate in XdG #39