As humans, we naturally associate different sensations with one another. When we hear a train whistle, we can easily picture a train approaching the station. Looking at a picture of a bird, it is a piece of cake for us to imagine the sound the bird would make. Can we get a neural network to do the same thing? Can we make a neural network add a soundtrack to a silent movie? Or, in reverse, "imagine" scenes from a given soundtrack?