KOLANICH opened 5 years ago
> Well this idea is great

I am not sure. I am not an NN pro; maybe the idea is useless junk.

> now the problem is collecting data, what about making a project based around the collection of data

We don't need to train models from scratch. The proposal is to use pretrained models, cut them in the middle, and add small adapter networks. The adapters are not as large as the full networks, so they don't need as much data for training.
## Project description
There are lots of neural networks doing things similar to machine translation, though not limited to text. They usually work the following way:

1. an encoder network encodes the initial representation into a vector in a latent space;
2. a decoder network decodes that feature vector into the desired output.

Encoders and decoders are trained simultaneously and differ from model to model, and so do the internal representations.

The idea is to take as many different nets as possible, cut them in the middle where the internal representations appear, add a small net to each (let's call it a transcoder), connect the different nets, and train the transcoders to converge to the same representation. Then standardize the meaning of the internal vector. After that we should be able to use combinations of sources and sinks available in no individual model.
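A minimal sketch of the cut-and-adapt step, assuming PyTorch; the layer sizes, the toy encoder/decoder standing in for the pretrained halves, and the loss are all illustrative assumptions, not part of the proposal:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two halves of a pretrained encoder-decoder model.
# In practice they would be loaded from a serialization (e.g. ONNX) and frozen.
encoder = nn.Sequential(nn.Linear(300, 512), nn.Tanh())  # modality A -> own latent
decoder = nn.Sequential(nn.Linear(512, 300), nn.Tanh())  # own latent -> modality B
for p in list(encoder.parameters()) + list(decoder.parameters()):
    p.requires_grad = False

GLOBAL_DIM = 1024  # dimensionality of the standardized shared representation

# The transcoder is a small adapter between the model's own latent space and
# the shared one; being small, it needs far less training data than a full net.
to_global = nn.Sequential(nn.Linear(512, GLOBAL_DIM), nn.Tanh())
from_global = nn.Sequential(nn.Linear(GLOBAL_DIM, 512), nn.Tanh())

opt = torch.optim.Adam(list(to_global.parameters()) + list(from_global.parameters()))

def training_step(batch_a, target_global):
    """One step pushing this model's latent towards the shared representation.

    `target_global` is what some other model's transcoder produced for the
    same content, so that all transcoders converge to the same vectors.
    """
    latent = encoder(batch_a)
    pred = to_global(latent)
    loss = (nn.functional.mse_loss(pred, target_global)          # agreement
            + nn.functional.mse_loss(from_global(pred), latent)) # round trip
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```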
## Notation
Let's call the types of content on the ends of a network *modalities*:

* `seq<char, en>` (a sequence of characters forming a text in English)
* `seq<char, ru>`
* `seq<word_vec<...>, en>`
* `picture`
Let's call intermediate representations *transcoding modalities*. Let's call a model transforming between transcoding modalities a *transcoder*.
In the following text we use oriented graphs. An edge `A -> [M] -> B` means that a model `M` (by *model* we mean here something through which we can compute gradients) transforms data in modality `A` into modality `B`. An edge `A => [O] => B` means it is possible to convert data in modality `A` into modality `B` using a black-box oracle `O`, such as a TTS program or an external API. We may call an edge a *model* or an *oracle*.
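For concreteness, the two edge kinds can be stored as plain records; a minimal sketch in Python (the `Edge` type, its fields, and the sample edge names are mine, not part of the proposal):

```python
from dataclasses import dataclass

@dataclass
class Edge:
    src: str         # input ModalityId
    dst: str         # output ModalityId
    label: str       # name of the model or oracle on the edge
    is_oracle: bool  # True for `A => [O] => B`: usable as a data source,
                     # but gradients cannot be computed through it

edges = [
    Edge("text<en>", "speech<en>", "TTS", is_oracle=True),           # oracle
    Edge("text_latent", "global", "TextLatentUp", is_oracle=False),  # model
]
```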
## API
`ModalityId` is an identifier of a modality. The standardized global representation has the identifier `global`. Then there are modality-specific representations whose names are also standardized, like `speech_latent`, `image_latent`, etc.

When installed, the middleware exposes the following API to applications:
* `getInfo(out info)` - returns information about the installed models and oracles and their available modalities.
* `startTransaction(out txId)` - starts a transaction. All adding and removing of models is done within transactions. At the end of a transaction the middleware checks consistency and removes all transcoders ending in unreferenced transcoding modalities.
* `commit(in txId, out error)` - checks consistency and commits.
* `registerModel(in our_model_id, in inModalityId, in outModalityId, in our_model)` - adds a net into the registry. `our_model_id` is the globally unique net id. Transcoder networks are registered the same way as modality networks, but UUIDs are used as their ids; when a UUID is used, the server recognizes the modality as a *transcoding modality*. `our_model` is the model serialized, for example, into ONNX or CNTK.
* `unregisterModel(in netId)` - removes the model.
* `registerModalityType(in name, in parameterCount)`
* `registerModality(in type, in arguments)`
* `registerDatasetType(in name, in modalityType, in modalityType)`
* `registerOracleType(in sourceModalityType, in targetModalityType)`
* `registerOracle(in oracleType, in entryPoint)`
* `unregisterOracle(in oracleId)`
* `getPath(in inputModalityId, in outModalityId, in pieces, in preference, out path)` - finds the optimal *path* (a sequence of models) between modalities, using `pieces` to require certain networks and representations to be in the path and `preference` as the criterion of optimality (a minimal search sketch follows this list).
* `convert(in inputRepresentation, in inputModalityId, in outModalityId, in path, in returnGradients, out outputRepresentation)` - converts the input into the output. If `returnGradients` is `true`, gradients are returned.
* `getModel(in netId, out net)` - returns the serialized model. Used for training transcoders.
* `getTrainingOpportunities(in inModality, in outModality, out trainingOpportunities)` - returns training opportunities.
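The text leaves the search strategy of `getPath` open; below is a minimal sketch, assuming plain breadth-first search over the registry and ignoring `pieces` and `preference`. The graph contents, including the transcoder names around `global`, are made up for illustration:

```python
from collections import deque

# Toy registry: modality -> [(model name, output modality)].
GRAPH = {
    "text<en>": [("EnTTS", "EnTTSLatent")],
    "EnTTSLatent": [("EnTTSTextTranscoder", "text_latent")],
    "text_latent": [("TextLatentUp", "global")],
    "global": [("SpeechLatentDown", "speech_latent")],
}

def get_path(in_modality, out_modality):
    """Shortest sequence of models between two modalities, by BFS."""
    queue = deque([(in_modality, [])])
    seen = {in_modality}
    while queue:
        modality, path = queue.popleft()
        if modality == out_modality:
            return path
        for model, nxt in GRAPH.get(modality, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [model]))
    return None

print(get_path("text<en>", "speech_latent"))
# ['EnTTS', 'EnTTSTextTranscoder', 'TextLatentUp', 'SpeechLatentDown']
```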
## Training opportunities
In order to train a model one needs a dataset. Datasets can be obtained in different ways.

`getTrainingOpportunities` is a function analysing the graph of available models and oracles and returning the candidates useful for training a specific model between 2 modalities. If G is the graph and we want to train a model `A -> [M] -> B`, where A and B are subgraphs of G, then a suitable dataset `(a, b)` (where `a` and `b` are modalities) is one whose modalities `a` and `b` can be connected through G to the ends of the model. The middleware then matches these endpoints against the available dataset types.
A `TrainingOpportunity` is an object containing a learning graph and the datasets matching this graph. Training software should use training opportunities to obtain the needed datasets (for example, from a repository of datasets like OpenML, or using a database instructing the software how to retrieve and preprocess datasets, or asking the user to do that) and then train the needed model.
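Under my reading of the matching step, a `TrainingOpportunity` could be assembled roughly as follows; the structures, the signature convention, and the rule "a dataset matches when it provides every modality the free ends need" are assumptions, not the spec:

```python
from dataclasses import dataclass, field

@dataclass
class TrainingOpportunity:
    learning_graph: list  # model/oracle ids forming the learning graph
    endpoints: tuple      # modalities at the graph's free ends
    datasets: list = field(default_factory=list)  # matching dataset types

# Registered dataset types: modalities provided -> dataset type name.
DATASET_TYPES = {
    ("text<en>",): "text_corpuses<en>",
    ("text<en>", "text<ru>"): "bilingual_sources<en, ru>",
    ("speech<en>", "text<en>"): "transcribed<en>",
}

def match_datasets(opportunity):
    """Attach every dataset type providing all modalities the free ends need."""
    for provided, name in DATASET_TYPES.items():
        if set(opportunity.endpoints) <= set(provided):
            opportunity.datasets.append(name)
    return opportunity

# The round trip text<en> -> ... -> text_latent <- [Painter] <- text<en>
# has both free ends in text<en>, so any dataset containing English text fits.
opp = match_datasets(TrainingOpportunity(
    learning_graph=["EnTTS", "EnTTSTextTranscoder", "Painter"],
    endpoints=("text<en>", "text<en>"),
))
print(opp.datasets)
# ['text_corpuses<en>', 'bilingual_sources<en, ru>', 'transcribed<en>']
```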
## Example
Assume we have the following models installed (all model names are random):

```
text<en> -> [Painter] -> [PainterTranscoderText] -> text_latent <-> global <-> image_latent -> [PainterTranscoderImage] -> [Painter] -> image
speech<ru> -> [RuSpeechRecog] -> [RuSpeechRecogTranscoder] -> speech_latent <-> global <-> text_latent -> [RuSpeechRecogTranscoder] -> [RuSpeechRecog] -> text<ru>
```
And oracles registered:

```
text<α> => [TTS] => speech<α>
text<α> => [machine translation] => text<β>
```
And dataset types registered:

* `text<α>` - `text_corpuses<α>` (Wikipedia, Gutenberg, texts on the Internet)
* `text<α>, text<β>` - `bilingual_sources<α, β>`
* `speech<α>` - `speech_corpuses<α>` (video hostings, podcasts, songs)
* `speech<α>, text<α>` - `transcribed<α>` (transcribed videos and audios)

And there exists a pretrained model:
```
text<en> -> [EnTTS] -> {..EnTTSLatent..} -> {..EnTTSLatent..} -> [EnTTS] -> speech<en>
```

We want to create 2 transcoders:

```
{..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent
speech_latent -> [EnTTSSpeechTranscoder] -> {..EnTTSLatent..}
```
So for the first transcoder we call `getTrainingOpportunities` with `{..EnTTSLatent..}` as the input modality and `text_latent` as the output modality, and should get:
```
text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent <- [Painter] <- text<en>
```
(`text_corpuses<en>`, `bilingual_sources<en, β>`, `transcribed<en>`)

```
text<ru> => [machine translation] => text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent -> [RuSpeechRecogTranscoder] -> [RuSpeechRecog] -> text<ru>
```
(`text_corpuses<ru>`, `bilingual_sources<ru, β>`, `transcribed<ru>`)

```
text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent -> [RuSpeechRecogTranscoder] -> [RuSpeechRecog] -> text<ru>
```
(`bilingual_sources<en, ru>`)

```
text<ru> => [machine translation] => text<en> -> [EnTTS] -> {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent <- [Painter] <- text<en>
```
(`bilingual_sources<en, ru>`)
## Example usage
### Text to speech
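A sketch of what the client side could look like, with a `Middleware` stub standing in for a real binding; the class, its return values, and the chosen route are made up, while the method signatures follow the API above:

```python
class Middleware:
    """Stub standing in for a client binding to the installed middleware."""

    def getPath(self, inModalityId, outModalityId, pieces=(), preference="shortest"):
        # The real call searches the registry; here we pretend it found the
        # direct route through the two halves of the pretrained EnTTS model.
        return ["EnTTS", "EnTTS"]

    def convert(self, inputRepresentation, inModalityId, outModalityId,
                path, returnGradients=False):
        return b""  # placeholder; the real call runs the models along `path`

mw = Middleware()
path = mw.getPath("text<en>", "speech<en>")
audio = mw.convert("Hello, world!", "text<en>", "speech<en>", path=path)
```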
### Creating a transcoder
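And a sketch of the workflow for training `EnTTSTextTranscoder` from the example above; `fetch` and `train_transcoder` are hypothetical helpers the training software would provide, and the `Middleware` stub from the previous sketch is assumed to also expose the remaining API calls:

```python
def fetch(dataset_type):
    """Hypothetical helper: retrieve a dataset by its type name,
    e.g. from a repository like OpenML or by asking the user."""
    ...

def train_transcoder(frozen_models, dataset):
    """Hypothetical helper: fit only the transcoder's weights so that both
    routes into text_latent agree (see the PyTorch sketch above)."""
    ...

mw = Middleware()  # stub as above, assumed to also expose the calls used here

# Ask which learning graphs and dataset types can train
# {..EnTTSLatent..} -> [EnTTSTextTranscoder] -> text_latent.
opps = mw.getTrainingOpportunities("EnTTSLatent", "text_latent")
opp = opps[0]  # e.g. the text_corpuses<en> round trip through [Painter]

dataset = fetch(opp.datasets[0])
frozen = [mw.getModel(net_id) for net_id in opp.learning_graph]
transcoder = train_transcoder(frozen, dataset)

# Register the trained transcoder within a transaction.
tx = mw.startTransaction()
mw.registerModel("EnTTSTextTranscoder", "EnTTSLatent", "text_latent", transcoder)
mw.commit(tx)
```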
## Complexity and required time
### Complexity
### Required time (ETA)