piedralaves closed this issue 1 year ago
Hi @piedralaves ,
Since the weights of those similar words also need to be updated, how do you calculate their gradients? Do they use the same gradients as the "key" words, like the formula: [gradient of similar word] = coefficient * [gradient of key word]? Is that correct? If so, how do you calculate the coefficient? Using cosine similarity or something else?
The implementation of weight updates is under the "Seq2SeqSharp\Seq2SeqSharp\Optimizer" folder, and it currently has two optimizers: AdamOptimizer.cs and RMSPropOptimizer.cs. Seq2SeqSharp calls the optimizer after the backward pass and gradient calculations. You can take a look at the code there and modify it if necessary. In the BaseSeq2SeqFramework.cs file, it's called in the TrainOneEpoch method; look for "solver.UpdateWeights(models, processedLine, lr, m_regc, m_weightsUpdateCount);" there.
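To picture where that call sits, here is a minimal, self-contained sketch of the training-loop shape described above (forward, backward, then an optimizer update such as Adam). This is an illustrative Python stand-in, not the C# code in AdamOptimizer.cs; all names and hyperparameter values here are assumptions for the example.

```python
import math

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a single scalar weight (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Training-loop shape: backward produces the gradient, then the
# optimizer update runs (the point where solver.UpdateWeights is
# called in TrainOneEpoch). Minimizing the toy loss w**2 here.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2.0 * w                  # gradient of the toy loss w**2
    w, m, v = adam_update(w, grad, m, v, t)
```

Any extra logic that should run per update, such as the "family updating" discussed in this thread, would naturally hook in right after (or inside) that optimizer step.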
As for the article in the link above, sorry, I don't have a subscription to the New York Times, so I cannot read it. Maybe it mentions the problem you are trying to deal with and how to apply it in the real world. If you don't mind, could you please explain your findings in more detail?
Yes, I'm really busy with my daily work, but I would be glad to help you with any questions about Seq2SeqSharp, and to discuss NLP and machine learning problems in my spare time. :)
Thanks Zhongkai Fu
Thanks a lot.
Yes, we planned to use the same gradients as the key words and to calculate the coefficient based on cosine similarity, as in vector space models. Do you think that is the right approximation?
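The plan above (same gradient, scaled by a cosine-similarity coefficient) can be sketched concretely. This is a hypothetical illustration of the proposed "family updating", not code from Seq2SeqSharp: the function name, the threshold, and the plain SGD-style subtraction are all assumptions made for the example.

```python
import numpy as np

def family_update(emb, key_idx, key_grad, lr=0.1, sim_threshold=0.7):
    """Apply the key word's gradient to similar embedding rows.

    Hypothetical sketch: rows whose cosine similarity with the key row
    is >= sim_threshold share the key word's gradient, with
    coefficient = cosine similarity; other rows are left untouched.
    emb is the (vocab_size, dim) embedding matrix, updated in place.
    """
    key_vec = emb[key_idx].copy()
    norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(key_vec)
    cos = emb @ key_vec / np.maximum(norms, 1e-12)
    coeffs = np.where(cos >= sim_threshold, cos, 0.0)  # prune weakly related words
    coeffs[key_idx] = 1.0                              # key word gets the full gradient
    emb -= lr * np.outer(coeffs, key_grad)
    return emb

# Toy vocabulary of 3 words: rows 0 and 1 are similar, row 2 is not.
emb = family_update(
    np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]),
    key_idx=0,
    key_grad=np.array([0.5, 0.0]),
)
```

In this toy run, row 0 (the key word) moves by the full scaled gradient, row 1 moves by roughly 0.99 of it (its cosine with row 0), and row 2 is untouched because it falls below the threshold.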
We will review what you said and let you know.
We are sending you in this message some of the papers that discuss, among other things, the issues we are dealing with.
Linguistic generalization and compositionality in.pdf (https://github.com/zhongkaifu/Seq2SeqSharp/files/11081939/Linguistic.generalization.and.compositionality.in.pdf)
Exploring_Processing_of_Nested_Dependencies_in_Neu.pdf (https://github.com/zhongkaifu/Seq2SeqSharp/files/11081933/BaroniRNN.pdf)
Thanks a lot
Thanks @piedralaves . I will take a look.
It seems one of the challenging parts is how to deal with the weight updates of ambiguous similar words in different contexts.
Thanks Zhongkai Fu
Let me think about it, but in principle, I guess that is not a big problem.
At this time, we are testing some issues in order to run a conceptual test, working as in a laboratory, but we want to have the "family updating" ready so we can put a full version of compositionality to the test, and even the possibility of a better way of generalizing that does not depend on the training sample.
Any remarks on the points above will be appreciated. We are obviously very interested in your background.
Hi Zhongkai:
We want to do the following:
At the moment the weights of the embedding matrix are updated, we want to also update other words that are not in the sentence. The criterion by which we update them is similarity with the words impacted in the update. That is, if a word (the part of the embedding matrix that represents it, i.e. its vector) is updated, some other words are also updated proportionally (by a coefficient). To do that, we need to identify the function or functions involved in the update.

Tentatively, we call this mechanism "family updating", and it will be deployed, if we can make it work, to help deal with the phenomenon called "systematic compositionality" in rule acquisition (the "poverty of the stimulus" phenomenon), which some studies report to be a problem. It is something Chomsky recently discussed in the New York Times: https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html We want to deal with it by two things:
If you are also interested in this kind of thing, we can report our first results to you, and you could even participate in papers and white papers. Remember that we are interested in the cognitive aspects of the models, but also in applications to the resolution of technical problems. In any case, we understand that you are a busy person with important projects. Seq2SeqSharp is one of them, and we really appreciate it.
So the first questions are: At what moment are the weights updated? And at what moment is the update completed? We want to manipulate the embeddings at those points. We have explored some parts of the code, but we would like to hear your advice, if possible.
Thanks a lot for everything.