Also: a lot of the statistical/mathematical language in this paper was difficult for me, so the potential for misunderstandings is higher.
Please pull and merge master into this branch; that should remove any changes from other PRs from the changelist. If not, please create a separate PR with only the relevant changes. (You should always branch out from master.)
Can do, I'll give it a try
That did it!
Thanks for all the helpful suggestions! Working on rewrites.
Re: "please pull first", I think it is in the correct state now, cleong110:paper/CV-SLT
now shows as being synced up with sign-language-processing:master
, is there more to do on that front?
Asked GPT-4o to help summarize the PDF of the paper for me, as advised. The conversation and prompt are at https://chat.openai.com/share/c720d444-a505-491e-b131-78e03e10e700. It actually did quite well, I think!
The explanation aligns with my understanding and helps clarify things. It also seems to line up with Figure 2 of the paper.
OK, so it seems the encoder is supposed to generate distributions of possible embeddings, and the decoder is supposed to generate distributions of possible text translations. The KL divergences are there to pull the with-text and without-text distributions closer together.
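For my own reference: assuming the usual diagonal-Gaussian parameterization (my assumption; the paper may set this up differently), the encoder's KL term would have the standard closed form

$$
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_q,\sigma_q^2)\,\middle\|\,\mathcal{N}(\mu_p,\sigma_p^2)\right)
= \sum_i \left[\log\frac{\sigma_{p,i}}{\sigma_{q,i}}
+ \frac{\sigma_{q,i}^2 + (\mu_{q,i}-\mu_{p,i})^2}{2\sigma_{p,i}^2}
- \frac{1}{2}\right]
$$

where $q$ is the with-text (posterior) path and $p$ is the video-only (prior) path; minimizing it pushes the two paths' distributions together.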
So I guess the training process goes like this: you've got three loss terms. One overall reconstruction loss, one for the encoder to help align the with-text and without-text distributions, and one for the decoder to help with consistency of outputs.
So over time, these three losses go down. The encoder gets more consistent at encoding/embedding to the same thing internally with or without text. The decoder, given those embeddings, gets more consistent about generating the same output either way. And the prior path's ability to get to the right text improves.
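Here's how I picture one training step, as a runnable toy sketch (everything here is my own invention for illustration — tiny linear layers stand in for the real encoders/decoder, and the exact KL directions and loss weights in CV-SLT may differ):

```python
import torch
import torch.nn.functional as F
from torch import nn

# Tiny stand-ins for the real modules (all names/sizes invented):
# a prior encoder that sees only video features, a posterior encoder
# that sees video + text features, and a shared decoder.
LATENT, VOCAB = 32, 100
enc_prior = nn.Linear(64, 2 * LATENT)        # video -> (mu, logvar)
enc_post  = nn.Linear(64 + 16, 2 * LATENT)   # video + text -> (mu, logvar)
decoder   = nn.Linear(LATENT, VOCAB)         # latent -> token logits

def as_gaussian(params):
    mu, logvar = params.chunk(2, dim=-1)
    return torch.distributions.Normal(mu, (0.5 * logvar).exp())

video  = torch.randn(8, 64)                  # fake video features, batch of 8
text   = torch.randn(8, 16)                  # fake text features
target = torch.randint(0, VOCAB, (8,))       # fake target tokens

post  = as_gaussian(enc_post(torch.cat([video, text], dim=-1)))  # with text
prior = as_gaussian(enc_prior(video))                            # video only

# 1) Reconstruction loss: decode from the posterior latent, compare to target.
logits_post = decoder(post.rsample())
recon = F.cross_entropy(logits_post, target)

# 2) Encoder KL: pull the video-only prior toward the video+text posterior.
enc_kl = torch.distributions.kl_divergence(post, prior).sum(-1).mean()

# 3) Decoder KL: make the decoder's output distributions agree across paths.
logits_prior = decoder(prior.rsample())
dec_kl = F.kl_div(F.log_softmax(logits_prior, -1),
                  F.log_softmax(logits_post, -1),
                  log_target=True, reduction="batchmean")

loss = recon + enc_kl + dec_kl   # term weighting omitted for simplicity
loss.backward()
```

Minimizing `enc_kl` and `dec_kl` is what (as I understand it) lets the video-only path behave like the video+text path at test time, when no text is available.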
I think I'm ready to rewrite it now.
Wrote a new summary myself, then asked ChatGPT (GPT-3.5) to rewrite it for conciseness.
A few minor edits to ChatGPT's version for style-guide purposes give us:
@zhaoConditionalVariationalAutoencoder2023 introduces CV-SLT, employing conditional variational autoencoders to address the modality gap between video and text.
They assess the disparity using RWTH-PHOENIX-Weather-2014T data, correlating similar embeddings with improved BLEU scores.
Their approach involves guiding the model to encode visual and textual data similarly through two paths: one with visual data alone and one with both modalities.
Using KL divergences, they steer the model towards generating consistent embeddings and accurate outputs regardless of the path.
Once the model achieves consistent performance across paths, it can be utilized for translation without gloss supervision.
Evaluation on RWTH-PHOENIX-Weather-2014T [@cihan2018neural] and CSL-Daily [@dataset:huang2018video] datasets demonstrates its efficacy.
They provide a code implementation based largely on @chenSimpleMultiModalityTransfer2022a.
I think that ought to take care of all the issues.
Sorry, yes, I meant ask ChatGPT to help with your writing: you write a draft, and it gives tips to improve it.
How do you typically prompt ChatGPT for writing-improvement suggestions, btw? I just asked it to help me rewrite for conciseness, being sure to mention it was a summary of an academic paper. I'm wondering if you've got some particular prompting tricks that work well.
Making a PR to add http://arxiv.org/abs/2312.15645, one of the recent approaches (with code!) listed on https://github.com/ZechengLi19/Awesome-Sign-Language.
I only intended to include f435dcadd968e357b75425fec5902a57dfb96180 in this PR. Unfortunately, I made the previous PR from my fork's master branch, so this PR now includes those commits. In the future, if I make a new branch on my end for each PR, that should prevent this. And I believe once the previous PR is merged, this one should merge cleanly as well.