Open catalwaysright opened 2 years ago
See #5 #6, and see the papers.
Thanks for your reply! I have checked the issues and the paper. I just want to double-check that I have it right: the parameters of the query embedder are actually updated during fine-tuning, but we just don't recompute the document embeddings with the updated query embedder. Thus, the embeddings of the same question will differ as the query embedder is optimized during fine-tuning, and we may get different top-k relevant documents over the course of fine-tuning even if we input the same question.
Indeed, that is how optimization works, isn’t it?
We could migrate the async index refresh here, but it requires a lot of work due to its complexity.
Another question: I downloaded the natural_questions dataset locally, but when I tried to load it using the load function provided in data.py, it showed Dataset path currently not supported., which is just because the dataset is local and I provided an OS path. How can I fix this and load the local natural_questions dataset?
How did you download NQ?
By using gsutil -m cp -R gs://natural_questions/v1.0 <path to your data directory>, and the directory structure looks like this: https://user-images.githubusercontent.com/60195620/159146888-6d2d70eb-322d-4b17-bafd-5df1979d36c1.png
The preferred way to download is through Hugging Face's datasets library, which provides many utilities such as caching, mapping, and filtering. The data source this library uses is also from Google.
If, however, you want to handle the files yourself, you'll need to write a dataset loading function in data.py that returns the same format as load_nq().
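Such a loader could be sketched as below. This is a minimal illustration, not the repository's code: load_nq_local is a hypothetical name, it assumes the gsutil download yields gzipped JSONL shards under the v1.0 directory, and the returned structure (a plain list of parsed records) would still need to be adapted to match what load_nq() in data.py actually returns.

```python
import gzip
import json
from pathlib import Path

def load_nq_local(data_dir):
    """Sketch of a local Natural Questions loader.

    Assumes `data_dir` contains gzipped JSONL shards as downloaded with
    gsutil (e.g. v1.0/train/nq-train-*.jsonl.gz). Returns a plain list of
    parsed records; adapt the output format to match load_nq() in data.py.
    """
    examples = []
    for shard in sorted(Path(data_dir).glob("**/*.jsonl.gz")):
        with gzip.open(shard, "rt", encoding="utf-8") as f:
            for line in f:
                # Each line is one JSON-encoded NQ example.
                examples.append(json.loads(line))
    return examples
```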
Thank you so much for answering my questions so patiently! I encountered another problem when running run_finetune.py with exactly the same args as your experiment: I got a CUDA out-of-memory error.
I am running it on one V100 GPU with 15 GB of memory, and I set the batch size to 1. Is that still not enough to run this? How could I reduce the memory consumption and reproduce the experiment?
Hi, fine-tuning with the default configuration can be run on a single RTX 2080 Ti, so a V100 with 15 GB of memory is entirely sufficient. You may find the causes/solutions by googling the error message.
@catalwaysright Hey, sorry I forgot to mention this: if you installed transformers from master, you may need to add the line model.block_embedding_to("cpu") after sending the model to the GPU, because the latest REALM patch by default moves the block_emb tensor, which occupies appreciable GPU memory, to the GPU along with model.cuda().
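A small guard around that call can be sketched as follows. place_block_embeddings is a hypothetical helper name; the sketch assumes the block_embedding_to method that the transformers REALM patch introduced, and raises an informative error on older versions instead of failing silently.

```python
def place_block_embeddings(model, device="cpu"):
    """Move the large precomputed block_emb tensor to `device`.

    Assumes `model` is a RealmForOpenQA from a transformers version that
    includes the REALM patch (which added block_embedding_to). On older
    versions, raise a hint instead of an opaque AttributeError later.
    """
    if hasattr(model, "block_embedding_to"):
        model.block_embedding_to(device)
    else:
        raise AttributeError(
            "block_embedding_to is unavailable; "
            "install transformers>=4.18.0"
        )
```

Calling it right after model.cuda() keeps the block embeddings on the CPU while the rest of the model trains on the GPU.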
Sorry for bothering you again. Could you show the specific place where I should add model.block_embedding_to("cpu")? When I add it after sending the model to the GPU in run_finetune.py, it shows AttributeError: 'RealmForOpenQA' object has no attribute 'block_embedding_to'. Thanks!
Hi, which version of transformers are you using? You can install transformers==4.18.0, which includes the latest REALM patch.
I tried your approach and it still shows CUDA out of memory, but I figured out that this may be normal, because there is only 8 GB of memory left on the V100, which is not enough to load and optimize the whole model. How much memory did you allocate on your RTX 2080 Ti?
Please reserve GPU memory at least equal to that of a 2080 Ti. This is the minimal requirement.
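Since other processes on a shared V100 can leave far less than the full 15 GB free, it can help to check available memory before launching fine-tuning. A small sketch using PyTorch's CUDA utilities (free_gpu_memory_gib is a hypothetical helper; it returns None when PyTorch or CUDA is unavailable):

```python
def free_gpu_memory_gib(device=0):
    """Return (free, total) GPU memory in GiB, or None when PyTorch
    or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info reports device-wide free/total bytes.
    free_b, total_b = torch.cuda.mem_get_info(device)
    return free_b / 2**30, total_b / 2**30
```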
Hi! Now I am modifying this model to use multiple retrievers and trying to train it. However, during training I found that the retriever loss and reader loss are both 0.0 most of the time, and the reader loss was also often 0.0 when I was training the original model. Why are there so many 0.0s? Is this normal at the beginning, or are there other tricks to training this model?
If the ground truth is not present in any retrieved context or in the predicted answer span, the retriever and reader losses, respectively, are set to zero to prevent ineffective updates.
This is likely to happen when you train the model from scratch without loading a pre-trained checkpoint such as cc_news, or without proper warm-up.
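The masking described above can be sketched as follows. This is a simplified scalar illustration, not the repository's actual implementation; any_correct is a hypothetical flag marking whether any retrieved context (or predicted span) contains the gold answer.

```python
def mask_ineffective_loss(loss, any_correct):
    """Zero out the loss when no retrieved context or predicted span
    contains the ground-truth answer, so the update is skipped rather
    than pushing the model toward an unreachable target."""
    return loss if any_correct else 0.0
```

With an untrained retriever, any_correct is rarely true, which is why both losses read 0.0 for most steps early in from-scratch training.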
I see! So it will be fine after more steps, right?
For training from scratch, you should follow the steps in the REALM/ORQA papers to pre-train/warm up your model; otherwise, the model is unlikely to improve further. If you are fine-tuning from cc_news or another proper pre-trained checkpoint, then you can keep training and watch for the losses to improve.
Hi! I am wondering why the retriever is frozen during fine-tuning. I think the retriever would learn more during fine-tuning. I am not very familiar with tensorflow. Is it possible to update the parameters of the retriever during fine-tuning with this repository? How?