Closed: zhiweihu1103 closed this issue 1 year ago
Hi, Zhiwei. I retrained the model with the RichpediaMEL dataset, and everything seems fine. Based on the training logs you provided, your loss appears to be much larger than usual. In my training, the Train/loss_epoch after the first epoch is around 3.19:
| Train/loss_step | epoch | step | Train/loss_epoch |
| --- | --- | --- | --- |
| 2.700 | 0 | 29 | |
| 2.965 | 0 | 59 | |
| 2.596 | 0 | 89 | |
| | 0 | 97 | 3.187 |
Is the issue of not being able to reproduce the results limited to RichpediaMEL, or does it apply to all datasets?
Hi, Pengfei. It is limited to RichpediaMEL only; on the other two datasets I can get results close to those reported in the paper.
In addition, I see that many attr fields in the dataset are empty. Is this field not used in the end?
> Hi, Pengfei. It is limited to RichpediaMEL only; on the other two datasets I can get results close to those reported in the paper.
That's strange. I've checked the MD5 of the files, and they appear to match the ones on my training server. Can you please check the learning rate during training? It seems that after the second epoch, the loss no longer exhibits significant changes.
> In addition, I see that many attr fields in the dataset are empty. Is this field not used in the end?
For some entities, I couldn't retrieve suitable attributes from Wikidata (possibly due to a network issue), so I left them blank. In the implementation, the attributes are concatenated with the entity's name.
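(For illustration only, here is a minimal sketch of what that concatenation could look like, assuming attributes are stored as a per-entity key-value dict; the helper names are hypothetical and not the repository's actual functions.)
from typing import Optional

def flatten_attrs(attrs: dict) -> str:
    # Hypothetical helper: flatten key-value attributes into plain text,
    # e.g. {"occupation": "singer", "country": "Japan"} -> "occupation singer country Japan"
    return " ".join(f"{k} {v}" for k, v in attrs.items())

def build_entity_text(name: str, attrs: Optional[dict]) -> str:
    # Concatenate the entity name with its flattened attributes; entities
    # with an empty or missing attr field fall back to the name alone.
    return f"{name} {flatten_attrs(attrs)}" if attrs else name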
I need to print the learning rate after each epoch, right? I also found that the loss did not change much after the second epoch.
Okay, that means attr is not used in the current dataset, right?
> I need to print the learning rate after each epoch, right? I also found that the loss did not change much after the second epoch.
You can easily log the learning rate with PyTorch Lightning's LearningRateMonitor callback; you simply need to add it to the trainer's callbacks.
import os
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping, LearningRateMonitor
from codes.utils.functions import setup_parser
from codes.model.lightning_mimic import LightningForMIMIC
from codes.utils.dataset import DataModuleForMIMIC

if __name__ == '__main__':
    args = setup_parser()
    pl.seed_everything(args.seed, workers=True)
    torch.set_num_threads(1)
    data_module = DataModuleForMIMIC(args)
    lightning_model = LightningForMIMIC(args)
    logger = pl.loggers.CSVLogger("./runs", name=args.run_name, flush_logs_every_n_steps=30)
    ckpt_callbacks = ModelCheckpoint(monitor='Val/mrr', save_weights_only=True, mode='max')
    early_stop_callback = EarlyStopping(monitor="Val/mrr", min_delta=0.00, patience=3, verbose=True, mode="max")
    lr_callback = LearningRateMonitor(logging_interval='step')  # logs the learning rate at every optimization step
    trainer = pl.Trainer(**args.trainer,
                         deterministic=True, logger=logger, default_root_dir="./runs",
                         callbacks=[ckpt_callbacks, early_stop_callback, lr_callback])
    trainer.fit(lightning_model, datamodule=data_module)
    trainer.test(lightning_model, datamodule=data_module, ckpt_path='best')
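(With the CSVLogger above, the monitored learning rate should then appear as an extra column in the run's metrics.csv; by default PyTorch Lightning names it after the optimizer class, e.g. lr-Adam.)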
> Okay, that means attr is not used in the current dataset, right?
I'm not sure what you mean by "not used." Our intention is to utilize the attributes to enhance the representation of entities. Therefore, we concatenate the flattened key-value attributes with the entity's name as textual input.
I will give feedback this afternoon or evening.
What I mean is that I saw many attr fields are empty, which made me think attr is not used. In the code, I did see a part where attr is concatenated.
> What I mean is that I saw many attr fields are empty, which made me think attr is not used. In the code, I did see a part where attr is concatenated.
No, I did use attributes. However, due to network issues or the absence of suitable attributes, some entities have an empty or missing attr field.
Ok, I understand.
In addition, would you mind providing the Figure 4 datasets (the 10% and 20% splits for RichpediaMEL and WikiDiverse) and the numerical results? I need to draw my own histogram, but I don't know the specific values behind yours.
Hi, Pengfei. I have attached the training logs with the learning rate included. Please also note my question above about the Figure 4 datasets and numerical results; looking forward to the discussion. richpediamel_lr.txt metrics.csv
Hi, Pengfei. Any updates?
> Hi, Pengfei. Any updates?
Hi, sorry for the late response. I have reviewed your log file, and the learning rate appears to be fine. I attempted to retrain the model using the code and original data we uploaded, and the loss and evaluation results match our reported findings. Could you please check the configuration file config/richpediamel.yaml to see if there is anything wrong? Could you also provide details about the environment you used to train the model?
If you want to reproduce the reported results right now, I have uploaded a model checkpoint here (password: KDD2023richpedia).
In the low-resource setting, we only utilized the first 10% and 20% of the training data for each dataset, following the order in the training data file. This means that if you want to access the low-resource training data, you only need to control the amount of training data used.
Please add a new line after https://github.com/pengfei-luo/MIMIC/blob/59ef385c14c5bffd70eaf8012f876850f6b99072/codes/utils/dataset.py#L44
train_data = train_data[:int(len(train_data) * 0.1)] # or 0.2
Then you can obtain either 10% or 20% of the training data we used.
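(For illustration, a minimal standalone sketch of the same slicing, assuming the train file is a JSON list of samples; adjust if the file is structured differently. The function name is hypothetical.)
import json

def load_low_resource_split(train_file: str, fraction: float = 0.1):
    # Load the full training set, then keep only the first `fraction`
    # of samples in file order, mirroring the one-line change above.
    with open(train_file, 'r', encoding='utf-8') as f:
        train_data = json.load(f)  # assumed to be a list of training samples
    return train_data[:int(len(train_data) * fraction)]

# e.g. the 10% split: load_low_resource_split('/data/RichpediaMEL/RichpediaMEL_train.json', 0.1)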
Regarding the numerical results you've requested, I will update them in the readme file in the next few days. Please stay tuned.
Hi, Pengfei. First, here is the yaml file I used; I did not make any modifications except the paths. Second, I created the environment from scratch with conda, and the package versions exactly match your requirements.txt.
run_name: RichpediaMEL
seed: 43
pretrained_model: '/checkpoint/clip-vit-base-patch32'
lr: 1e-5
data:
  num_entity: 160933
  kb_img_folder: /data/RichpediaMEL/kb_image
  mention_img_folder: /data/RichpediaMEL/mention_image
  qid2id: /data/RichpediaMEL/qid2id.json
  entity: /data/RichpediaMEL/kb_entity.json
  train_file: /data/RichpediaMEL/RichpediaMEL_train.json
  dev_file: /data/RichpediaMEL/RichpediaMEL_dev.json
  test_file: /data/RichpediaMEL/RichpediaMEL_test.json
  batch_size: 128
  num_workers: 8
  text_max_length: 40
  eval_chunk_size: 6000
  eval_batch_size: 20
  embed_update_batch_size: 512
model:
  input_hidden_dim: 512
  input_image_hidden_dim: 768
  hidden_dim: 96
  dv: 96
  dt: 512
  TGLU_hidden_dim: 96
  IDLU_hidden_dim: 96
  CMFU_hidden_dim: 96
trainer:
  accelerator: 'gpu'
  devices: 1
  max_epochs: 20
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 2
  log_every_n_steps: 30
The full environment information is:
absl-py 1.4.0
aiohttp 3.8.5
aiosignal 1.3.1
antlr4-python3-runtime 4.9.3
async-timeout 4.0.3
attrs 23.1.0
cachetools 5.3.1
certifi 2023.7.22
charset-normalizer 3.2.0
click 8.1.7
filelock 3.12.3
frozenlist 1.4.0
fsspec 2023.9.0
google-auth 2.22.0
google-auth-oauthlib 1.0.0
grpcio 1.57.0
huggingface-hub 0.16.4
idna 3.4
importlib-metadata 6.8.0
joblib 1.3.2
Markdown 3.4.4
MarkupSafe 2.1.3
multidict 6.0.4
numpy 1.24.4
oauthlib 3.2.2
omegaconf 2.2.3
packaging 23.1
Pillow 9.3.0
pip 23.2.1
protobuf 4.24.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pyDeprecate 0.3.2
pytorch-lightning 1.7.7
PyYAML 6.0.1
regex 2023.8.8
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
sacremoses 0.0.53
setuptools 68.0.0
six 1.16.0
tensorboard 2.14.0
tensorboard-data-server 0.7.1
tokenizers 0.12.1
torch 1.11.0
torchmetrics 0.11.0
tqdm 4.66.1
transformers 4.18.0
typing_extensions 4.7.1
urllib3 1.26.16
Werkzeug 2.3.7
wheel 0.38.4
yarl 1.9.2
zipp 3.16.2
Thanks for the information about how to run the low-resource experiments. I am very much looking forward to your numerical results; thank you for your efforts. In addition, regarding reproduction on RichpediaMEL, I wonder whether there may be some difference between the code you ran and the code you uploaded, because I ran it twice on this dataset and the results were exactly the same as the ones I uploaded above.
I can reproduce the results with the code we shared and the data we uploaded to OneDrive. Is there any difference in the pretrained model? I saw you changed the path. I use the one from Hugging Face.
SHA256: a63082132ba4f97a80bea76823f544493bffa8082296d62d71581a4feff1576f MD5: 47767ea81d24718fcc0c8923607792a7
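(If it helps to rule out a corrupted download, a small sketch for computing both digests locally with only the standard library; the path below is a guess based on the yaml above, so adjust it to your local checkpoint.)
import hashlib

def file_digests(path, chunk_size=1 << 20):
    # Stream the file in chunks so large checkpoints don't need to fit in memory.
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()

print(file_digests('/checkpoint/clip-vit-base-patch32/pytorch_model.bin'))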
I downloaded the pretrained CLIP from https://huggingface.co/openai/clip-vit-base-patch32/tree/main. I will replace pytorch_model.bin with the one from the link you provided and upload the results tomorrow morning.
But I found that, after clicking pytorch_model.bin, the CLIP weight link I had downloaded from is actually exactly the same as the one you provided.
Hi, Pengfei. I may need further help from you, because I still have difficulty reproducing the RichpediaMEL results, even though I used the CLIP pretrained model from the URL you gave me (actually the same pretrained model I used previously). I have attached my running logs on the three datasets below. wikidiverse_another.txt wikimel_another.txt richpediamel_another.txt
This is very strange. The other two datasets work fine, only RichpediaMEL has an issue. Maybe you could double-check the RichpediaMEL.tar file you downloaded? I will share an online Wandb report later to show that everything is normal on my end.
RichpediaMEL.tar MD5: 0f499eddde7582428947e45ebb94388f SHA256: 36ac5703e4a9890238daedf039a7b2923a7c4b66c66a6b9cf788db40eabe0447
I will take a screenshot to share the contents after decompressing the RichpediaMEL dataset: kb_image has 96073 files, and mention_images has 15852 files.
I downloaded the RichpediaMEL dataset from the link you provided: https://mailustceducn-my.sharepoint.com/:u:/g/personal/pfluo_mail_ustc_edu_cn/ERikbOQuoWFHrA_AizcuCbgB8PBOiRqCV4U0lZfxUN-6kg?e=speIdh
Could you please try upgrading transformers to version 4.27.1? I noticed that the transformers version might have an impact on the results, although I'm not sure why.
pip install transformers==4.27.1 --upgrade
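(A quick way to confirm which version the training environment actually picks up, in case several environments are installed:)
import transformers
print(transformers.__version__)  # should print 4.27.1 after the upgrade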
Let me check.
The Wandb report is here.
You used transformers==4.27.1, right?
Yes, in the Wandb report run, I used torch==1.11.0 and transformers==4.27.1. Other packages are the same as the requirements. I attempted to downgrade transformers to 4.18.0 and noticed that it did lead to a performance drop. I have no idea why this occurred.
If the performance degradation is due to transformers, then this should not be within the scope of our discussion. As long as the results can be reproduced, everything is good. I'll re-run and give my reproduction results.
Hi, Pengfei. I think it is still difficult for me to reproduce the RichpediaMEL results. The following table compares the results with different versions of transformers. I have also uploaded the training log for transformers==4.27.1 (richpediamel_new_transformers.txt) and the metrics.csv. I compared my train_loss_epoch on RichpediaMEL with the train_loss_epoch you provided on Wandb and found a huge difference. Was your Wandb training run executed directly with the code in your open-source repository?
Just replace the CSV logger with the Wandb logger to enable Wandb logging.
logger = pl.loggers.WandbLogger(project='MIMIC', name=args.run_name)
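(For context, this is roughly how the swap fits into the training script shown earlier; it assumes the wandb package is installed and you are logged in.)
from pytorch_lightning.loggers import WandbLogger

# Replace the CSVLogger from the earlier snippet; project and run name follow the example above.
logger = WandbLogger(project='MIMIC', name=args.run_name)
trainer = pl.Trainer(**args.trainer,
                     deterministic=True, logger=logger, default_root_dir="./runs",
                     callbacks=[ckpt_callbacks, early_stop_callback, lr_callback])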
No, what I mean is: was your Wandb run executed on your current open-source code? Because the problem now is that the RichpediaMEL results cannot be reproduced.
Yes, I cloned from GitHub and only modified a few lines regarding logging. You can check the information and the code of this run here (code from the left bar).
It's baffling; I can't imagine why it's so hard to reproduce.
Hi, Pengfei. First, I carefully compared the open-source GitHub code with the code you used on Wandb. The only difference is in how the CLIP model's from_pretrained method is called. The open-source code on GitHub is:
self.tokenizer = CLIPProcessor.from_pretrained(self.args.pretrained_model).tokenizer
The code used in the Wandb run is:
self.tokenizer = CLIPProcessor.from_pretrained(self.args.pretrained_model, local_files_only=True).tokenizer
But I think this is not the main problem, because after I added the local_files_only=True parameter, I found the result was the same.
Then, I created an environment exactly matching the requirements.txt shown on Wandb, and the results are exactly the same as mine before, which indicates that the difference in results is not caused by the environment.
So I need to confirm now: is the RichpediaMEL dataset you are using the same version you uploaded? Since the code and environment are now completely consistent, the performance difference is hard to explain.
> I will take a screenshot to share the contents after decompressing the RichpediaMEL dataset: kb_image has 96073 files, and mention_images has 15852 files.
Here are the statistics for the RichpediaMEL dataset I used.
Maybe you can check if the MD5 values of all the files match mine?
ba086b054bf52d549f2a79503c76704a kb_entity.json
8059b7aa89a9314d5dc38607a8685eeb qid2id.json
831cdd92d70a93ea8a442798ec2fcde1 RichpediaMEL_dev.json
9e07e5e970e01079d256311e5ac10bd8 RichpediaMEL_test.json
e1d0b2adb2a1114cefa63860ffa23982 RichpediaMEL_train.json
961efc263bc8e2e7b257a28e8e703633 kb_image.zip
474c594ce8a95aa5dc9222365db0044e mention_images.zip
The parameter local_files_only=True ensures that local files are used, and we have already confirmed that the model weights are consistent. I think this won't have any impact.
You can ignore the .pkl files. I found a difference between kb_image and mention_images.
Can you provide the MD5 values for kb_image.zip and mention_images.zip? I directly extracted these two ZIP files.
Wait a few minutes, I deleted the original file after decompressing it, and I need to download it again.
I can't think of any other reason why it is difficult to reproduce. The size of the .zip file is the same, yet the size after decompression is different?
I checked your running log on Wandb, and your loss is obviously much lower than what I reproduced.
It seems all the files are normal. The difference in folder sizes may be due to differences in how the operating system organizes files.
Perhaps you can try changing some hyperparameters, such as the random seed, learning rate, and batch size, to see if they have an impact on the loss. If you have access to other servers, maybe you can try configuring the environment and running it on other servers. I don't know what's causing the inability to reproduce the results. All the results on my end are normal.
I can try it on other machines, but judging from my experience running your code, as long as the random seed is fixed, the results will be exactly the same every time.
I think I should add some new information. I originally ran the code on a V100 32G GPU. I have now tried it on an A6000 and found that the final result is almost the same as on the V100. Have you made any other modifications? The hyperparameters I use are completely consistent with the yaml you provided.
Hi, Pengfei. Nice work. I found that I cannot reproduce the RichpediaMEL results. I used the same yaml you provided; can you help me? The training log is attached. richpediamel.txt