imJiawen / Warpformer

MIT License

Reproduction of PhysioNet dataset #3

Open lihaha-96 opened 5 months ago

lihaha-96 commented 5 months ago

Sorry to bother you. I adjusted the code following the run.sh file you provided, but I could not reproduce the reported results. I then ran a number of hyperparameter experiments, but the results did not improve much. I would like to discuss this issue with you. Looking forward to your reply, thank you very much.

imJiawen commented 5 months ago

Hi @Lixinshuai-bit,

Have you tried commit 5dd0f9a9c1488ee7f70d77be615a865fc2eca4e9?

If this configuration does not work for you, please let me know. I would appreciate it if you could provide the experimental logging file.

lihaha-96 commented 5 months ago

Hello, Teacher Zhang! Yes, we ran into some difficulties while reproducing the results. The log file is attached. We look forward to your answer.

bfs_lxs

imJiawen commented 5 months ago

I'm sorry, but I cannot see the attached file. Could you please send it to jzhang302@connect.hkust-gz.edu.cn?

lihaha-96 commented 5 months ago

Hi, Teacher Zhang! It has been sent. Please check your inbox.

lihaha-96 commented 5 months ago

Dear Zhang, sorry to bother you again. Have you had a chance to look at the log file? What could be causing the reproduction to fail? Looking forward to your reply!

imJiawen commented 5 months ago

Hi @Lixinshuai-bit ,

Sorry for my late reply.

I have identified what appears to be the root cause of the performance discrepancy: the way the dataset was constructed. The dataset used in our paper was built beforehand with the preprocessing code provided by mTAN. However, the download and preprocessing code currently available in this repository was modified from that version, and these changes inadvertently introduced a bias into the dataset, which in turn affected the performance metrics.

I will update the data preprocessing code in this repository to be consistent with the original method used in mTAN. Until then, to replicate the paper's results accurately, we advise you to re-download and preprocess the dataset with the code from mTAN, which should produce the same dataset used in our study.

lihaha-96 commented 5 months ago

Dear Zhang, I am glad to receive your reply, and thank you for providing a solution. I will use the mTAN code to preprocess the data. We also look forward to the update of your repository.

CodeNinjaja commented 3 months ago

I ran into the same problem after running the script and got the following results: AUROC 84.14 (0.5), AUPRC 48.95.

Looking forward to a fix.

imJiawen commented 3 months ago

Hi @CodeNinjaja,

Thank you for bringing this to our attention.

Could you please re-download the dataset with the latest code and run run.sh again to see whether the performance improves? It appears that changes made to the data preprocessing scripts while the code was being open-sourced caused the discrepancy with the paper's results; the model code itself should still be working correctly. I will investigate the source of this inconsistency as soon as possible.

CodeNinjaja commented 3 months ago

Thank you for your reply. I noticed that you increased --d_model, --d_inner_hid, and --n_layers, and that shuffle in the DataLoader was set to False. I re-downloaded the dataset and ran the script with the updated code, but the results are still not as expected. The log file is attached below. warpformer_logs.zip

imJiawen commented 3 months ago

Hi @Lixinshuai-bit @CodeNinjaja ,

Sorry for the delayed response.

I've been quite busy with other matters and have only now had the bandwidth to address the issue.

Could you please re-download the dataset and try reproducing the results again? I believe the problem should be resolved now.

Thank you for your feedback. If you encounter any further issues, please let me know.

CodeNinjaja commented 2 weeks ago

Sorry for bothering you.

I followed your suggestion, re-downloaded the dataset, and tried again. However, I still cannot reproduce the results reported in the paper. Have you been able to reproduce them on your machine using the updated code in this repository?

The log file is attached below. Looking forward to your reply. warpformer_log.zip

CodeNinjaja commented 2 weeks ago

@lihaha-96 Would you reopen this issue? Thank you.

imJiawen commented 2 weeks ago

Hi @CodeNinjaja, I'll look into this later.

imJiawen commented 2 weeks ago

Hi @CodeNinjaja

The difference in performance appears to be caused by variations in data splitting.

After re-downloading and processing the PhysioNet dataset from scratch, I compared it with a previously processed dataset that matched the reported experimental results. I found that the primary difference was a change in the order of the data samples in ./datasets/physionet/PhysioNet/processed/set-a_0.016.pt, which in turn changed how the samples were divided into training and test sets.
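For anyone who wants to check this on their own copy, here is a minimal sketch of how an order-only difference between two processed files might be detected. It assumes each file is a list of per-patient tuples whose first element is the record ID (the layout used by mTAN-style PhysioNet loaders; this is an assumption about this repository), and that both lists have already been loaded, e.g. with `torch.load`. The helper `compare_record_order` is hypothetical, not part of the repository.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Assumed record layout: (record_id, tt, vals, mask, labels); only record_id is used here.
Record = Tuple[Any, ...]


def compare_record_order(old: Sequence[Record], new: Sequence[Record]) -> Dict[str, Any]:
    """Report whether two processed datasets hold the same records, and in what order."""
    old_ids = [rec[0] for rec in old]
    new_ids = [rec[0] for rec in new]

    # Position of the first record ID that differs between the two lists.
    first_mismatch: Optional[int] = None
    for i, (a, b) in enumerate(zip(old_ids, new_ids)):
        if a != b:
            first_mismatch = i
            break
    if first_mismatch is None and len(old_ids) != len(new_ids):
        # One list is a prefix of the other: they diverge where the shorter one ends.
        first_mismatch = min(len(old_ids), len(new_ids))

    return {
        "same_records": sorted(old_ids) == sorted(new_ids),  # same IDs, ignoring order
        "same_order": old_ids == new_ids,
        "first_mismatch": first_mismatch,
    }
```

A result with `same_records` true but `same_order` false would indicate the situation described above: identical patients, shuffled positions.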

I am not entirely certain whether this change in order was caused by systematic factors, the setting of random seeds, or other potential variables.

To address this and ensure consistent results, I've updated the get_physionet_data function to control the train/test split explicitly. Could you please run the updated code and check whether this resolves the issue on your end?
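The actual change lives in get_physionet_data in the repository. As a hedged illustration of the general idea only, the sketch below shows why a seeded split can still differ when file order changes, and one way to avoid it. A seeded shuffle of positions (which is also what scikit-learn's `train_test_split` does with `shuffle=True`) selects the same positions every time, so the selected patients change whenever the input order does. Sorting by record ID first removes that dependency. The function name, the 80/20 ratio, the seed, and the tuple layout are assumptions, not the repository's code.

```python
import random
from typing import Any, List, Sequence, Tuple

# Assumed record layout: (record_id, tt, vals, mask, labels); record IDs assumed unique.
Record = Tuple[Any, ...]


def order_independent_split(
    records: Sequence[Record], train_frac: float = 0.8, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Seeded train/test split that gives the same result however the input is ordered."""
    # Canonical order: sort by record ID so the order in the .pt file no longer matters.
    ordered = sorted(records, key=lambda rec: rec[0])
    # A local RNG is unaffected by other code that touches the global random state.
    rng = random.Random(seed)
    indices = list(range(len(ordered)))
    rng.shuffle(indices)
    n_train = int(len(ordered) * train_frac)
    train = [ordered[i] for i in indices[:n_train]]
    test = [ordered[i] for i in indices[n_train:]]
    return train, test
```

An alternative design is to save the list of test-set record IDs once and reload it on every machine, which pins the split without relying on any RNG.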