mikevoets / jama16-retina-replication

JAMA 2016; 316(22) Replication Study
https://doi.org/10.1371/journal.pone.0217541
MIT License
110 stars 37 forks

about data #14

Closed: mrzhangzizhen123 closed this issue 4 years ago

mrzhangzizhen123 commented 5 years ago

When I was running your code, I found that the data counts didn't match. Could you explain this to me? In eyepacs.sh: bin2_0_cnt=48784, bin2_0_tr_cnt=40688, bin2_1_tr_cnt=16458. What is the relationship between 40688 and 48784?

mikevoets commented 5 years ago

bin2_0_cnt is the number of non-rDR images for both training and test. bin2_0_tr_cnt is the number of non-rDR images for training only.
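The relationship can be checked with simple arithmetic. The following is a minimal sketch using only the two counts quoted above; the helper name `held_out_count` is hypothetical and is not part of eyepacs.sh.

```python
# Counts quoted from eyepacs.sh earlier in this thread.
bin2_0_cnt = 48784      # non-rDR images, training + test
bin2_0_tr_cnt = 40688   # non-rDR images, training only


def held_out_count(total_cnt, train_cnt):
    """Images left for the test set once the training images are taken.

    Hypothetical helper for illustration; not a function from the repository.
    """
    if train_cnt > total_cnt:
        raise ValueError("training count exceeds total count")
    return total_cnt - train_cnt


bin2_0_test_cnt = held_out_count(bin2_0_cnt, bin2_0_tr_cnt)
print(bin2_0_test_cnt)  # 8096
```

So, by the definitions above, 48784 - 40688 = 8096 non-rDR images remain for the test set.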

mrzhangzizhen123 commented 5 years ago

There is one more question that I need to ask you: when you were training, what was the value of AUC shown in each epoch?

mikevoets commented 5 years ago

Sorry, I can't help you with that. We didn't keep logs of AUC during training. The AUC also varies for each trained model, so there is no definite answer to what AUC should be after each epoch.

mrzhangzizhen123 commented 5 years ago

Hello, there seems to be a problem in your experiment. There are 88702 images in total: 65345 with label 0, 6205 with label 1, 13153 with label 2, 2085 with label 3, and 1914 with label 4. When I executed eyepacs.sh --redistribute, I found 40374 images in folder 0 and 16458 images in folder 1, which does not match bin_0_tr_cnt=40688. How can I solve this problem?
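A quick sanity check on these numbers can be sketched as follows. It assumes the usual binary split for referable diabetic retinopathy, where grades 0 and 1 count as non-rDR and grades 2 to 4 count as rDR; that mapping is an assumption here and is not stated in this thread. The function name `binary_counts` is hypothetical.

```python
# Per-grade image counts as reported in the comment above.
grade_counts = {0: 65345, 1: 6205, 2: 13153, 3: 2085, 4: 1914}


def binary_counts(counts, rdr_threshold=2):
    """Collapse per-grade counts into (non_rdr, rdr) pool sizes.

    Assumption: grades below rdr_threshold are non-rDR, the rest are rDR.
    """
    non_rdr = sum(n for grade, n in counts.items() if grade < rdr_threshold)
    rdr = sum(n for grade, n in counts.items() if grade >= rdr_threshold)
    return non_rdr, rdr


total = sum(grade_counts.values())
non_rdr_pool, rdr_pool = binary_counts(grade_counts)
print(total)                   # 88702
print(non_rdr_pool, rdr_pool)  # 71550 17152

# Shortfall between the expected and the observed size of folder 0.
expected_0, observed_0 = 40688, 40374
print(expected_0 - observed_0)  # 314
```

Under that assumption both pools are large enough for the requested training counts (40688 and 16458), so the 314 missing images would have to come from somewhere else, for example modified parameters or missing files.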

mikevoets commented 5 years ago

@mrzhangzizhen123 I am not able to reproduce your results. I always end up with 40688 images in the 0-folder, and 16458 images in the 1-folder, as expected. Have you modified anything in the eyepacs.sh script?

mrzhangzizhen123 commented 5 years ago

I have modified bin_0_cnt and the related counts to match my existing data. I have not changed anything else. Now, when running train.py, the first column of the 2x2 matrix is all 0. Is the data not distributed correctly?
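A column of zeros in a confusion matrix usually means the model never predicts one of the classes. The sketch below detects that case. It assumes rows are true classes and columns are predicted classes, which may not be the layout train.py prints; the matrix values are made up for illustration.

```python
def never_predicted_classes(confusion):
    """Return the class indices whose column sums to zero.

    Assumption: confusion[i][j] is the number of samples with true class i
    that were predicted as class j.
    """
    size = len(confusion)
    return [
        col for col in range(size)
        if sum(confusion[row][col] for row in range(size)) == 0
    ]


# Made-up example: every sample is predicted as class 1.
example = [[0, 1200],
           [0, 300]]
print(never_predicted_classes(example))  # [0]
```

If one class is never predicted, it is worth checking the class balance of the training folders before looking at the model itself.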

mikevoets commented 5 years ago

@mrzhangzizhen123 What are the values of the modified parameters in eyepacs.sh?

mrzhangzizhen123 commented 5 years ago

This is what I did: I separated the left-eye and right-eye images in EyePacs and trained on the left eye and the right eye separately, so I changed the counts in eyepacs.sh, which may be wrong. Do you have any suggestions?

mikevoets commented 5 years ago

bin_0_cnt represents the total number of images used from the EyePacs data set (retrieved from the pool). bin_0_tr_cnt represents how many of these images are used for training. If you have changed the number of images in the data set by separating the left-eye and right-eye images, these numbers will be different. The eyepacs.sh script is written specifically for the needs of our project: it yields the same ratio of rDR to non-rDR images as in the original JAMA project.

Since you have modified the data set, in your case by separating the images for left and right eyes, I do not think eyepacs.sh will be useful for you. Please read this section in our README to preprocess your data sets correctly for use with train.py.
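When splitting by eye, the class counts have to be recomputed for each subset. The following is a minimal sketch of that bookkeeping. It assumes image names end in `_left` or `_right` and that grades 2 and above count as rDR; both are assumptions, and `per_eye_binary_counts` is a hypothetical helper, not part of the repository. The rows are made-up sample data.

```python
from collections import Counter


def per_eye_binary_counts(rows, rdr_threshold=2):
    """Count images per (eye, binary label) pair.

    rows: iterable of (image_name, grade) pairs, e.g. ("10_left", 0).
    Assumption: the eye is the part of the name after the last underscore,
    and grades >= rdr_threshold are labelled 1 (rDR), others 0 (non-rDR).
    """
    counts = Counter()
    for image_name, grade in rows:
        eye = image_name.rsplit("_", 1)[-1]
        label = 1 if int(grade) >= rdr_threshold else 0
        counts[(eye, label)] += 1
    return counts


# Made-up sample rows for illustration only.
rows = [("10_left", 0), ("10_right", 0), ("15_left", 1),
        ("15_right", 2), ("16_left", 4)]
counts = per_eye_binary_counts(rows)
print(counts[("left", 0)], counts[("left", 1)])    # 2 1
print(counts[("right", 0)], counts[("right", 1)])  # 1 1
```

Counts obtained this way show how many images of each class are actually available per eye, which is the upper bound for any training or test count chosen for that subset.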

mrzhangzizhen123 commented 5 years ago

I have a question for you. If I train on my own data set, how can I get a validation set like the one in the original code?

mikevoets commented 5 years ago

Yes. You can use the Messidor-2 or Messidor-Original test sets, and run evaluate.py with the -m2 or -m parameter, respectively. If you want to use our EyePacs test set, you need to unpack the training set too. The evaluation script loads the trained model and expects its architecture to be the same as the one built in the train script. Evaluating with the original EyePacs test data is done with the following steps:

  1. Run the eyepacs.sh script with the --output_dir option: ./eyepacs.sh --output_dir=./eyepacs_data
  2. Run the evaluate.py script with your trained model: python evaluate.py -e -lm=/your_model --data_dir=./eyepacs_data/test
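For a custom data set, one generic way to obtain a held-out set is a stratified split that keeps the class ratio the same in both parts. The sketch below is a general technique under stated assumptions, not the method used by eyepacs.sh; `stratified_holdout`, the 20% fraction, the seed, and the file names are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(samples, holdout_fraction=0.2, seed=42):
    """Split (path, label) pairs into (train, holdout) lists.

    Each label is split separately so both parts keep roughly the same
    class ratio. The seed makes the split repeatable.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)
    train, holdout = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = round(len(items) * holdout_fraction)
        holdout.extend(items[:cut])
        train.extend(items[cut:])
    return train, holdout


# Made-up data set: 80 non-rDR (label 0) and 20 rDR (label 1) images.
samples = [(f"img_{i}.jpg", 0) for i in range(80)]
samples += [(f"img_{i}.jpg", 1) for i in range(80, 100)]
train, val = stratified_holdout(samples)
print(len(train), len(val))  # 80 20
```

The resulting lists still have to be arranged in whatever directory layout train.py and evaluate.py expect, as described in the README.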