jianlin-cheng / Cryo2Struct

Deep learning tools for converting cryo-EM density maps to protein structures
MIT License

Some EMD IDs in the train split do not exist #2

Open RodenLuo opened 5 months ago

RodenLuo commented 5 months ago

Hi,

I downloaded the split info from here and the full dataset from here.

Some EMD IDs (full list below) appear in the train split but not in the dataset. Did I make a mistake, or were they removed later on?

Thanks

0172
6649
2603
0174
23273
14552
11371
11370
6969
9774
26336
30800
12192
9017
21648
21995
4433
4461
26337
6644
20252
0618
30778
7543
26338
20876
21647
12190
7589
31361
14559
30414
9026
8762
12195
32011
23272
0920
14551
13574
20874
24402
6968
8314
8754
20083
24403
8763
9564
2604
32101
20195
8761
20872
23042
7599
20873
32005
4633
20125
8854
13120
14550
8606
31431
11108
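The check described above (IDs listed in the train split that have no counterpart in the downloaded dataset) amounts to a set difference. A minimal Python sketch, assuming both the split and the dataset folder listing are available as plain lists of ID strings; the function name missing_ids and the sample folder contents are hypothetical:

```python
from typing import Iterable


def missing_ids(split_ids: Iterable[str], dataset_ids: Iterable[str]) -> list[str]:
    """Return split IDs that have no matching entry in the dataset listing.

    IDs are compared as exact strings after trimming whitespace, so
    "903" and "0903" are treated as different IDs.
    """
    available = {i.strip() for i in dataset_ids if i.strip()}
    seen: set[str] = set()
    result = []
    for raw in split_ids:
        emd_id = raw.strip()
        # Keep the split's own order and report each missing ID once.
        if emd_id and emd_id not in available and emd_id not in seen:
            seen.add(emd_id)
            result.append(emd_id)
    return result


if __name__ == "__main__":
    # Hypothetical sample data; in practice dataset_ids could come from
    # os.listdir() on the downloaded EMD folder, split_ids from the split file.
    train_split = ["0172", "6649", "0903", "9999"]
    emd_folder = ["0903", "9999", "1234"]
    print(missing_ids(train_split, emd_folder))  # ['0172', '6649']
```

Because the comparison is exact, an ID that differs only in leading zeros will also be reported as missing.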
RodenLuo commented 5 months ago

Similarly, for the valid split:

8661
14553
12350
22931

Also, I could not find any of the test split IDs in the dataset.

nabingiri commented 5 months ago

Hello RodenLuo, I have updated the 'metadata' file here; could you please use it?

RodenLuo commented 4 months ago

Hi Nabin, sorry for the late reply. I was traveling to several conferences.

The problem still exists on my side. I'm using the previously downloaded EMD folder and the new metadata file. This time I notice two kinds of issues. First, some IDs are missing entirely: e.g., "2278" is the first ID in the TEST tab, but it is not in the EMD folder. Second, some IDs differ only in zero-padding: e.g., "903" is in the VALID tab, but only "0903" is in the EMD folder.

For reference, I have attached the output of ls EMD > EMD_list.txt and the IDs in each split as they appear on my end:

EMD_list.txt split_valid_new.txt split_train_new.txt split_test_new.txt
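The two kinds of issues described in this comment (an ID absent from the folder, and an ID that matches only after zero-padding, like "903" versus "0903") can be told apart mechanically. A minimal Python sketch, assuming the folder follows EMDB's convention of writing IDs with at least four digits and that both the split and the folder listing (for example, the lines of EMD_list.txt) are available as lists of strings; normalize_emd_id and classify are hypothetical helper names, and the folder contents in the example are made up:

```python
from typing import Iterable


def normalize_emd_id(raw: str, width: int = 4) -> str:
    """Trim whitespace, drop an optional 'EMD-' prefix, zero-pad to `width` digits."""
    s = raw.strip().upper()
    if s.startswith("EMD-"):
        s = s[4:]
    if not s.isdigit():
        raise ValueError(f"not a numeric EMD ID: {raw!r}")
    return s.zfill(width)  # "903" -> "0903"; "23273" stays "23273"


def classify(split_ids: Iterable[str], folder_names: Iterable[str]) -> dict[str, list[str]]:
    """Sort split IDs into exact matches, padding-only matches, and absent IDs."""
    names = {n.strip() for n in folder_names if n.strip()}
    normalized = set()
    for name in names:
        try:
            normalized.add(normalize_emd_id(name))
        except ValueError:
            pass  # ignore non-ID entries such as stray files in the folder
    groups: dict[str, list[str]] = {"exact": [], "padding_mismatch": [], "absent": []}
    for raw in split_ids:
        emd_id = raw.strip()
        if not emd_id:
            continue
        if emd_id in names:
            groups["exact"].append(emd_id)
        elif normalize_emd_id(emd_id) in normalized:
            groups["padding_mismatch"].append(emd_id)
        else:
            groups["absent"].append(emd_id)
    return groups


if __name__ == "__main__":
    # "2278" and "903" are the examples from this thread; the folder is hypothetical.
    print(classify(["2278", "903", "1234"], ["0903", "1234", "README.md"]))
    # {'exact': ['1234'], 'padding_mismatch': ['903'], 'absent': ['2278']}
```

Normalizing both sides with the same function, rather than stripping zeros from only one side, keeps five-digit IDs such as 23273 unchanged while still matching "903" to "0903".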

nabingiri commented 4 months ago

Hello @RodenLuo,