kaist-amsg / Composition-Conditioned-Crystal-GAN

Composition-Conditioned Crystal GAN pytorch code

On the problems for the training data. #1

Closed okunoyukihiro2 closed 2 years ago

okunoyukihiro2 commented 4 years ago

Dear Composition-Conditioned Crystal GAN developers,

I'm trying to run the GitHub code for your paper 'Generative Adversarial Networks for Crystal Structure Prediction'.

To run train.py, the training data mgmno_2000.pickle is needed, but I could not find it on GitHub. So I made the training data by running 1) 5.make_comp_dict.py and 2) 6.data_augmentation_mgmno.py from 'unique_sc_mgmno.npy' and 'unique_sc_mgmno_name_list' on GitHub.

However, the train.py code did not work with the generated training data.

Looking at the code in train.py, I found the problems below:

1)
In the code, the training data are assumed to be packed as crystal images and labels, like:

for j, (imgs, label) in enumerate(dataloader):
    batch_size = imgs.shape[0]
    real_imgs = imgs.view(batch_size, 1, 30, 3)

I think it assumes both images (denoted C in your paper: the representation of the crystal structure) and labels (denoted A in your paper: the atomic status) for the training data.

However, in the data-preparation code (6.data_augmentation_mgmno.py), only the image data is dumped; the atom status is not generated by the data-preparation scripts on GitHub.
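For illustration, this is roughly the kind of dataset packing that train.py seems to expect (my own sketch; the pickle contents and key names are assumptions, not taken from your repository):

import pickle
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Load the training pickle; its exact internal structure is an assumption here.
with open('mgmno_2000.pickle', 'rb') as f:
    data = pickle.load(f)

# Pack crystal images C and atom-status labels A so the loader yields (imgs, label) pairs.
images = torch.as_tensor(np.asarray(data['images']), dtype=torch.float32)  # expected shape (N, 30, 3)
labels = torch.as_tensor(np.asarray(data['labels']), dtype=torch.long)     # atom-status labels A

dataloader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

for j, (imgs, label) in enumerate(dataloader):
    batch_size = imgs.shape[0]
    real_imgs = imgs.view(batch_size, 1, 30, 3)  # the shape used in train.py
    break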

2)
In the S.I. of your paper, the loss functions of the classifier are written as

L_class_comp = CE(C_real, \hat{C_real}) + lambda1 * CE(C_gen, \hat{C_gen})
L_class_atom = CE(A_real, \hat{A_real}) + lambda2 * CE(A_gen, \hat{A_gen})
L_class = L_class_atom + lambda_C * L_class_comp

On the other hand, in the code (train.py)

cat_loss_real = 0.3*(cat_loss_mg_real + cat_loss_mn_real + cat_loss_o_real) + cat_loss_mg_real2+cat_loss_mn_real2 + cat_loss_o_real

it seems L_class is given as

L_class = L_class_comp + 0.3 * L_class_atom

Furthermore, the code in train.py does not consider the loss term CE(A_gen, \hat{A_gen}). In train.py,

fake_mg_label, fake_mn_label, fake_o_label, fake_mg_cat, ... = net_Q(fake)

fake_mg_label, fake_mn_label, and fake_o_label are not used afterwards, so the loss term CE(A_gen, \hat{A_gen}) is not implemented.
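For reference, a minimal PyTorch sketch of the SI-form classifier loss, including the CE(A_gen, \hat{A_gen}) term that appears to be missing, could look like this (my own illustration with dummy tensors; the variable names are not the ones in train.py):

import torch
import torch.nn.functional as F

B, n_comp, n_atom = 8, 10, 4                    # dummy batch size and class counts
comp_real = torch.randint(0, n_comp, (B,)); comp_real_hat = torch.randn(B, n_comp)
comp_gen  = torch.randint(0, n_comp, (B,)); comp_gen_hat  = torch.randn(B, n_comp)
atom_real = torch.randint(0, n_atom, (B,)); atom_real_hat = torch.randn(B, n_atom)
atom_gen  = torch.randint(0, n_atom, (B,)); atom_gen_hat  = torch.randn(B, n_atom)

lambda1 = lambda2 = lambda_C = 1.0              # placeholder values; see Table S1 in the SI

# Classification losses following the SI equations (CE = cross-entropy).
L_class_comp = F.cross_entropy(comp_real_hat, comp_real) + lambda1 * F.cross_entropy(comp_gen_hat, comp_gen)
L_class_atom = F.cross_entropy(atom_real_hat, atom_real) + lambda2 * F.cross_entropy(atom_gen_hat, atom_gen)
L_class = L_class_atom + lambda_C * L_class_comp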

I would be very happy to get your reply.

Sincerely,

Yukihiro Okuno.

1098994933 commented 3 years ago

So much hard-coding and the lack of comments make the code difficult to use.

sgbaird commented 2 years ago

@jhwann @syaym any update on this? Could really use some instructions, and on your end please make sure someone can actually run the code per the instructions without error (e.g. test by downloading a fresh copy from GitHub into a fresh conda environment).

syaym commented 2 years ago

@okunoyukihiro2

Regarding the training data, note that we have a separate routine for data pretreatment because the capacity limit of this site prevents uploading the entire dataset. You are likely running into a problem because you have not run the code needed for that pretreatment, so I suggest you run "5.make_comp_dict.py" ~ "7.make_label.py" in the preparing_dataset folder. Alternatively, we have now uploaded the full, already-pretreated trainable data to a different website: https://figshare.com/s/0dce6bb830ae1e392206.
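If you download the pretreated data, a quick sanity check like the following can confirm the file loads before starting training (just a rough sketch; adapt it to the actual contents of the pickle):

import pickle

with open('mgmno_2000.pickle', 'rb') as f:
    data = pickle.load(f)

print(type(data))
try:
    print(len(data))   # number of entries/samples, if the object is sized
except TypeError:
    pass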

For the second question, regarding the loss function of the classifier: the general form of L_class_atom has the lambda2*CE(A_gen, \hat{A_gen}) term, to be symmetric with L_class_comp, but the final value of lambda2 we used in the end is 0 instead of 1, to ensure structural diversity, and this is why the CE(A_gen, \hat{A_gen}) term does not exist in the original code. In short, the value of lambda2 in Table S1 of the SI file was a typo; this correction is in progress with the journal and will be updated shortly.

Many thanks for your comments.

(PS. The above reply is an edited/corrected version of my earlier response.)

sgbaird commented 2 years ago

@syaym thank you for the response. I plan to give the code a try. Perhaps you could upload a copy of mgmno_2000.pickle to figshare, assuming it is less than 20 GB, and then include the link in the README?

Z-Abbas commented 2 years ago

Could anyone please tell me where to get the ".cif" and ".vasp" files? @syaym are you going to upload the mgmno_2000.pickle file?

syaym commented 2 years ago

@sgbaird @Z-Abbas I uploaded the mgmno_2000.pickle file.

sgbaird commented 2 years ago

@syaym Wonderful. Thank you!

I see that you added the link to the README https://figshare.com/s/0dce6bb830ae1e392206

Z-Abbas commented 2 years ago

@syaym Thank you for the file. I am now able to run "train.py" after preparing the dataset by running "5.make_comp_dict.py" ~ "7.make_label.py". After running "train.py", it creates two folders: 1. model_cwgan_mgmno and 2. gen_image_cwgan_mgmno. The second one contains the npy files shown in the attached screenshot. Are these the x, y, and z axes? How can I see the newly generated crystal structures? Would appreciate your earliest response.

sgbaird commented 2 years ago

@Z-Abbas you may consider emailing @syaym

syaym commented 2 years ago

You can convert the npy data to an ASE Atoms object by using view_atoms_mgmno.py.
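Roughly like this (a sketch only; the file name and label values below are placeholders to adapt to your own output):

import numpy as np
from view_atoms_mgmno import view_atoms_classifier

gen = np.load('gen_image_cwgan_mgmno/gen_images_0.npy')   # placeholder file name
image = gen[0] if gen.ndim > 2 else gen                   # take a single image if the file holds a batch
mg_label, mn_label, o_label = 4, 4, 6                     # placeholder atom-count labels
atoms = view_atoms_classifier(image, mg_label, mn_label, o_label, view=True)
print(atoms)                                              # should give an ASE Atoms object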


Z-Abbas commented 2 years ago

@syaym Would you please guide me on how to get the ".vasp" files for the list below?

vasp_list = glob.glob(vasp_path + '/*.vasp')

syaym commented 2 years ago

@Z-Abbas In the "preparing_dataset" folder, please run "5.make_comp_dict.py" ~ "7.make_label.py" to make the data-augmented mgmno dataset. We provided 'unique_sc_mgmno.npy' instead of the vasp files.

Z-Abbas commented 2 years ago

@syaym Thank you for your prompt response. I have already run 5~7. I just want to follow the pipeline from the start, importing the cif and vasp files; that's why I am looking for them. I found the cif files but am unable to find the vasp files.

Z-Abbas commented 2 years ago

@syaym I would appreciate it if you could share the vasp files.

syaym commented 2 years ago

@Z-Abbas https://figshare.com/s/350a8ac4732de2da3a00

Z-Abbas commented 2 years ago

Much appreciated! Thank you!

sgbaird commented 2 years ago

@Z-Abbas would be interested to hear back once you're able to get it running from start to finish using the VASP files

Z-Abbas commented 2 years ago

@sgbaird sure :)

Z-Abbas commented 2 years ago

Hello @syaym and @sgbaird! I have successfully run it from start to end using the VASP files. At the end it generates "gen_images_x.npy" files. Using one of these npy files, I run the function below from view_atoms_mgmno.py:

def view_atoms_classifier(image, mg_label, mn_label, o_label, view=True):

When I print the atoms object, I get:

Atoms(symbols='Mg4Mn4O', pbc=True, cell=[[6.622104644775391, 0.0, 0.0], [0.9188044602466798, 10.297901251162942, 0.0], [4.528352067461127, -7.3943218741460095, 15.134449311591837]])

and when I call "atoms.edit()" it opens the structure in the GUI as attached.

@syaym Now I am wondering how to check the validity of the generated atoms and convert them into structural form.

syaym commented 2 years ago

@Z-Abbas I think the generator is not yet trained enough to generate good structures; you should run more training epochs.

Z-Abbas commented 2 years ago

@syaym Thank you! Yes, I reduced n_epochs from 501 (in the code) to 300, and constraint_epoch to 5000 (10000 in the code). If I use the same numbers of epochs as in your code, will it generate structures similar to those in the paper?

And am I viewing the atoms correctly with the "view_atoms_classifier(image, mg_label, mn_label, o_label, view=True)" function?

syaym commented 2 years ago

@Z-Abbas The structures in the paper were post-processed with DFT optimization. Generated structures that have not undergone this post-processing may look somewhat strange. And yes, it is right to use "view_atoms_classifier".
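For that post-processing, a standard ASE recipe (not specific to this repository) can write a generated Atoms object out as POSCAR or CIF before DFT relaxation; the structure below is only a dummy placeholder:

from ase import Atoms
from ase.io import write

# Replace this dummy structure with the Atoms object returned by view_atoms_classifier.
atoms = Atoms('Mg4Mn4O', positions=[[i * 0.7, i * 0.7, i * 0.7] for i in range(9)],
              cell=[7, 7, 7], pbc=True)

write('POSCAR_gen', atoms, format='vasp')   # VASP input for DFT relaxation
write('gen.cif', atoms)                     # CIF for visualization tools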

Z-Abbas commented 2 years ago

@syaym Thank you so much! What about checking the validity of the generated structures?
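In the meantime, the simple sanity check I am trying looks like this (my own sketch, not from the repository; the distance threshold is arbitrary):

import numpy as np
from ase import Atoms

def looks_reasonable(atoms, min_dist=1.0):
    """Flag structures whose shortest interatomic distance is unphysically small."""
    d = atoms.get_all_distances(mic=True)   # pairwise distances under periodic boundary conditions
    np.fill_diagonal(d, np.inf)             # ignore self-distances
    return float(d.min()) > min_dist

# Dummy structure for demonstration; replace with the generated Atoms object.
atoms = Atoms('Mg4Mn4O', positions=np.random.rand(9, 3) * 6, cell=[7, 7, 7], pbc=True)
print(looks_reasonable(atoms), atoms.get_chemical_formula())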