wangxr0526 / RetroPrime

Code for the single-step retrosynthesis model RetroPrime
MIT License

How to get the data needed for the new_raw_all.csv file? #4

Closed: Arpita33 closed this issue 2 years ago

Arpita33 commented 2 years ago

In the data directory provided, the file retrosim/retrosim/data/get_data.py, which is used to split data_processed.csv into train, validation, and test sets, is incomplete: running it raises an error about a missing function definition. How can I get the train, validation, and test data needed to compile new_raw_all.csv?

Even this file, https://raw.githubusercontent.com/connorcoley/retrosim/0a272f0b5de833c448f41491e81e4dc00b4d85b0/retrosim/data/data_processed.csv, does not follow the format that RetroPrime needs.

wangxr0526 commented 2 years ago

Sorry for the late reply. The current retrosim/retrosim/data/get_data.py no longer runs because of updates to Python and pandas. You can either downgrade Python/pandas or write your own processing script that follows its logic. As for the format difference from RetroPrime that you mentioned, you only need to replace the header using pandas, although the script also needs some text handling to do the conversion. Thank you for your interest in RetroPrime.
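A minimal pandas sketch of the header replacement described above, under stated assumptions: it assumes data_processed.csv has `id`, `class`, and `rxn_smiles` columns, and that RetroPrime's raw files use a GLN-style header `id,class,reactants>reagents>production`. Neither assumption is confirmed in this thread, so compare both against the actual files first. The maintainer does not say what the "text handling" involves; the whitespace stripping and three-field check below are only a hypothetical stand-in.

```python
import pandas as pd

# ASSUMPTION: source and target column names are not confirmed by the authors.
SOURCE_RXN_COL = "rxn_smiles"
TARGET_RXN_COL = "reactants>reagents>production"


def to_retroprime_format(df: pd.DataFrame) -> pd.DataFrame:
    """Rename and select columns to match the assumed raw_*.csv layout."""
    out = df.rename(columns={SOURCE_RXN_COL: TARGET_RXN_COL})
    # Keep only the assumed target columns, in the assumed order.
    out = out[["id", "class", TARGET_RXN_COL]].copy()

    # Hypothetical text handling: strip stray whitespace, then require exactly
    # three '>'-separated fields (reactants, reagents, product).
    # A 'reactants>>product' string already has three fields (empty reagents).
    rxn = out[TARGET_RXN_COL].astype(str).str.strip()
    n_fields = rxn.str.split(">").str.len()
    bad = n_fields != 3
    if bad.any():
        raise ValueError(
            f"{int(bad.sum())} reactions are not in reactants>reagents>product form"
        )
    out[TARGET_RXN_COL] = rxn
    return out.reset_index(drop=True)
```

Usage would be something like `to_retroprime_format(pd.read_csv("data_processed.csv"))`; if the CSV carries an unnamed index column, it is simply dropped by the column selection.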

Arpita33 commented 2 years ago

Thank you for your reply. Could you share the exact train/validation/test splits used for your method?

yuzhou2333 commented 9 months ago


Hello, how do I get single/raw_train.csv (or the corresponding test and valid CSVs)? Can I just split data_processed.csv into three files at an 8:1:1 ratio and run the .sh files from 0 to 6?
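One way to do the 8:1:1 split asked about above is sketched below. It is not the authors' split, which is not given in this thread, so results on a fresh split may not be directly comparable to published numbers. Stratifying by the `class` column is an assumption made here to keep the reaction-class balance similar across the three sets, and the fixed `seed` makes the split reproducible.

```python
import numpy as np
import pandas as pd


def split_by_class(df: pd.DataFrame, fractions=(0.8, 0.1, 0.1), seed=42, class_col="class"):
    """Shuffle each reaction class separately and cut it into train/valid/test."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    df = df.reset_index(drop=True)  # unique integer labels for .loc below
    rng = np.random.default_rng(seed)
    chunks = {"train": [], "valid": [], "test": []}
    for _, group in df.groupby(class_col, sort=True):
        idx = rng.permutation(group.index.to_numpy())
        n = len(idx)
        n_train = int(round(fractions[0] * n))
        n_valid = int(round(fractions[1] * n))
        chunks["train"].append(idx[:n_train])
        chunks["valid"].append(idx[n_train:n_train + n_valid])
        chunks["test"].append(idx[n_train + n_valid:])  # remainder goes to test
    return {
        name: df.loc[np.concatenate(parts)].reset_index(drop=True)
        for name, parts in chunks.items()
    }
```

Each part could then be written with `part.to_csv(f"raw_{name}.csv", index=False)`; those file names and the `single/` directory they would go in are guesses based on the question, not confirmed by the repository.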