sustainable-processes / ORDerly

Chemical reaction data & benchmarks. Extraction and cleaning of data from the Open Reaction Database (ORD)
MIT License

Sweep: Keep track of index of reactions in dataframe #148

Closed: marcosfelt closed this issue 1 year ago

marcosfelt commented 1 year ago

In orderly/clean/cleaner.py, we currently do not store unique identifiers for the reactions from the original dataset. We'd like to add unique identifiers that persist throughout the Cleaner, so that the final dataframe retains them. Ideally this would be a column named "reaction_id" that is added before cleaning starts and is maintained through to the final dataset.
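
One way to meet this requirement is to label the rows once, immediately before the first cleaning step, and then make sure no later step discards or reuses the label. The following is a minimal sketch under stated assumptions, not ORDerly's actual Cleaner code: the helper names (`add_reaction_ids`, `drop_missing_products`, `deduplicate`), the `reactant`/`product` columns, and the `rxn-` ID format are all hypothetical.

```python
import pandas as pd


def add_reaction_ids(df: pd.DataFrame, column: str = "reaction_id") -> pd.DataFrame:
    """Return a copy of df with a unique ID column inserted as the first column."""
    if column in df.columns:
        raise ValueError(f"column {column!r} already exists")
    # Work on a copy with a fresh 0..n-1 index so the caller's frame is untouched.
    out = df.reset_index(drop=True).copy()
    # Hypothetical ID format: "rxn-" plus a zero-padded running counter.
    out.insert(0, column, [f"rxn-{i:08d}" for i in range(len(out))])
    return out


def drop_missing_products(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical cleaning step. A boolean-mask filter keeps every column,
    # so reaction_id stays attached to the rows that survive.
    return df[df["product"].notna()]


def deduplicate(df: pd.DataFrame, column: str = "reaction_id") -> pd.DataFrame:
    # Exclude the ID when comparing rows; otherwise every row is unique
    # and no duplicate reaction would ever be dropped.
    subset = [c for c in df.columns if c != column]
    return df.drop_duplicates(subset=subset, keep="first")


# Toy input with hypothetical column names.
raw = pd.DataFrame(
    {
        "reactant": ["CCO", "CCO", "c1ccccc1"],
        "product": ["CC=O", "CC=O", None],
    }
)
clean = deduplicate(drop_missing_products(add_reaction_ids(raw)))
print(clean)
```

Here the third row is dropped for its missing product and the second as a duplicate of the first, so only `rxn-00000000` survives. Because each ID is fixed before any filtering, every row in the final dataframe can be traced back to the merged input; the step that needs explicit care is de-duplication, where the ID column has to be left out of the comparison.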

sweep-ai[bot] commented 1 year ago

Hey @marcosfelt, I've started working on this issue. The plan is to add a new column "reaction_id" to the DataFrame in the Cleaner class. This column will serve as a unique identifier for each reaction and will be preserved throughout the cleaning process. Give me a minute!
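
Keeping the column through every step is easy to state but easy to break silently, for example when a step rebuilds a dataframe from a single column. A check run after cleaning can catch that. This is a hedged sketch: `check_ids_preserved`, `strip_products` and `strip_products_lossy` are hypothetical helpers, the `product` column is assumed, and whitespace stripping merely stands in for a real cleaning step.

```python
import pandas as pd


def check_ids_preserved(
    before: pd.DataFrame, after: pd.DataFrame, column: str = "reaction_id"
) -> None:
    """Raise AssertionError if cleaning dropped, duplicated, or invented IDs."""
    if column not in after.columns:
        raise AssertionError(f"{column!r} was dropped during cleaning")
    if after[column].duplicated().any():
        raise AssertionError(f"{column!r} has duplicates after cleaning")
    unknown = set(after[column]) - set(before[column])
    if unknown:
        raise AssertionError(f"{len(unknown)} IDs were not in the input")


def strip_products_lossy(df: pd.DataFrame) -> pd.DataFrame:
    # Rebuilding the frame from one column silently discards reaction_id.
    return pd.DataFrame({"product": df["product"].str.strip()})


def strip_products(df: pd.DataFrame) -> pd.DataFrame:
    # assign() returns a new frame with "product" replaced and every other
    # column, including reaction_id, carried over unchanged.
    return df.assign(product=df["product"].str.strip())


before = pd.DataFrame({"reaction_id": ["rxn-0", "rxn-1"], "product": [" CC=O", "CCO "]})

good = strip_products(before)
check_ids_preserved(before, good)  # passes silently

lossy_error = None
try:
    check_ids_preserved(before, strip_products_lossy(before))
except AssertionError as err:
    lossy_error = str(err)
print(lossy_error)  # 'reaction_id' was dropped during cleaning
```

Raising `AssertionError` explicitly, rather than using `assert` statements, keeps the check active even when Python runs with `-O`.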

Some code snippets I looked at (if a file is missing here, you can mention its path in the ticket description):
https://github.com/sustainable-processes/ORDerly/blob/5bca7f134f285838a2879d03822abf8ba63c237a/tests/test_clean.py#L392-L506
https://github.com/sustainable-processes/ORDerly/blob/5bca7f134f285838a2879d03822abf8ba63c237a/tests/test_clean.py#L1202-L1223
https://github.com/sustainable-processes/ORDerly/blob/5bca7f134f285838a2879d03822abf8ba63c237a/orderly/clean/cleaner.py#L1-L1372

I'm a bot that handles simple bugs and feature requests but I might make mistakes. Please be kind!

sweep-ai[bot] commented 1 year ago

I'm sorry, but it looks like an error has occurred. Try removing and re-adding the sweep label. I'll try again in a minute. If this error persists contact team@sweep.dev.

sweep-ai[bot] commented 1 year ago

Hey @marcosfelt, I've started working on this issue. The plan is to add a new column "reaction_id" to the DataFrame in the Cleaner class. This column will serve as a unique identifier for each reaction and will be preserved throughout the cleaning process. Hang tight, I'll have this sorted out in a jiffy!

sweep-ai[bot] commented 1 year ago

I'm sorry, but it looks like an error has occurred. Try removing and re-adding the sweep label. I'll try again in a minute. If this error persists contact team@sweep.dev.

sweep-ai[bot] commented 1 year ago

Hey @marcosfelt, I've started working on this issue. The plan is to add a unique identifier for each reaction in the DataFrame during the cleaning process. This will be done by adding a new column "reaction_id" to the DataFrame right after merging all the data. I'll ensure that this identifier is maintained throughout the cleaning process. Give me a minute!
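
Adding the column after the merge, rather than inside each source file, matters because per-file row indices restart at zero and therefore collide once the files are concatenated. A minimal sketch, assuming the per-file dataframes are combined with `pd.concat` (an assumption, not confirmed by this thread); `merge_then_label` and the `rxn-` format are hypothetical.

```python
from __future__ import annotations

import pandas as pd


def merge_then_label(frames: list[pd.DataFrame], column: str = "reaction_id") -> pd.DataFrame:
    """Concatenate per-file frames, then number the merged rows once."""
    # ignore_index=True discards the per-file 0..k-1 indices, which repeat.
    merged = pd.concat(frames, ignore_index=True)
    # Hypothetical ID format, assigned once over the whole merged dataset.
    merged.insert(0, column, [f"rxn-{i:08d}" for i in range(len(merged))])
    return merged


# Two hypothetical per-file frames, each with its own 0..1 index.
file_a = pd.DataFrame({"product": ["CCO", "CC=O"]})
file_b = pd.DataFrame({"product": ["c1ccccc1", "CCN"]})

naive_index_unique = pd.concat([file_a, file_b]).index.is_unique
merged = merge_then_label([file_a, file_b])

print(naive_index_unique)  # False: the index is 0, 1, 0, 1
print(merged["reaction_id"].tolist())
```

A running counter depends on the order in which files are merged. If the IDs need to be reproducible across runs, sorting the input files before concatenating them, or building each ID from the source file name plus its row position, would keep them stable.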

sweep-ai[bot] commented 1 year ago

I'm sorry, but it looks like an error has occurred. Try removing and re-adding the sweep label. I'll try again in a minute. If this error persists contact team@sweep.dev.

sweep-ai[bot] commented 1 year ago

Hey @marcosfelt, I've started working on this issue. The plan is to add a new column "reaction_id" to the dataframe in the Cleaner class. This column will serve as a unique identifier for each reaction and will be maintained throughout the cleaning process. I'll get this done right away. Give me a minute!

sweep-ai[bot] commented 1 year ago

I'm sorry, but it looks like an error has occurred. Try removing and re-adding the sweep label. I'll try again in a minute. If this error persists contact team@sweep.dev.
