open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

Cleaning formatting issues in the free text of AZdata #128

Closed: bznan closed this pull request 2 years ago

bznan commented 2 years ago

Formatting issues that previously appeared in the free-text portion of the AZ dataset, introduced by the conversion from HTML to plain text, have been removed. The reaction IDs of all 750 data points and the dataset ID remain unchanged.
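The residue visible in the original notes (for example `andnbsp;` and `anbsp;` in reaction #1) is what a `&nbsp;` entity looks like once its leading `&` has been lost during HTML-to-text conversion. The sketch below shows one way such a cleanup could work. It is not the script used in this PR; `clean_note` is a hypothetical name, and it assumes the only residue is `&nbsp;` (with or without its `&`) plus stray whitespace.

```python
import html
import re


def clean_note(text: str) -> str:
    """Hypothetical cleanup of HTML residue in a free-text note (illustration only)."""
    # Decode real HTML entities, e.g. "&nbsp;" -> "\xa0" and "&deg;" -> "°".
    text = html.unescape(text)
    # Assumption: a bare "nbsp;" is an entity whose leading "&" was lost,
    # as in "andnbsp;Tris..." and "anbsp;yellow" in the original notes.
    text = text.replace("nbsp;", " ")
    # Collapse any whitespace run (including the "\xa0" from unescaping) into one space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_note("(1310 mg, 57.5 %) as anbsp;yellow solid."))
    # (1310 mg, 57.5 %) as a yellow solid.
```

Replacing a bare `nbsp;` is a heuristic; it is safe only if that string never occurs legitimately in the notes.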

connorcoley commented 2 years ago

Hi @bznan -- this is passing all our checks (once we move the dataset into the correct subfolder within data/), but I had a question about formatting. It looks like the Notes field has most of the changes, but not all of them make sense to me:

Original:

9,9-Dimethyl-4,5-bis(diphenylphosphino)xanthene (441 mg, 0.76 mmol) andnbsp;Tris(dibenzylideneacetone)dipalladium(0) (279 mg, 0.30 mmol) were added to a round bottom flask which was evacuated and flushed with nitrogen 3 times, anisole (16.600 ml) was added and the mixture evacuated and flushed with nitrogen 3 times and then heated to 50C for 10 minutes. Cesium carbonate (6204 mg, 19.04 mmol), methyl 5-amino-4-fluoro-1-methyl-1H-benzo[d]imidazole-6-carboxylate (1700 mg, 7.62 mmol) and Iodobenzene (1.023 ml, 9.14 mmol) were stirred in anisole (33.2 ml), the flask was evacuated and flushed with nitrogen three times and then heated to 50C. The catalyst mixture was transferred into the flask with the reactants (by syringe) and the mixture heated to 100 C for 18 hours. The reaction mixture was diluted with isohexane (60 ml) and the solid collected by filtration and washed with more isohexane (30 ml). The solid was slurried in DCM MeOH and adsorbed onto silica then purified by flash silica ch eluting with 2% 3.7N NH3 MeOH in DCM to give methyl 4-fluoro-1-methyl-5-(phenylamino)-1H-benzo[d]imidazole-6-carboxylate (1310 mg, 57.5 %) as anbsp;yellow solid.

New:

9,9-Dimethyl-4,5-bis(diphenylphosphino)xanthene (441 mg, 0.76 mmol) andTris(dibenzylideneacetone)dipalladium(0) (279 mg, 0.30 mmol) were added to around bottom flask which was evacuated and flushed with nitrogen 3 times,anisole (16.600 ml) was added and the mixture evacuated and flushed withnitrogen 3 times and then heated to 50ºC for 10 minutes. Cesium carbonate(6204 mg, 19.04 mmol), methyl5-amino-4-fluoro-1-methyl-1H-benzo[d]imidazole-6-carboxylate (1700 mg, 7.62mmol) and Iodobenzene (1.023 ml, 9.14 mmol) were stirred in anisole (33.2 ml),the flask was evacuated and flushed with nitrogen three times and then heatedto 50ºC. The catalyst mixture was transferred into the flask with thereactants (by syringe) and the mixture heated to 100 °C for 18 hours. Thereaction mixture was diluted with isohexane (60 ml) and the solid collected byfiltration and washed with more isohexane (30 ml). The solid was slurried inDCM / MeOH and adsorbed onto silica then purified by flash silicachromatography eluting with 2% 3.7N NH3 / MeOH in DCM to give methyl4-fluoro-1-methyl-5-(phenylamino)-1H-benzo[d]imidazole-6-carboxylate (1310 mg,57.5 %) as a yellow solid.

(this is reaction #1). There are some clear improvements in the new text, but there are also some missing spaces. Do you know why some of these spaces are gone in the new version?
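One way to flag these missing spaces mechanically is to look for tokens in the new text that equal two adjacent tokens of the original glued together (`a` + `round` giving `around`). A minimal sketch, using a hypothetical helper `find_merged_tokens`: it only catches breaks between words that were otherwise unchanged, and it can report a false positive if a genuine word in the new text happens to equal two adjacent original words.

```python
def find_merged_tokens(original: str, cleaned: str) -> list[str]:
    """Hypothetical review aid: return tokens of `cleaned` that equal two
    adjacent whitespace-separated tokens of `original` glued together."""
    orig_tokens = original.split()
    # Every concatenation of a token with its right-hand neighbour.
    glued_pairs = {a + b for a, b in zip(orig_tokens, orig_tokens[1:])}
    return [tok for tok in cleaned.split() if tok in glued_pairs]


if __name__ == "__main__":
    before = "were added to a round bottom flask and flushed with nitrogen 3 times"
    after = "were added to around bottom flask and flushed withnitrogen 3 times"
    print(find_merged_tokens(before, after))  # ['around', 'withnitrogen']
```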

bznan commented 2 years ago

Hi,

Thanks for pointing this out. The spaces are missing wherever a new line started in the text. I have fixed that issue and opened a new pull request.

Sincerely, Bozhao
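The cause described here is a common line-joining bug: if wrapped lines are concatenated with no separator, the space at each break disappears. A minimal sketch of the bug and one possible fix follows; the function names are hypothetical and not taken from the PR, and the fix assumes lines were only ever wrapped at whitespace, never inside a word (which appears to hold for reaction #1).

```python
import re


def join_lines_buggy(text: str) -> str:
    # Concatenating wrapped lines with no separator drops the space at each break.
    return "".join(text.splitlines())


def join_lines(text: str) -> str:
    # Replace each line break, plus any whitespace around it, with one space.
    # Assumes lines were wrapped only at whitespace, never inside a word.
    return re.sub(r"\s*\n\s*", " ", text).strip()


if __name__ == "__main__":
    wrapped = "were added to a\nround bottom flask"
    print(join_lines_buggy(wrapped))  # were added to around bottom flask
    print(join_lines(wrapped))        # were added to a round bottom flask
```

If some breaks did fall inside hyphenated chemical names, joining with a space would insert a spurious gap there, so the assumption is worth checking on the full dataset.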

(This reply was sent by email in response to Connor Coley's comment of Wed, Mar 2, 2022, 5:29 PM.)