mcs07 / ChemDataExtractor

Automatically extract chemical information from scientific documents
http://chemdataextractor.org
MIT License
287 stars, 112 forks

Fix tokenizer problem with commas inside NMR peaks #18

Open JeffersonH44 opened 6 years ago

JeffersonH44 commented 6 years ago

Hello,

I found a problem when reading NMR peaks that have no whitespace between two elements, like this:

9.73 (s,1H)

This is parsed as a peak with an assignment instead of a peak with a multiplicity and a number. I have also added some cases for other contexts that involve coupling units.
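To illustrate the kind of failure described above, here is a minimal sketch in plain Python. It does not use ChemDataExtractor's actual tokenizer or parser; the functions `naive_tokenize`, `comma_aware_tokenize`, and `describe`, and the two regular expressions, are hypothetical stand-ins that only mimic the behaviour reported in this pull request.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# these are NOT ChemDataExtractor's real grammar rules.
MULTIPLICITY = re.compile(r"^(?:s|d|t|q|m|dd|dt|td|br)$")
NUMBER = re.compile(r"^\d+H$")


def naive_tokenize(text):
    """Split on whitespace and parentheses only, so 's,1H' stays one token."""
    return re.findall(r"[()]|[^\s()]+", text)


def comma_aware_tokenize(text):
    """Also emit commas as separate tokens, whether or not a space follows."""
    return re.findall(r"[(),]|[^\s(),]+", text)


def describe(tokens):
    """Label each token inside the parentheses of a peak."""
    inner = tokens[tokens.index("(") + 1 : tokens.index(")")]
    labels = []
    for tok in inner:
        if tok == ",":
            continue  # separators carry no information
        if MULTIPLICITY.match(tok):
            labels.append(("multiplicity", tok))
        elif NUMBER.match(tok):
            labels.append(("number", tok))
        else:
            # Anything unrecognised falls through to "assignment"
            labels.append(("assignment", tok))
    return labels


peak = "9.73 (s,1H)"
print(describe(naive_tokenize(peak)))        # [('assignment', 's,1H')]
print(describe(comma_aware_tokenize(peak)))  # [('multiplicity', 's'), ('number', '1H')]
```

Splitting on every comma is a simplification for this sketch: in real text it would also break chemical names such as 1,2-dichloroethane, so an actual fix would presumably need to be limited to the NMR peak context.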

mcs07 commented 6 years ago

Sorry I've taken so long to review this - there's some great stuff here. But could you create separate branches for each individual change so they can have separate, more focused Pull Requests?

That way you can also continue developing on your master branch and it won't appear in the pull request. I think you'll need to create a new branch from the last commit before your changes, then you can cherry-pick some of your own commits into that branch.

Some help documentation here: https://help.github.com/categories/collaborating-with-issues-and-pull-requests/ https://git-scm.com/docs/git-cherry-pick
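The branch-per-change workflow described above could be sketched as follows. This small Python helper only builds the git command lines for one focused branch and does not run anything. The function name `focused_branch_commands`, the branch name, the commit hashes, and the remote name `origin` are all hypothetical placeholders, not values from this repository.

```python
def focused_branch_commands(base_commit, branch, commits, remote="origin"):
    """Return the git commands for one focused pull-request branch.

    base_commit: last commit before your own changes (placeholder value)
    branch:      name of the new branch (placeholder value)
    commits:     your own commits to copy onto it, oldest first
    remote:      assumed to be 'origin'; adjust to your fork's remote
    """
    return [
        # Start a new branch from the commit before your changes
        f"git checkout -b {branch} {base_commit}",
        # Copy only the commits that belong to this one change
        f"git cherry-pick {' '.join(commits)}",
        # Publish the branch so a pull request can be opened from it
        f"git push {remote} {branch}",
    ]


# Hypothetical example values
for cmd in focused_branch_commands("abc1234", "fix-nmr-comma", ["1111111", "2222222"]):
    print(cmd)
```

Repeating this once per individual change gives each pull request its own branch, while the master branch stays free for further development.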

JeffersonH44 commented 6 years ago

Ok, I'll do that