petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

NLP extraction of terms #33

Open petermr opened 4 years ago

petermr commented 4 years ago

The language in these papers is very formulaic. An extracted result such as

```
<result pre="chromatography-mass spectrometry. Samples were analyzed by gas chromatography using a" exact="HP6890" post="instrument coupled with a HP 5973 mass spectrometer.
```

is effectively a Hearst pattern:

```
using a" exact="Foo" post="instrument ...
```
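A minimal sketch of what such a pattern looks like in code, assuming plain-text sentences rather than the XML result format: one hand-written regular expression captures whatever sits between "using a(n)" and "instrument". The pattern and the `find_instruments` helper are hypothetical illustrations, not part of any existing tool.

```python
import re

# Hypothetical surface pattern in the spirit of a Hearst pattern:
# "using a(n) <Capitalised name, possibly several tokens> instrument".
INSTRUMENT_PATTERN = re.compile(
    r"\busing an?\s+([A-Z][\w-]*(?:\s+[A-Z0-9][\w-]*)*)\s+instrument\b"
)

def find_instruments(text: str) -> list[str]:
    """Return the candidate instrument names matched by the pattern."""
    return INSTRUMENT_PATTERN.findall(text)

sentence = ("Samples were analyzed by gas chromatography using a HP6890 "
            "instrument coupled with a HP 5973 mass spectrometer.")
print(find_instruments(sentence))  # ['HP6890']
```

A fixed pattern like this only finds instruments described in exactly this wording, which is the limitation that learning patterns from seeds tries to address.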

The Stanford NLP group has a tool, SPIED, that starts from seed terms (e.g. known instrument names), learns the context language around them, and uses those patterns to identify new instruments.
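The seed-and-context idea can be illustrated with a toy bootstrapping loop. This is not SPIED's algorithm or API: it is a hypothetical sketch that treats the single word before and after a known term as its "pattern", accepts every candidate without scoring, and runs on a small invented corpus.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Keep hyphenated names such as "GC-2010" as one token; split off punctuation.
    return re.findall(r"[\w-]+|[^\w\s]", text)

def learn_patterns(sentences, known, min_count=1):
    """Collect (left word, right word) contexts that surround known terms."""
    counts = Counter()
    for sent in sentences:
        toks = tokenize(sent)
        for i in range(1, len(toks) - 1):
            if toks[i] in known:
                counts[(toks[i - 1], toks[i + 1])] += 1
    return {pat for pat, n in counts.items() if n >= min_count}

def apply_patterns(sentences, patterns, known):
    """Return unseen single tokens that occur inside a learned context."""
    found = set()
    for sent in sentences:
        toks = tokenize(sent)
        for i in range(1, len(toks) - 1):
            if (toks[i - 1], toks[i + 1]) in patterns and toks[i] not in known:
                found.add(toks[i])
    return found

def bootstrap(sentences, seeds, iterations=2):
    """Alternate pattern learning and term extraction until nothing new is found."""
    known = set(seeds)
    for _ in range(iterations):
        patterns = learn_patterns(sentences, known)
        new_terms = apply_patterns(sentences, patterns, known)
        if not new_terms:
            break
        known |= new_terms
    return known

# Invented example sentences in the style of a methods section.
corpus = [
    "Samples were analyzed using a HP6890 instrument coupled with a mass spectrometer.",
    "Oils were analyzed using a TRACE-GC instrument with a DB-5 column.",
    "Extracts were injected into a HP6890 gas chromatograph.",
    "Extracts were injected into a GC-2010 gas chromatograph.",
]
print(sorted(bootstrap(corpus, {"HP6890"})))  # ['GC-2010', 'HP6890', 'TRACE-GC']
```

Starting from the single seed `HP6890`, the contexts "a _ instrument" and "a _ gas" are learned and pick up `TRACE-GC` and `GC-2010`. A real system would rank patterns and candidates before accepting them, to limit drift toward unrelated terms; this sketch skips that step.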

@Tiago this looks like a good thing to try.

lubianat commented 4 years ago

It indeed looks like an interesting approach.

I will take a look at the SPIED tool and learn how to use it. It seems a good continuation of where things stopped at the sprint.

petermr commented 4 years ago

Good. I ran their demo example, so it should be easy to use our own material.

(Sent by email in reply to Tiago Lubiana's comment of Mon, 7 Oct 2019, 22:14.)