mcs07 / ChemDataExtractor

Automatically extract chemical information from scientific documents
http://chemdataextractor.org
MIT License

Issue specifying a Reader #11

Closed chemlynx closed 7 years ago

chemlynx commented 7 years ago

I'm following the examples of reading a document into the tool shown here: http://chemdataextractor.org/docs/reading

I am using version 1.2.2, installed via conda on Linux, with Python 3.5, and I'm testing from a Jupyter Notebook.

If I use the first example on that page to read a locally stored html file, then I successfully read in the HTML and can query the data stored in the doc variable.

If I then try to explicitly specify the correct reader, RscHtmlReader(), I get an error stating that the name is not defined, and the same happens if I specify AcsHtmlReader() instead.

`NameError Traceback (most recent call last)

in ()
1 f = open(//Downloads//chemdataextractor-journal-articles-source//evaluation journal articles//articles//rsc.nj.c5nj01594d.html', 'rb')
----> 2 doc = Document.from_file(f, readers=[AcsHtmlReader()])

NameError: name 'AcsHtmlReader' is not defined`

I was also able to use the command line interface to generate an output JSON file. I'm not sure whether it was generated by successfully calling the correct reader or whether the application just fell back to the generic HTML reader. Any ideas?

mcs07 commented 7 years ago

Sorry, I left out the necessary imports from the documentation. Before you use the readers, you will need:

from chemdataextractor.reader import AcsHtmlReader, RscHtmlReader
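
For context, here is a minimal sketch of how that import fits together with the `Document.from_file(f, readers=[...])` call from the report. The helper name `read_with_reader` and the placeholder path in the usage comment are made up for illustration; the `try`/`except` is only there so the snippet still loads on a machine without ChemDataExtractor installed.

```python
# Sketch: read a local HTML article with an explicitly chosen reader.
try:
    from chemdataextractor import Document
    from chemdataextractor.reader import AcsHtmlReader, RscHtmlReader
except ImportError:  # ChemDataExtractor is not installed in this environment
    Document = AcsHtmlReader = RscHtmlReader = None


def read_with_reader(path, reader_cls):
    """Parse the file at `path` using only the given reader class.

    `read_with_reader` is a hypothetical helper, not part of the library.
    """
    if Document is None:
        raise RuntimeError("chemdataextractor is not installed")
    with open(path, "rb") as f:
        # Same call as in the report, with the reader name now imported.
        return Document.from_file(f, readers=[reader_cls()])


# Usage (placeholder path, not the one from the report):
# doc = read_with_reader("article.html", RscHtmlReader)
```

Judging by its filename, the file in the report looks like an RSC article, so `RscHtmlReader` would presumably be the matching choice there rather than `AcsHtmlReader`.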
chemlynx commented 7 years ago

OK, I tried that and it worked. Thanks. I had tried using `from chemdataextractor import *` but hadn't thought to go for a more specific import.
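
A note on why the star import did not help: in general, `from package import *` only binds names that already exist in the package's own namespace (or are listed in its `__all__`), so a class defined in a submodule that the package's `__init__.py` does not import is not picked up. The sketch below shows this with a throwaway package. `demo_pkg` and its contents are hypothetical stand-ins built just for the demonstration; that ChemDataExtractor's `__init__.py` behaves the same way is an assumption, though it is consistent with the `NameError` above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical stand-in):
#   demo_pkg/__init__.py  defines Document
#   demo_pkg/reader.py    defines AcsHtmlReader
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("class Document:\n    pass\n")
(pkg / "reader.py").write_text("class AcsHtmlReader:\n    pass\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# A star import only sees what the package namespace already contains.
namespace = {}
exec("from demo_pkg import *", namespace)
star_has_document = "Document" in namespace
star_has_reader = "AcsHtmlReader" in namespace

# Importing from the submodule explicitly makes the name available.
exec("from demo_pkg.reader import AcsHtmlReader", namespace)
explicit_has_reader = "AcsHtmlReader" in namespace

print(star_has_document, star_has_reader, explicit_has_reader)
```

Here the star import finds `Document` but not `AcsHtmlReader`; only the explicit submodule import binds the reader name.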