scholarslab / new-salem-pelican

New Salem running on Pelican

Update search corpus #63

Closed by jeremyboggs 5 years ago

jeremyboggs commented 5 years ago

The search corpus seems incomplete; I can only successfully get records for the years 1692 and 1693.

Relatedly, if folks are going to update content on the site, we'll need some way to help them rebuild the search corpus. It looks like lunrcorpus.py is meant for this, but when I run it, I get the following:

  File "lunrcorpus.py", line 49, in <module>
    doc_text = ''.join(BeautifulSoup(doc_html, "lxml").findAll(
  File "/Users/jeremy/.local/share/virtualenvs/new-salem-pelican-X3l9dCmC/lib/python3.6/site-packages/bs4/__init__.py", line 198, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

I've run pipenv update, which should install the project's dependencies, so maybe the dependency list is missing a parser library such as lxml?
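
For anyone hitting the same error, here is a minimal sketch of a workaround, assuming the script only needs some HTML parser and not lxml specifically. The helper name pick_parser is hypothetical and is not part of lunrcorpus.py; it checks whether the lxml module is importable and otherwise falls back to "html.parser", the parser name Beautiful Soup accepts for Python's built-in parser.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if a module of that name is importable, else `fallback`.

    Hypothetical helper for illustration; not part of lunrcorpus.py.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


parser = pick_parser()
print(f"Using parser: {parser}")

# Assumed usage inside the script (requires beautifulsoup4):
#   BeautifulSoup(doc_html, parser)
```

The two parsers can treat malformed HTML differently, so the extracted text may not be identical. Adding lxml to the Pipfile (for example with pipenv install lxml) is the more direct fix if the corpus should be built the same way everywhere.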

jeremyboggs commented 5 years ago

Related to #61.

shane-et-al commented 5 years ago

LXML dependency fixed by ed6797867fed10a309255224e6fefcb7084848b2