scrapinghub / mdr

A Python library to detect and extract listing data from HTML pages.

ValueError: The number of observations cannot be determined on an empty distance matrix. #7

Open dportabella opened 7 years ago

dportabella commented 7 years ago

This simple example, test.py:

text = """
<html>
<body>
    <table>
        <tr><td>p1</td><th>v1</th></tr>
        <tr><td>p2</td><td>v2</td></tr>
        <tr><td>p3</td><td>v3</td></tr>
    </table>
</body>
</html>
"""

from mdr import MDR
mdr = MDR()
candidates, doc = mdr.list_candidates(text)
print([doc.getpath(c) for c in candidates])
print(mdr.extract(candidates[0]))

results in this exception:

$ python test.py
['/html/body/table/tr[1]/th', '/html/body/table']
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print(mdr.extract(candidates[0]))
  File "build/bdist.macosx-10.12-x86_64/egg/mdr/mdr.py", line 134, in extract
  File "build/bdist.macosx-10.12-x86_64/egg/mdr/mdr.py", line 167, in hcluster
  File "/Users/david/.virtualenvs/py2-data/lib/python2.7/site-packages/scipy/cluster/hierarchy.py", line 660, in linkage
    n = int(distance.num_obs_y(y))
  File "/Users/david/.virtualenvs/py2-data/lib/python2.7/site-packages/scipy/spatial/distance.py", line 1718, in num_obs_y
    raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.

Any idea?
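One possible reading of the traceback, offered as a sketch and not as a confirmed diagnosis: the first candidate printed above is the `th` cell, which has no child elements. If `hcluster` builds a condensed pairwise distance matrix over a candidate's children (an assumption, since the mdr source is not shown here), that matrix has n * (n - 1) / 2 entries and is empty for fewer than two children, which is the case SciPy's `num_obs_y` rejects. The helpers `condensed_length` and `clusterable` below are hypothetical names, not part of mdr, and the standard library's `xml.etree.ElementTree` stands in for the lxml tree that `list_candidates` returns.

```python
import xml.etree.ElementTree as ET


def condensed_length(n):
    """Number of pairwise distances among n observations: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def clusterable(candidates, min_children=2):
    """Hypothetical guard: keep elements with enough children to form a pair."""
    return [c for c in candidates if len(c) >= min_children]


# Same table as in test.py, parsed with the standard library for illustration.
root = ET.fromstring(
    "<table>"
    "<tr><td>p1</td><th>v1</th></tr>"
    "<tr><td>p2</td><td>v2</td></tr>"
    "<tr><td>p3</td><td>v3</td></tr>"
    "</table>"
)
th = root.find("./tr/th")

print(condensed_length(len(th)))    # th has no children, so no pairs: 0
print(condensed_length(len(root)))  # table has three rows, so three pairs: 3
print([e.tag for e in clusterable([th, root])])  # ['table']
```

Under the same assumption, filtering `candidates` this way before calling `mdr.extract` (or passing `candidates[1]`, the table, instead of `candidates[0]`) might avoid the exception. Whether mdr should skip such candidates itself is a question for the maintainers.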