text = """
<html>
<body>
<table>
<tr><td>p1</td><th>v1</th></tr>
<tr><td>p2</td><td>v2</td></tr>
<tr><td>p3</td><td>v3</td></tr>
</table>
</body>
</html>
"""
from mdr import MDR
mdr = MDR()
candidates, doc = mdr.list_candidates(text)
print([doc.getpath(c) for c in candidates])
print(mdr.extract(candidates[0]))
This results in the following exception:
$ python test.py
['/html/body/table/tr[1]/th', '/html/body/table']
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print(mdr.extract(candidates[0]))
  File "build/bdist.macosx-10.12-x86_64/egg/mdr/mdr.py", line 134, in extract
  File "build/bdist.macosx-10.12-x86_64/egg/mdr/mdr.py", line 167, in hcluster
  File "/Users/david/.virtualenvs/py2-data/lib/python2.7/site-packages/scipy/cluster/hierarchy.py", line 660, in linkage
    n = int(distance.num_obs_y(y))
  File "/Users/david/.virtualenvs/py2-data/lib/python2.7/site-packages/scipy/spatial/distance.py", line 1718, in num_obs_y
    raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.
Running this simple example (test.py) raises the exception above.
Any idea what is causing this?
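For what it's worth, the same ValueError can be reproduced with scipy alone. My assumption (not checked against mdr's source) is that hcluster passes a condensed distance matrix to scipy.cluster.hierarchy.linkage, and that this matrix ends up empty when there is only one item to cluster, presumably the case for the first candidate, the lone th element. A condensed matrix for n observations has n*(n-1)/2 entries, so a single observation gives an empty one. The sketch below shows this and a guard; try_linkage is a hypothetical helper, not part of mdr.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def try_linkage(observations):
    """Hypothetical guard (not part of mdr): skip clustering when there
    is nothing to compare instead of letting linkage raise."""
    # pdist returns a condensed matrix with n*(n-1)/2 entries,
    # so a single observation yields an empty array.
    condensed = pdist(np.asarray(observations, dtype=float))
    if condensed.size == 0:
        return None
    return linkage(condensed, method="average")


# One observation: the condensed matrix is empty, so clustering is skipped.
print(try_linkage([[1.0, 2.0]]))  # None
# Three observations: linkage returns an (n - 1) x 4 matrix.
print(try_linkage([[0.0], [1.0], [5.0]]).shape)  # (2, 4)

# Calling linkage directly on the empty matrix raises the ValueError
# from the traceback above.
try:
    linkage(pdist(np.array([[1.0, 2.0]])))
except ValueError as exc:
    print("ValueError:", exc)
```

If that assumption holds, extracting from a candidate with several repeated children (here presumably candidates[1], the table) should avoid the empty matrix, but I have not verified this against mdr.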