I ran into a minor issue with the way you provide data. The documentation does not say you can't provide integer values, so I ended up providing this data:
In [1]: from scrapely import Scraper
In [2]: s = Scraper()
In [3]: data = {'name': 'scrapy/scrapely', 'url': 'https://github.com/scrapy/scrapely', 'description': 'A pure-python HTML screen-scraping library', 'watchers': 42, 'forks': 9}
In [4]: url = "https://github.com/scrapy/scrapely"
and ran into this exception:
In [5]: s.train(url, data)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
...
/home/ubuntu/scrapely/scrapely/template.py in func(fragment, page)
93 def func(fragment, page):
94 fdata = page.fragment_data(fragment).strip()
---> 95 if text in fdata:
96 return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
97 else:
TypeError: 'in <string>' requires string as left operand
It took me a while to realize that the problem was the integer values in the data dictionary: `text` is an int there, and `in` against a string requires a string on the left.
So you could either coerce each value to a unicode string before comparing:
    if unicode(text) in fdata:
        return float(len(unicode(text))) / len(fdata) - (1e-6 * fragment.start)
or specify in the documentation that values should all be strings.
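In the meantime, a caller-side workaround is to convert the values before training. This is only a sketch: `stringify_values` is a hypothetical helper of mine, not part of scrapely, and it is written for Python 3 (on Python 2, `unicode` would take the place of `str`).

```python
def stringify_values(data):
    """Return a copy of `data` with every non-string value converted to text.

    Hypothetical helper, not part of scrapely.
    """
    return {
        key: value if isinstance(value, str) else str(value)
        for key, value in data.items()
    }


data = {
    'name': 'scrapy/scrapely',
    'url': 'https://github.com/scrapy/scrapely',
    'description': 'A pure-python HTML screen-scraping library',
    'watchers': 42,
    'forks': 9,
}

# Integers become strings; existing strings are left untouched.
clean = stringify_values(data)
```

Passing `clean` instead of `data` to `s.train(url, clean)` should avoid the TypeError, assuming the page text contains those numbers as written.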
Amazing work! This is really useful.