scrapy / scrapely

A pure-python HTML screen-scraping library

ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long' on Windows 10 #118

Open juhacz opened 5 years ago

juhacz commented 5 years ago

I am trying to use scrapely on a Windows 10 computer. I tested it on both x32 and x64 Python versions (3.7.4). When I call scrape(), I get this error:

Traceback (most recent call last):
  File "D:/DEV/peojects_Python/test/test.py", line 28, in <module>
    print(s.scrape("https://xxxxxx"))
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\__init__.py", line 53, in scrape
    return self.scrape_page(page)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\__init__.py", line 59, in scrape_page
    return self._ex.extract(page)[0]
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\__init__.py", line 119, in extract
    extracted = extraction_tree.extract(extraction_page)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\regionextract.py", line 575, in extract
    items.extend(extractor.extract(page, start_index, end_index, self.template.ignored_regions))
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\regionextract.py", line 351, in extract
    _, _, attributes = self._doextract(page, extractors, start_index, end_index, **kwargs)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\regionextract.py", line 396, in _doextract
    labelled, start_index, end_index_exclusive, self.best_match, **kwargs)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\similarity.py", line 148, in similar_region
    data_length - range_end, data_length - range_start)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\similarity.py", line 85, in longest_unique_subsequence
    matches = naive_match_length(to_search, subsequence, range_start, range_end)
  File "scrapely/extraction/_similarity.pyx", line 155, in scrapely.extraction._similarity.naive_match_length
    cpdef naive_match_length(sequence, pattern, int start=0, int end=-1):
  File "scrapely/extraction/_similarity.pyx", line 158, in scrapely.extraction._similarity.naive_match_length
    return np_naive_match_length(sequence, pattern, start, end)
  File "scrapely/extraction/_similarity.pyx", line 87, in scrapely.extraction._similarity.np_naive_match_length
    cdef np_naive_match_length(np.ndarray[np.int64_t, ndim=1] sequence,
ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long'

I also ran this on a CentOS 7 VPS with Python 3.6, and everything works fine. The problem occurs only on Windows.

Rockyzsu commented 4 years ago

Got the same issue on the latest version on Win10:

In [7]: scrapely.__version__
Out[7]: '0.14.0'
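
A likely explanation, offered as an assumption and not confirmed anywhere in this thread: on NumPy versions before 2.0, the default integer dtype follows the C long type, which is 32 bits on Windows (even with 64-bit Python) and 64 bits on 64-bit Linux. The Cython function in the traceback declares its buffer as np.int64_t, so an array created with the default integer dtype would pass on CentOS and be rejected on Windows. The sketch below only illustrates the mismatch and one possible workaround, an explicit cast to int64 before the array reaches the compiled code. The helper name as_int64 is hypothetical and is not part of scrapely.

```python
import numpy as np


def as_int64(values):
    """Hypothetical helper (not part of scrapely): return a 1-D, C-contiguous int64 array."""
    return np.ascontiguousarray(values, dtype=np.int64).ravel()


# Size of C long on this platform: 4 bytes on Windows, 8 on 64-bit Linux.
print("C long item size:", np.dtype("l").itemsize)

# Simulate what a 32-bit default integer dtype would produce on Windows.
tokens = np.array([3, 1, 4, 1, 5], dtype=np.int32)

# Cast explicitly so a buffer typed as np.int64_t would accept the array.
fixed = as_int64(tokens)
print(fixed.dtype)  # int64
```

If this is indeed the cause, the cast would have to happen wherever the arrays passed to naive_match_length are built, which means patching the installed package or fixing it upstream.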