Open aceri opened 7 years ago
I got the same error running the example code:
from scrapely import Scraper
s = Scraper()
url1 = 'http://pypi.python.org/pypi/w3lib/1.1'
data = {'name': 'w3lib 1.1', 'author': 'Scrapy project', 'description': 'Library of web-related functions'}
s.train(url1, data)
url2 = 'http://pypi.python.org/pypi/Django/1.3'
s.scrape(url2)
Gives me the same error.
@aceri, @aschi2 I'm unable to replicate the issue. My guess is that both of you are using 32-bit systems and that this is causing the problem. If you can confirm you are on 32-bit systems, I can add a fallback that just uses the pure-Python implementation there.
I am using a 64-bit system and 64-bit Python 2.7.
I get the exact same error on a 64-bit system.
I can't replicate the issue either. @ruairif, I have some doubts about the six library.
This is its code for finding MAXSIZE:
class X(object):
    def __len__(self):
        return 1 << 31
try:
    len(X())
except OverflowError:
    # 32-bit
    MAXSIZE = int((1 << 31) - 1)
else:
    # 64-bit
    MAXSIZE = int((1 << 63) - 1)
del X
In my opinion, the return value in def __len__(self) should be 1 << 63.
If that is right, could this be a source of the problem?
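For anyone who wants to test that idea locally, here is a minimal sketch using only the standard library. The class names `Probe31` and `Probe63` and the helper `len_overflows` are made up for this example and are not part of six. On CPython, `len()` must fit in a `Py_ssize_t`, so a length of `1 << 31` overflows only on a 32-bit build, while `1 << 63` overflows on 64-bit builds as well and could not tell the two apart.

```python
import sys


class Probe31(object):
    """Hypothetical probe mirroring the six snippet: length of 2**31."""

    def __len__(self):
        return 1 << 31


class Probe63(object):
    """Hypothetical probe using the proposed value: length of 2**63."""

    def __len__(self):
        return 1 << 63


def len_overflows(obj):
    """Return True if len(obj) raises OverflowError on this interpreter."""
    try:
        len(obj)
    except OverflowError:
        return True
    return False


# sys.maxsize is the largest value a Py_ssize_t can hold on this build.
is_32bit_build = sys.maxsize == (1 << 31) - 1

print("32-bit build:", is_32bit_build)
print("1 << 31 overflows:", len_overflows(Probe31()))
print("1 << 63 overflows:", len_overflows(Probe63()))
```

If the first and second printed values agree, the `1 << 31` probe is detecting the build width as intended, which would suggest the six check is not the culprit.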
I am also facing the same problem with 64-bit Python 3.5 on Windows!
I have the same issue on Python 2.7.11 MSC v.1500 64 bit (AMD64) on win32 in a virtual environment. No answers yet?
I have the same problem with 32-bit Python 3.6.3 on Windows 10 Enterprise x64.
I got the same problem on 64-bit Python 2.7.13, both system-wide and in a virtual environment, on Windows 10 Home.
The same (similar?) bug here. Python 2.7.14 in a venv, macOS High Sierra.
ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'double'
@ruairif It may be hard to reproduce because the bug is pretty rare. It occurred in 5 of 203 trials, about 2.5% of my tests.
I am getting this error consistently, regardless of input data. Even the small example on the front page of scrapely's GitHub repository, which illustrates how to scrape PyPI, fails with this error.
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] Windows 7, 64-bit.
numpy (1.14.0) pip (9.0.1) scrapely (0.13.4) setuptools (28.8.0) six (1.11.0) w3lib (1.18.0)
Hi @bitblomster, me too, but only on Windows. I have no issue with scrapely on Ubuntu.
Something interesting happened, though. I copied the scrapely folder from my Ubuntu Python environment (in site-packages) to Windows, into the same folder as my project that uses scrapely. All the issues are gone and scrapely works properly after this. @ruairif, could something be missing from scrapely on Windows?
I keep getting the same error in Windows whenever I try to scrape a website (using the API as well as using the command line):
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)] on win32
[...]
File "scrapely/extraction/_similarity.pyx", line 155, in scrapely.extraction._similarity.naive_match_length (scrapely/extraction/_similarity.c:3845)
cpdef naive_match_length(sequence, pattern, int start=0, int end=-1):
File "scrapely/extraction/_similarity.pyx", line 158, in scrapely.extraction._similarity.naive_match_length (scrapely/extraction/_similarity.c:3648)
return np_naive_match_length(sequence, pattern, start, end)
File "scrapely/extraction/_similarity.pyx", line 87, in scrapely.extraction._similarity.np_naive_match_length (scrapely/extraction/_similarity.c:2802)
cdef np_naive_match_length(np.ndarray[np.int64_t, ndim=1] sequence,
ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long'
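The last frame shows the function declared with a typed buffer, `np.ndarray[np.int64_t, ndim=1]`, which accepts only arrays whose dtype is exactly int64. The sketch below illustrates that kind of mismatch in plain NumPy, assuming NumPy is installed. The helper `as_int64` and the sample `tokens` list are hypothetical and are not part of scrapely; this is not a patch for the library, only a way to see which dtype a platform produces by default.

```python
import numpy as np


def as_int64(values):
    """Hypothetical helper: return an int64 array on every platform."""
    return np.asarray(values, dtype=np.int64)


tokens = [3, 1, 4, 1, 5]  # made-up sample data

# Without an explicit dtype, NumPy picks its platform default integer.
default_arr = np.array(tokens)

# With an explicit dtype, the result no longer depends on the platform.
fixed_arr = as_int64(tokens)

print("default dtype:", default_arr.dtype)
print("explicit dtype:", fixed_arr.dtype)
```

If the first line prints `int32`, arrays built without an explicit dtype on that machine would be rejected by an `int64_t` buffer.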
I've managed to try it on Ubuntu with another computer: it works, no issue found when scraping. I tried to copy the Ubuntu scrapely folder to Windows, as @hiadore suggested, but I'm still finding the same exact error. I have no clue!
I also have exactly the same problem on Windows 10. Any workarounds?
@ramedey same issue here, but I'm having initial success running scrapely under WSL (https://docs.microsoft.com/en-us/windows/wsl/about). The example from the readme works :)
I have the same issue. The problem lies with numpy (a scrapely dependency) and how it treats int differently on 32-bit and 64-bit Windows systems.
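Some background that seems consistent with this: on 64-bit Windows the C `long` type is 32 bits wide, whereas on 64-bit Linux and macOS it is 64 bits, and NumPy releases before 2.0 use C `long` as their default integer type. A minimal standard-library sketch to check the widths on a given machine (the variable names are just for illustration):

```python
import ctypes
import struct

# Width of C long on this platform (32 on Windows, 64 on most 64-bit Unix).
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Width of a pointer, i.e. whether this Python build is 32-bit or 64-bit.
pointer_bits = struct.calcsize("P") * 8

# Width of the fixed-size type named in the error message.
int64_bits = ctypes.sizeof(ctypes.c_int64) * 8

long_matches_int64 = c_long_bits == int64_bits

print("C long:", c_long_bits, "bits")
print("pointer:", pointer_bits, "bits")
print("long matches int64_t:", long_matches_int64)
```

When the last line prints `False` on a 64-bit Python, a buffer of C `long` values cannot satisfy an `int64_t` declaration, which matches the "expected 'int64_t' but got 'long'" message above.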
Any workarounds on this issue?
Hi, I am having the following problem. Not sure if I am following the right steps. This is the repro. Regards,