Open Plusarc opened 2 weeks ago
same issue
I can't reproduce this issue, things work as usual for me. Is this error persistent for you, or is it intermittent?
If there has been a change in GitHub policy that caused this, I need to find alternatives for hosting the file. Spoofing the user agent for GitHub is not a solution.
Hi, Polum. Thanks for looking into this. On the AWS EC2 instance, the error is persistent. On my local machine, it works fine so far.
So I suspect GitHub may have a new policy to detect bots/crawlers coming from AWS EC2 instances. Yeah, it would be great to have an alternative for hosting the file. Thanks.
Thank you for the extra information that this is on an EC2 instance, that makes sense. I can definitely add a parameter to specify a local file or separate URL.
It might take me a little while to implement this, but I would also be happy to accept a PR.
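For anyone considering a PR, here is a rough sketch of how a single "source" argument could be told apart as a URL or a local file. This is not existing unidic-py code: the function name `classify_source` and its return values are hypothetical and only illustrate the idea.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" for an http(s) address or "file" for an existing local file.

    Hypothetical helper for illustration; not part of unidic-py.
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if Path(source).is_file():
        return "file"
    raise ValueError(f"Not an http(s) URL or an existing file: {source!r}")
```

A download function could branch on the result, fetching over the network for "url" and unpacking the archive directly for "file".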
As a short-term workaround, besides changing your headers, you can change the download URL in the source in your local installation, or rewrite the function where it's used.
I decided to take a detour and download the dictionary directly.
The version and URL come from https://raw.githubusercontent.com/polm/unidic-py/master/dicts.json,
which is referenced in the code, as polm mentioned.
python -c "import urllib.request; import unidic; import os; from unidic.download import download_and_clean; opener = urllib.request.build_opener(); opener.addheaders = [('User-Agent', 'Mozilla/5.0')]; urllib.request.install_opener(opener); download_and_clean('3.1.0+2021-08-31', 'https://cotonoha-dic.s3-ap-northeast-1.amazonaws.com/unidic-3.1.0.zip')"
curl https://raw.githubusercontent.com/polm/unidic-py/master/dicts.json
can get the response correctly. Changing the request header made it work, so GitHub may be blocking requests based on their headers. We may need to update the request header to avoid being blocked by GitHub. Thanks.
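For reference, a minimal sketch of sending a request with an explicit User-Agent using only the standard library. The helper names `build_request` and `fetch_text` are made up for this example and are not part of unidic-py, and the `Mozilla/5.0` value is simply the one used in the one-liner earlier in this thread. Whether GitHub really filters on this header is an assumption based on the reports here.

```python
import urllib.request


def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url: str, user_agent: str = "Mozilla/5.0", timeout: float = 30.0) -> str:
    """Fetch a URL and decode the body as UTF-8 (needs network access)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_text("https://raw.githubusercontent.com/polm/unidic-py/master/dicts.json")` from the affected EC2 instance would show whether the header alone changes the outcome.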