psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Crawling a page drops everything after # in the URL and then fails; how can I resolve it? #5580

Closed ch-boogeyman closed 4 years ago

ch-boogeyman commented 4 years ago

Summary.

Expected Result

Don't lose the characters after #

Actual Result

The part of the URL after # is lost, so the server never receives it!

Reproduction Steps

import time

import requests

# `req` is not defined in this snippet; it is assumed to be a dict with 'account' and 'pwd' keys.
url = (
    'https://ifm.zhaobenshu.com/User/user_ifa_LoginCard.ashx?a=[Lib={{gdut}}][OpenId={{}}][PmWebApiProxy={{}}][UrlQsLib={{#}}][Opac={{sulcmis4}}][OpacCaptcha={{0}}][UrlHost={{http://gdut.n1.zhaobenshu.com/}}][CardId={{'
    + req['account'] + '}}][CardPwd={{' + req['pwd']
    + '}}][UniSess={{}}][SessLib={{gdut}}][SessFun={{wap}}][SessPrd={{reso}}][CookiesStr={{}}][CaptchaStr={{}}]&x=&y=01&z=&_='
    + str(int(round(time.time() * 1000)))  # current time in milliseconds
)
res = requests.get(url)

![loss](https://user-images.githubusercontent.com/46711557/92478607-f8766d80-f214-11ea-9887-87ac05ad8f32.png)

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.8.3"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.22.0"
  },
  "system_ssl": {
    "version": "1010106f"
  },
  "urllib3": {
    "version": "1.25.8"
  },
  "using_pyopenssl": false
}

sigmavirus24 commented 4 years ago

Hi there! Thanks for opening this issue. Unfortunately, it seems this is a request for help instead of a report of a defect in the project. Please use StackOverflow for general usage questions instead and only report defects here.

ch-boogeyman commented 4 years ago

No, this is a bug: the part of the URL after '#' is lost. For example, requests.get('http://www.xxxx#aaa.com') only requests 'http://www.xxxx'.

nateprewitt commented 4 years ago

Hi @ch-boogeyman, this appears to be a fundamental misunderstanding of how URI fragments work. Everything after the hash is considered client-side and won't be transmitted to the server. You can find more information here. If you have more questions, please direct them to StackOverflow as requested. Thanks!
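
To make the point about fragments concrete, here is a minimal sketch using only the standard library. The `example.com` URL and the shortened parameter value are made up for illustration and are not the reporter's real endpoint. It shows how a literal `#` ends the query string, and how percent-encoding it as `%23` keeps it in the part that is sent to the server. Whether a given server accepts the encoded form is an assumption that would need to be checked against that server.

```python
from urllib.parse import quote, urlsplit

# Hypothetical URL for illustration only; it mimics the shape of the reported one.
raw_url = "http://example.com/login?a=[UrlQsLib={{#}}][Opac={{x}}]"

parts = urlsplit(raw_url)
# The first '#' starts the fragment, so the query stops right before it.
print("query:   ", parts.query)
print("fragment:", parts.fragment)

# Percent-encode the parameter value so '#' becomes '%23' and stays in the query.
encoded_value = quote("[UrlQsLib={{#}}][Opac={{x}}]", safe="")
safe_url = "http://example.com/login?a=" + encoded_value

safe_parts = urlsplit(safe_url)
print("query:   ", safe_parts.query)
print("fragment:", repr(safe_parts.fragment))
```

With requests itself, passing the value through the `params` argument (for example `requests.get(base_url, params={'a': value})`, where `base_url` and `value` are placeholders) percent-encodes it in a similar way, so the `#` should no longer be treated as the start of a fragment.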