juancarlospaco / faster-than-requests

Faster requests on Python 3
https://gist.github.com/juancarlospaco/37da34ed13a609663f55f4466c4dbc3e
MIT License
1.11k stars, 90 forks

Documentation #125

Closed: AlexRMU closed this 3 years ago

AlexRMU commented 3 years ago

There are problems with the documentation:

requests.scraper7("http://python.org", "body > div.someclass a#someid"])    # CSS Selector Web Scraper
...
requests.scraper2(["https://nim-lang.org", "http://example.com"], list_of_tags=["h1", "h2"], case_insensitive=False)

This does not work; it needs to be replaced with:

requests.scraper7("http://python.org", "body > div.someclass a#someid")    # CSS Selector Web Scraper
...
requests.scraper2(list_of_urls=["https://nim-lang.org", "http://example.com"], list_of_tags=["h1", "h2"], case_insensitive=False)

As for scraper2(): what does "It returns the Scraped Webs" mean? It returned this to me: [[<capsule object NULL at 0x00000214F58DCEA0>, <capsule object NULL at 0x00000214F58DCED0>, <capsule object NULL at 0x00000214F58DCF00>, <capsule object NULL at 0x00000214F58DCA80>, <capsule object NULL at 0x00000214F58DCAB0>, <capsule object NULL at 0x00000214F58DCA50>, <capsule object NULL at 0x00000214F58DC2A0>... What am I supposed to do with that?


And about scraper():

res = requests.scraper(list_of_urls=["http://example.com"], html_tag="div", threads=True)
res = requests.scraper(list_of_urls=["http://example.com"], html_tag="div", threads=False)
res = requests.scraper(list_of_urls=["http://example.com"], threads=True)
res = requests.scraper(list_of_urls=["http://example.com"], threads=False)

All of these return nothing at all. Why?

juancarlospaco commented 3 years ago

Please provide the full source code; it is not possible to troubleshoot without reproducing the bug.

AlexRMU commented 3 years ago

All of the code was provided earlier.

scraper2():

requests.scraper2(list_of_urls=["https://nim-lang.org", "http://example.com"], list_of_tags=["h1", "h2"], case_insensitive=False)

scraper():

res = requests.scraper(list_of_urls=["http://example.com"], html_tag="div", threads=True)
res = requests.scraper(list_of_urls=["http://example.com"], html_tag="div", threads=False)
res = requests.scraper(list_of_urls=["http://example.com"], threads=True)
res = requests.scraper(list_of_urls=["http://example.com"], threads=False)

AlexRMU commented 3 years ago

This is not my code; it is the code from the examples, and it doesn't work. Some of the code in examples.py doesn't work either.

AlexRMU commented 3 years ago

I also have a question about cookies. The website http://httpbin.org/cookies outputs the cookies that were sent to it. Everything works with requests, but nothing works with get(), head(), or set_headers(). Please give a working example.

AlexRMU commented 3 years ago

What's up?

juancarlospaco commented 3 years ago

Scraper2 is now fixed.

About the first point: you have to pass the argument by name if Python gets confused. This relies on a feature that Python does not have; let me show you a tiny example to explain it better:
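
A minimal sketch of what passing arguments by name looks like. `scrape_tags` is a hypothetical pure-Python stand-in with keyword-only parameters, not the library's real function or signature; it only illustrates why a call can fail positionally and succeed once every argument is named:

```python
def scrape_tags(*, list_of_urls, list_of_tags, case_insensitive=True):
    """Hypothetical stand-in: every parameter must be passed by name."""
    return [(url, tag) for url in list_of_urls for tag in list_of_tags]


urls = ["https://nim-lang.org", "http://example.com"]

# Positional call: rejected with TypeError, because the parameters are keyword-only.
try:
    scrape_tags(urls, ["h1", "h2"])
    positional_ok = True
except TypeError:
    positional_ok = False

# Named call: each value is bound to the parameter it is meant for.
pairs = scrape_tags(list_of_urls=urls, list_of_tags=["h1", "h2"], case_insensitive=False)

print(positional_ok, len(pairs))  # prints: False 4
```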

More fixes coming soon...

juancarlospaco commented 3 years ago

Documentation added; it is working:

$ cat prueba.py 
import faster_than_requests
faster_than_requests.init_client()
print(
  faster_than_requests.scraper(
    list_of_urls = ["http://nim-lang.org"], 
    html_tag = "a",
    case_insensitive = False,
    deduplicate_urls = True,
  )
)
faster_than_requests.close_client()

$ python prueba.py 

['@[<a class="pure-menu-heading pure-menu-link site-logo-container" href="/">\n        <img src="/assets/img/logo.svg" class="site-logo" alt="Nim" height="28" />\n      </a>, <a class="pure-menu-link" href="/blog.html">\n            Blog\n          </a>, <a class="pure-menu-link" href="/features.html">\n            Features\n          </a>, <a class="pure-menu-link" href="/install.html">\n            Download\n          </a>, <a class="pure-menu-link" href="/learn.html">\n            Learn\n          </a>, <a class="pure-menu-link" href="/documentation.html">\n            Documentation\n          </a>, <a class="pure-menu-link" href="https://forum.nim-lang.org">\n            Forum\n          </a>, <a class="pure-menu-link" href="/donate.html">\n            Donate\n          </a>, <a class="pure-menu-link" href="https://github.com/nim-lang/Nim">Source</a>, <a class="pure-button pure-button-primary" href="/install.html">Install Nim 1.4.2</a>, <a class="pure-button" href="https://play.nim-lang.org/#ix=2lK1">Try it online</a>, <a href="http://rosettacode.org/wiki/Category:Nim">More examples at RosettaCode…</a>, <a class="post-link" href="/blog/2021/01/20/community-survey-results-2020.html">Nim Community Survey 2020 Results</a>, <a class="post-link" href="/blog/2020/12/28/nim-in-2020-a-short-recap.html">Nim in 2020: A short recap</a>, <a class="pure-button" href="/blog.html">All articles</a>, <a class="pure-button pure-button-primary" href="https://book.picheta.me/">Learn more</a>, <a class="pure-button pure-button-primary" href="/donate.html">Donate</a>, <a href="irc://freenode.net/nim"><i class="fc fa-irc">#</i>FreeNode#nim</a>, <a href="https://gitter.im/nim-lang/Nim"><i class="fab fa-gitter" />Gitter/Nim</a>, <a href="https://discord.gg/nim"><i class="fab fa-discord" />Discord/Nim</a>, <a href="https://matrix.to/#/#freenode_#nim:matrix.org"><i class="fab i">m</i>#nim:matrix.org</a>, <a href="https://irclogs.nim-lang.org"><i class="fc fa-irc">#</i>IRC Logs</a>, <a 
href="https://t.me/nim_lang"><i class="fab fa-telegram" />Telegram/nim_lang</a>, <a href="https://forum.nim-lang.org">\n          <i class="fa fa-comments" aria-hidden="true" />forum.nim-lang.org</a>, <a href="https://reddit.com/r/nim">\n          <i class="fab fa-reddit" aria-hidden="true" />r/nim</a>, <a href="https://stackoverflow.com/questions/tagged/nim-lang">\n          <i class="fab fa-stack-overflow" aria-hidden="true" />StackOverflow</a>, <a href="https://github.com/nim-lang/Nim/issues">\n          <i class="fab fa-github" aria-hidden="true" />nim-lang/Nim</a>, <a href="https://twitter.com/nim_lang">\n          <i class="fab fa-twitter" aria-hidden="true" />@nim_lang\n        </a>, <a class="pure-button" href="/community.html">Join the community</a>, <a class="pure-button" href="https://github.com/nim-lang/Nim">Source code</a>, <a href="https://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0</a>, <a href="https://github.com/nim-lang/website">GitHub</a>, <a href="https://github.com/dom96">Dominik Picheta</a>, <a href="https://github.com/Calinou">Hugo Locurcio</a>, <a href="https://github.com/josephwecker">Joseph Wecker</a>, <a href="https://m.do.co/c/637ab907c7f4">\n  <img src="/assets/img/do.png" />\n</a>]']

$

Added an example exclusively for this:

AlexRMU commented 3 years ago

I'll probably continue here.


get2str() terminates execution:

import faster_than_requests as requests
print(1)
requests.get2str("http://example.com")
print(2)

This prints only:

1

Same thing with get2str2

print(1)
requests.get2str2(["http://example.com/foo", "http://example.com/bar"], threads = True)
print(2)

And with get2json

print(1)
requests.get2json("http://example.com")
print(2)

And if you run this in the interactive Python console, the console just closes.


Also, get2json() fails with "takes exactly 1 argument (2 given)" when called like this:

requests.get2json("http://example.com", pretty_print=True)

Maybe the problem is with me?

thisago commented 3 years ago

Hello @AlexRMU! Try adding:

requests.init_client()

juancarlospaco commented 3 years ago

https://github.com/juancarlospaco/faster-than-requests#init_client
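
Combining this suggestion with the init_client() / close_client() pairing from the prueba.py example earlier in the thread, one way to avoid forgetting either call is a small context manager. This is only a sketch: `client_session` and `demo` are hypothetical helper names, not part of the library, and the only library calls assumed are the ones already shown in this thread (init_client(), close_client(), get2str()).

```python
from contextlib import contextmanager


@contextmanager
def client_session(client):
    """Call client.init_client() on entry and client.close_client() on exit.

    `client` is any object exposing those two functions, for example the
    imported faster_than_requests module.
    """
    client.init_client()
    try:
        yield client
    finally:
        # Runs even if the body raises, so the client is always closed.
        client.close_client()


def demo():
    # Not called automatically: needs faster_than_requests and network access.
    import faster_than_requests as requests

    with client_session(requests) as r:
        print(1)
        r.get2str("http://example.com")
        print(2)
```

Calling demo() is the earlier get2str() snippet with the client initialised first; whether that resolves the crash reported above is not confirmed in this thread.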