Closed AlexRMU closed 3 years ago
Please provide the full source code; it is not possible to troubleshoot without reproducing the bug.
All the code was provided earlier:
scraper2()
requests.scraper2(list_of_urls=["https://nim-lang.org", "http://example.com"], list_of_tags=["h1", "h2"], case_insensitive=False)
scraper()
res = requests.scraper(list_of_urls=["http://example.com"], html_tag="div", threads=True)
res = requests.scraper(list_of_urls=["http://example.com"], html_tag="div", threads=False)
res = requests.scraper(list_of_urls=["http://example.com"], threads=True)
res = requests.scraper(list_of_urls=["http://example.com"], threads=False)
This is not my code; it is the code from the examples, and it doesn't work. Some of examples.py doesn't work either.
And a question about cookies. The website http://httpbin.org/cookies outputs the cookies sent to it. Everything works with requests, but nothing works with get(), head(), set_headers(). Please give a working example.
What's up?
Scraper2 is now fixed.
About the first point: you have to pass the argument by name if Python gets confused. This is a feature that Python does not have; let me show you a tiny example to explain it better:
More fixes coming soon...
Documentation added, it is working:
$ cat prueba.py
import faster_than_requests
faster_than_requests.init_client()
print(
    faster_than_requests.scraper(
        list_of_urls = ["http://nim-lang.org"],
        html_tag = "a",
        case_insensitive = False,
        deduplicate_urls = True,
    )
)
faster_than_requests.close_client()
$ python prueba.py
['@[<a class="pure-menu-heading pure-menu-link site-logo-container" href="/">\n <img src="/assets/img/logo.svg" class="site-logo" alt="Nim" height="28" />\n </a>, <a class="pure-menu-link" href="/blog.html">\n Blog\n </a>, <a class="pure-menu-link" href="/features.html">\n Features\n </a>, <a class="pure-menu-link" href="/install.html">\n Download\n </a>, <a class="pure-menu-link" href="/learn.html">\n Learn\n </a>, <a class="pure-menu-link" href="/documentation.html">\n Documentation\n </a>, <a class="pure-menu-link" href="https://forum.nim-lang.org">\n Forum\n </a>, <a class="pure-menu-link" href="/donate.html">\n Donate\n </a>, <a class="pure-menu-link" href="https://github.com/nim-lang/Nim">Source</a>, <a class="pure-button pure-button-primary" href="/install.html">Install Nim 1.4.2</a>, <a class="pure-button" href="https://play.nim-lang.org/#ix=2lK1">Try it online</a>, <a href="http://rosettacode.org/wiki/Category:Nim">More examples at RosettaCode…</a>, <a class="post-link" href="/blog/2021/01/20/community-survey-results-2020.html">Nim Community Survey 2020 Results</a>, <a class="post-link" href="/blog/2020/12/28/nim-in-2020-a-short-recap.html">Nim in 2020: A short recap</a>, <a class="pure-button" href="/blog.html">All articles</a>, <a class="pure-button pure-button-primary" href="https://book.picheta.me/">Learn more</a>, <a class="pure-button pure-button-primary" href="/donate.html">Donate</a>, <a href="irc://freenode.net/nim"><i class="fc fa-irc">#</i>FreeNode#nim</a>, <a href="https://gitter.im/nim-lang/Nim"><i class="fab fa-gitter" />Gitter/Nim</a>, <a href="https://discord.gg/nim"><i class="fab fa-discord" />Discord/Nim</a>, <a href="https://matrix.to/#/#freenode_#nim:matrix.org"><i class="fab i">m</i>#nim:matrix.org</a>, <a href="https://irclogs.nim-lang.org"><i class="fc fa-irc">#</i>IRC Logs</a>, <a href="https://t.me/nim_lang"><i class="fab fa-telegram" />Telegram/nim_lang</a>, <a href="https://forum.nim-lang.org">\n <i class="fa fa-comments" 
aria-hidden="true" />forum.nim-lang.org</a>, <a href="https://reddit.com/r/nim">\n <i class="fab fa-reddit" aria-hidden="true" />r/nim</a>, <a href="https://stackoverflow.com/questions/tagged/nim-lang">\n <i class="fab fa-stack-overflow" aria-hidden="true" />StackOverflow</a>, <a href="https://github.com/nim-lang/Nim/issues">\n <i class="fab fa-github" aria-hidden="true" />nim-lang/Nim</a>, <a href="https://twitter.com/nim_lang">\n <i class="fab fa-twitter" aria-hidden="true" />@nim_lang\n </a>, <a class="pure-button" href="/community.html">Join the community</a>, <a class="pure-button" href="https://github.com/nim-lang/Nim">Source code</a>, <a href="https://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0</a>, <a href="https://github.com/nim-lang/website">GitHub</a>, <a href="https://github.com/dom96">Dominik Picheta</a>, <a href="https://github.com/Calinou">Hugo Locurcio</a>, <a href="https://github.com/josephwecker">Joseph Wecker</a>, <a href="https://m.do.co/c/637ab907c7f4">\n <img src="/assets/img/do.png" />\n</a>]']
$
Added an example exclusively for this:
I'll probably continue here.
get2str() terminates execution
import faster_than_requests as requests
print(1)
requests.get2str("http://example.com")
print(2)
1
Same thing with get2str2
print(1)
requests.get2str2(["http://example.com/foo", "http://example.com/bar"], threads = True)
print(2)
And with get2json
print(1)
requests.get2json("http://example.com")
print(2)
And if you run Python in the console, it just closes.
And
get2json() takes exactly 1 argument (2 given)
requests.get2json("http://example.com", pretty_print=True)
Maybe the problem is with me?
Hello @AlexRMU! Try adding:
requests.init_client()
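A minimal sketch of where that call goes, assuming `init_client()` and `close_client()` behave as in the `prueba.py` example earlier in this thread. The `client_session` and `fetch_body` helpers are hypothetical and not part of faster_than_requests; they only make sure the client is initialised before any request and closed afterwards. Whether this resolves the crash is not confirmed in this thread.

```python
from contextlib import contextmanager


@contextmanager
def client_session(client):
    """Hypothetical helper: call init_client() before use, close_client() after."""
    client.init_client()
    try:
        yield client
    finally:
        # Runs even if a request raises, so the client is always closed.
        client.close_client()


def fetch_body(client, url):
    """Fetch one URL with get2str() inside an initialised client session."""
    with client_session(client) as session:
        return session.get2str(url)


# Intended use (assumes faster_than_requests is installed):
#   import faster_than_requests as requests
#   print(1)
#   print(fetch_body(requests, "http://example.com"))
#   print(2)
```

The module is passed in as a parameter so the same wrapper can be exercised with a stand-in object before pointing it at the real library.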
There are problems with the documentation.
This does not work; you need to replace it with
And about scraper2(): what do you mean by "It returns the Scraped Webs"? It returned this to me:
[[<capsule object NULL at 0x00000214F58DCEA0>, <capsule object NULL at 0x00000214F58DCED0>, <capsule object NULL at 0x00000214F58DCF00>, <capsule object NULL at 0x00000214F58DCA80>, <capsule object NULL at 0x00000214F58DCAB0>, <capsule object NULL at 0x00000214F58DCA50>, <capsule object NULL at 0x00000214F58DC2A0>...
And what should I do about it? And about scraper():
it returns nothing at all. Why?