gapmiss closed this issue 2 years ago.
Thanks! Could you provide me with a URL to test the implementation?
Examples:
https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-explaining-the-virtual-machine-2550ef6c5db
https://www.producthunt.com/posts/easyscrape
^^ The issue could be specific to "medium.com" sites.
Also, reddit appears to block all requests:
https://www.reddit.com/r/mullvadvpn/comments/swimwp/what_does_the_malware_protection_on_20221beta_1_do/
https://www.reddit.com/r/mullvadvpn/comments/sxewrh/whatismyipaddress_showing_confirmed_proxy_ip/
Example of the command I am running:
./reader -a "Safari: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15" -i 'https://example.com' > ~/pkm/\@TEST/$KMVAR_Local__Title.md
Thx
Please try again with the latest master:
reader -a "Safari: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15" https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-explaining-the-virtual-machine-2550ef6c5db
I am currently using the "reader_0.1.2_darwin_amd64.tar.gz" release and am not able to compile the source code myself, so I will wait until you release the next version and download it. Thank you again.
I've used the latest release (v0.1.3) and can report that "medium.com" and "producthunt.com" are now working great.
However, when trying the reddit pages again, they return what looks like encoded binary data.
For example:
./reader -a 'Safari: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15' \
-i https://www.reddit.com/r/mullvadvpn/comments/sxewrh/whatismyipaddress_showing_confirmed_proxy_ip/
returns this:
xœÜ½Ù–âH²(ú^\_Á®^uº²WˆÒ,‘yªÖfFÌó´W-¡ ��„Æý\[÷ý~Ù•„ \\€"#«ºOeEH>˜»›™››™›ý”rÿûßÿUhåû“v15·W
<< TRUNCATED >>
iü;�žú®Ùq'ÃŽÓq½^NÐñ3c¶B¬hY\]6h�-gàMF=Ëí°þ7œ#�D¯=²†Ï�½Óñ&�Aß?7ûtíñQÁKJ‰Éx
I will keep testing w/ other websites and report any findings.
Thank you
@gapmiss I believe I have fixed this issue with one of the latest releases. Please give it a try again sometime and report back if it is still happening.
version: 0.2.1
Seems to still be a cookie issue.
This (w/ and w/out the user-agent flag):
./reader -a 'Safari: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15' -i https://www.reddit.com/r/mullvadvpn/comments/sxewrh/whatismyipaddress_showing_confirmed_proxy_ip/
returns:
Blocked
reddit's awesome and all, but you may have a bit of a problem.
if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please contact us at this email address mailto:ratelimit@reddit.com?Subject=Blocked%20198.54.130.117.
when contacting us, please include your ip address which is: 198.54.130.117 and reddit account
I am able to view the above reddit URL in the browser without the blocking.
Let me know if I can test further.
For reference:
@gapmiss Are you able to reproduce this behavior on any site other than reddit?
@mrusme ~ No, I have not experienced this behavior w/ any other sites
@mrusme I can reproduce using v0.3.0 with this link:
I don't know if cookies are enough to access these sites; some require JavaScript:
I'd recommend using the following command until this is fixed:
wget -O - https://gearmoose.com/best-tactical-gifts-2/ | reader -i -
I have just released v0.4.0, which includes a fix that should work for most sites. I've tested the gearmoose and bloomberg examples from above, and both loaded.
I'll close this issue for now, as there is...
If anyone finds that most sites still don't work, feel free to reopen it!
It appears that any website that uses certain Cloudflare security checks returns:
Is it possible to enable cookies?
Thank You