Closed linaspasv closed 4 years ago
Use responseType: 'buffer'.
Docs: https://github.com/sindresorhus/got#responsetype
Example:
const { body } = await got(url, { responseType: 'buffer' });
You can also pipe the stream to a file.
const { createWriteStream } = require("fs");
const got = require("hooman");
(async () => {
  const image = createWriteStream("image.jpg");
  got
    .stream("https://c.pxhere.com/images/11/49/74e4a31de6abe70227fa1cb22d37-1612083.jpg!d")
    .pipe(image);
})();
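A bare .pipe() call does not report a failed request or a failed write to the caller. The following is a minimal sketch of the same idea using Node's built-in stream/promises pipeline (Node 15 or later), which rejects if either side errors. The helper name saveStream is hypothetical and not part of hooman or got.

```javascript
const { createWriteStream } = require("fs");
const { pipeline } = require("stream/promises");

// Hypothetical helper (not part of hooman or got): copy any readable
// stream into a file and reject if the source or the file write fails.
async function saveStream(readable, filePath) {
  await pipeline(readable, createWriteStream(filePath));
  return filePath;
}

module.exports = { saveStream };
```

Assuming got.stream(url) returns a readable stream as in the example above, it could be called as saveStream(got.stream(url), "image.jpg") inside an async function, with a try/catch around it to inspect the error.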
Hm, still getting 415 (Unsupported Media Type) :/
const got = require("hooman");
let url = "https://c.pxhere.com/images/11/49/74e4a31de6abe70227fa1cb22d37-1612083.jpg!d"
got(url, { responseType: 'buffer' }).then(response => {
  console.log(response.body)
})
@linaspasv can you please try the stream example I posted later? Also, this library is just a wrapper around got that bypasses the Cloudflare js-challenge. Request-related issues are best tested with the got library first, and reported over there.
With the stream example I get 503 (Service Unavailable) error.
The image I am trying to access is behind a Cloudflare anti-DDoS protection page. I have tried your script with a regular HTML page and it works perfectly, but it fails for a direct image download. I am not sure if it's an issue with this library or with the got library.
I tested it locally and it works fine for me. In fact, I went ahead and added this to the tests, and it passes there as well. https://github.com/sayem314/hooman/commit/f9aa0e5b048bdff50cb6a0b8de31ebc18ee83110
Edit: Travis > https://travis-ci.org/github/sayem314/hooman/jobs/685101120
Do you get challenged by cloudflare? When I run the same code (see below) on an image that has no cloudflare protection it works perfectly. pxhere.com does not give a challenge page for residential IP addresses but when you run this on some servers you get challenged. :-)
const got = require('hooman');
const fs = require('fs');
(async () => {
  //let resource = 'https://c.pxhere.com/images/11/49/74e4a31de6abe70227fa1cb22d37-1612083.jpg!d';
  let resource = 'https://explorecams.com/storage/photos/GdmaI9FIbe_1600.jpg';
  got.stream(resource)
    .on('error', err => console.log(err))
    .pipe(
      fs.createWriteStream('image.jpg')
    )
})();
I tested on a residential IP, where their main domain did not throw a js-challenge, so I went ahead and tested on my own domain with the Cloudflare challenge activated, where js-challenges are always thrown.
Test code:
const test = require("tape");
const scrape = require("hooman");
const { writeFileSync, statSync } = require("fs");
const jsChallengePage = "https://cf-js-challenge.sayem.eu.org";
// Test image download
test("sample image download", async t => {
  console.time("image download");
  const { body } = await scrape(jsChallengePage + "/images/background.jpg", {
    responseType: "buffer"
  });
  console.timeEnd("image download");
  // Write to file
  t.ok(Buffer.isBuffer(body));
  writeFileSync("image.jpg", body);
  // Check image size
  const { size } = statSync("image.jpg");
  t.equal(size, 31001);
});
Note that I have removed the other tests for a fair test result.
Test result with console log for easy debugging:
Here is updated test code and results: https://github.com/sayem314/hooman/commit/9776a070fc12b2c69ce6169ceaf14671363c8dde
A possible fix for you. I'm not sure what's causing your issue, but give this a try:
const got = require('hooman');
const fs = require('fs');
(async () => {
  await got('https://explorecams.com') // init cookie
  let resource = 'https://explorecams.com/storage/photos/GdmaI9FIbe_1600.jpg';
  got.stream(resource)
    .on('error', err => console.log(err))
    .pipe(
      fs.createWriteStream('image.jpg')
    )
})();
No luck. Also, I have tried to run the same without hooman (see the source code below) and I end up with the same Response code 503 (Service Temporary Unavailable) error.
I have also tried to just curl and I get the challenge page code... so it seems your plugin is not triggered to solve the challenge page when I run this particular URL.
const got = require('got');
const fs = require('fs');
(async () => {
  let resource = 'https://c.pxhere.com/images/11/49/74e4a31de6abe70227fa1cb22d37-1612083.jpg!d';
  got.stream(resource)
    .on('error', err => console.log(err))
    .pipe(
      fs.createWriteStream('image.jpg')
    )
})();
Can you send me the HTML of the challenge page?
Okay, so it seems my challenge page ends up in .on('error') and your plugin somehow does not pick it up. The challenge page for the image is the same as the one for a regular HTML page, and that one works perfectly with your library!
const got = require('hooman');
const fs = require('fs');
(async () => {
  let resource = 'https://c.pxhere.com/images/11/49/74e4a31de6abe70227fa1cb22d37-1612083.jpg!d';
  got.stream(resource)
    .on('error', err => console.log(err.response.body))
    .pipe(
      fs.createWriteStream('image.jpg')
    )
})();
I get the following output now. cf-challenge.txt
Also, attaching received headers for that page.
It seems this might be why your afterResponse hook is not being triggered and I am seeing the following results.
I see, .stream() is unsupported, unfortunately. But did you try with responseType: 'buffer' as shown in test.js of hooman? Btw your HTML is okay, hooman should be able to solve it without issue.
https://github.com/sayem314/hooman/blob/master/test.js#L41-L48
First of all, to make your library work with 'buffer', the buffer has to be converted to a string inside the afterResponse hook.
if (
  // If site is not hosted on cloudflare skip
  response.statusCode === 503 &&
  response.headers.server === "cloudflare" &&
  response.body.includes("jschl-answer")
) {
  let body = response.body instanceof Buffer
    ? response.body.toString()
    : response.body
  const data = await solve(response.url, body);
While this part is resolved I still get 415 (Unsupported Media Type) error when this line runs - https://github.com/sayem314/hooman/blob/9776a070fc12b2c69ce6169ceaf14671363c8dde/index.js#L42
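The condition quoted above can be isolated into a small function and checked against captured responses without touching the network. This is a minimal sketch, assuming a response object with the statusCode, headers and body fields used in the snippet. The name isJsChallenge is hypothetical and is not part of hooman.

```javascript
// Hypothetical helper (not part of hooman): decide whether a response
// looks like a Cloudflare js-challenge page, for string or Buffer bodies.
function isJsChallenge(response) {
  const { statusCode, headers = {}, body } = response;
  // Only a 503 served by Cloudflare can be a challenge page.
  if (statusCode !== 503 || headers.server !== "cloudflare") {
    return false;
  }
  // With responseType: 'buffer' the body is a Buffer, so normalise it.
  const text = Buffer.isBuffer(body) ? body.toString("utf8") : String(body || "");
  return text.includes("jschl-answer");
}

module.exports = { isJsChallenge };
```

Logging the result of such a check next to the status code and the server header is one way to see which part of the condition fails for a given URL.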
The conversion is not necessary in the hooks, since the condition should match only when the response is an HTML page. Something must be wrong on your end :( I have tested it on multiple datacenter IPs and VPNs, and it works for me every time.
As you can see, the Travis CI tests are passing as well. They run from a shared datacenter IP, and my domain uses a custom filter that throws the Cloudflare challenge regardless of how clean your IP is.
// Fixed
@andress134 can you be more specific about what you are trying to achieve? Btw I guess your question is not related to this issue; for further discussion please open a new issue with more details. Also, your code was unreadable, so I had to edit it a little.
Here is how to use a proxy, btw, based on your code example.
const fs = require('fs'),
  got = require('hooman'),
  path = require('path'),
  HttpsProxyAgent = require('https-proxy-agent');
const target = process.argv[2],
  time = process.argv[3],
  req_per_ip = process.argv[4];
let proxies = fs
  .readFileSync(process.argv[5], 'utf-8')
  .replace(/\r/gi, '')
  .split('\n')
  .filter(Boolean);
function send_req() {
  // Keep the address string separate from the agent object, so the
  // address can still be found in the proxies list if the request fails.
  const address = proxies[Math.floor(Math.random() * proxies.length)];
  const proxy = new HttpsProxyAgent('http://' + address);
  return new Promise((resolve, reject) => {
    got(target, {
      agent: {
        https: proxy,
      },
      cloudflareRetry: 10,
    })
      .then((response) => {
        console.log(response.body);
        resolve(response);
      })
      .catch((error) => {
        // Remove the failed proxy; skip if it was already removed
        let obj_v = proxies.indexOf(address);
        if (obj_v !== -1) proxies.splice(obj_v, 1);
        console.log(error.message);
        return reject(error.message);
      });
  });
}
Proxy docs: https://github.com/sindresorhus/got#proxies
Proxy module: https://www.npmjs.com/package/https-proxy-agent
// fixed
@andress134 the mentioned sites work fine in the tests.
And please don't continue any further discussion about this in this issue, create a new issue and I'm happy to assist you.
The site returns a Cloudflare challenge for me in the browser, and I have verified that hooman bypasses it successfully.
Closing this issue as I was unable to reproduce it. BTW I was also able to get .stream() to work; I will update the instructions in the readme.
@sayem314 thank you for your help. Looking forward to trying this with .stream(). :-)
@linaspasv docs updated for stream https://github.com/sayem314/hooman#pipe-stream
I am trying the following and get 415 (Unsupported Media Type) error.