lwthiker / curl-impersonate

curl-impersonate: A special build of curl that can impersonate Chrome & Firefox
MIT License

Pre-compiled aarch64 binaries for Linux #80

Open NoaHimesaka1873 opened 2 years ago

NoaHimesaka1873 commented 2 years ago

Currently all builds are only for AMD64 platforms. It would be nice to have aarch64 builds.

lwthiker commented 2 years ago

For which platform? If it's for Mac M1 then it's supported and the build should work as described in INSTALL.md. There are no pre-compiled binaries yet since GitHub doesn't provide an M1 machine in GitHub Actions.

NoaHimesaka1873 commented 2 years ago

> For which platform?
>
> If it's for Mac M1 then it's supported and the build should work as described in INSTALL.md. There are no pre-compiled binaries yet since GitHub doesn't provide an M1 machine in GitHub Actions.

For Linux. Manually compiling worked but it took some time.

A-Posthuman commented 2 years ago

2nd the request for prebuilt linux arm64, for example for use on AWS graviton instances. I got it to compile, but yeah took a while.

BTW, successfully using the libcurl-impersonate .so files with node-libcurl's build-from-source option for Node (using the LD_PRELOAD trick), which is sweet.

lwthiker commented 2 years ago

> For Linux. Manually compiling worked but it took some time.

Alright, I'll edit the title to make it clearer.

> BTW, successfully using the libcurl-impersonate .so files with node-libcurl's build-from-source option for Node (using the LD_PRELOAD trick), which is sweet.

@A-Posthuman I think this information would be of use to other devs as well. If you want to write a manual detailing the steps I could then add it to the repo.

A-Posthuman commented 2 years ago

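The steps I followed to install node-libcurl, using npm, on Ubuntu 20.04:

* first get curl-impersonate compiled/installed

* install node-libcurl building dependencies: sudo apt-get install python libcurl4-openssl-dev build-essential

* make a libcurl.so symbolic link to use during compilation (I linked to the chrome.so, haven't tested this with the ff one): sudo ln -s /usr/local/lib/libcurl-impersonate-chrome.so.4.7.0 /usr/local/lib/libcurl.so

* export LD_PRELOAD=/usr/local/lib/libcurl.so

* export CURL_IMPERSONATE=chrome101

* node-libcurl's build instructions explain you can override the linker's flags during the build using "--curl_libraries", so the command to build and install it that worked for me: npm install node-libcurl --build-from-source --curl_libraries='-Wl,-rpath /usr/local/lib -lcurl'

Check the node-libcurl build instructions and its source code for more details on how to adjust this for other platforms.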

jlarmstrongiv commented 2 years ago

Ahh, I’m switching to AWS graviton instances soon.

In GitHub Actions, you can build for different architectures by adding this step:

      - uses: docker/setup-qemu-action@v2
        with:
          platforms: arm64

before your docker/setup-buildx-action@v2 step.

Docs for the multi-arch docker build command: https://docs.docker.com/desktop/multi-arch/

lwthiker commented 2 years ago

@jlarmstrongiv Do you mean you want the curl-impersonate Docker image on DockerHub to support arm64?

Regarding the pre-compiled binaries, I'm going to try and cross-compile from Ubuntu, hopefully it will be a simple addition to the CI scripts. Updates soon.

jlarmstrongiv commented 2 years ago

@lwthiker

> Regarding the pre-compiled binaries, I'm going to try and cross-compile from Ubuntu, hopefully it will be a simple addition to the CI scripts. Updates soon.

Awesome 🚀 looking forward to it

> Do you mean you want the curl-impersonate Docker image on DockerHub to support arm64

Oh yes, the docker/setup-qemu-action@v2 lets you build docker images for multiple architectures

lwthiker commented 1 year ago

Pre-compiled arm64/aarch64 binaries are now available here: https://github.com/lwthiker/curl-impersonate/releases/tag/v0.5.1

and will be built automatically for each new release in the future.

Docker images are still not built for arm64 though, so I'm going to leave this issue open.

zuzupapa commented 1 year ago

> The steps I followed to install node-libcurl, using npm, on ubuntu 20.04:
>
> * first get curl-impersonate compiled/installed
> * install node-libcurl building dependencies: sudo apt-get install python libcurl4-openssl-dev build-essential
> * make a libcurl.so symbolic link to use during compilation (I linked to the chrome.so, haven't tested this with the ff one): sudo ln -s /usr/local/lib/libcurl-impersonate-chrome.so.4.7.0 /usr/local/lib/libcurl.so
> * export LD_PRELOAD=/usr/local/lib/libcurl.so
> * export CURL_IMPERSONATE=chrome101
> * node-libcurl's build instructions explain you can override the linker's flags during the build using "--curl_libraries", so the command to build and install it that worked for me: npm install node-libcurl --build-from-source --curl_libraries='-Wl,-rpath /usr/local/lib -lcurl'
>
> Check the node-libcurl build instructions and its source code for more details on how to adjust this for other platforms.

Has anyone tried this? I installed everything and it works in the terminal, but I can't access the same website through node-libcurl.

A-Posthuman commented 1 year ago

node-libcurl with curl-impersonate's libcurl works for me, yes.

ActiniumTO commented 1 year ago

> node-libcurl with curl-impersonate's libcurl works for me, yes.

Do you have Discord or Telegram? I need your help and I'm willing to pay you.

A-Posthuman commented 1 year ago

I'm too busy with my own projects atm to take on something else. Try asking around on the Scraping Enthusiasts discord, there are other folks there who might help or take it on.

ActiniumTO commented 1 year ago

Once all of this is done, can you share an example of JS that actually runs it?

A-Posthuman commented 1 year ago

I set some env vars, then just use it similarly to how you can normally use node-libcurl:

// note: the dynamic loader reads LD_PRELOAD when the process starts, so also export it in the shell before running node
process.env.LD_PRELOAD = '/usr/local/lib/libcurl.so';
process.env.CURL_IMPERSONATE = 'chrome107';
process.env.CURL_IMPERSONATE_HEADERS = "no"; // use our own headers, or comment this line out to use curl-impersonate's default headers

// this is for allowing use of the old "require"-style module imports
import { createRequire } from 'module';
const require = createRequire(import.meta.url);
const { curly } = require('node-libcurl');
import fs from "fs/promises";

// I have some basic session management where I create more than one node-libcurl curly object, but to simplify:
const cookieJarFile = `/home/ubuntu/some/path/to/file.txt`;
const fd = await fs.open(cookieJarFile, 'w'); // wipe the cookie jar file from previous run
await fd.close();
const sessionCurl = curly.create({ cookieFile: cookieJarFile, cookieJar: cookieJarFile, cookieList: 'ALL', followLocation: true });
// I have my own array of header strings I use instead of the curl-impersonate default
const headers = ['Accept: text/html']; // placeholder value; substitute your own header strings
const response = await sessionCurl.get('https://www.somewebsite.com/', { cookieList: 'ALL', HTTPHEADER: headers }); // on very first request, make sure to start with empty cookies
if (response.statusCode === 200) {
   const content = response.data;
   // process data as desired
}

ActiniumTO commented 1 year ago

Sadly, I get "Segmentation fault (core dumped)" and I really don't know what to do.

I have been trying for hours to get it working. Please help me.

A-Posthuman commented 1 year ago

One thing I thought of: my example uses the most recent curl-impersonate source, which supports chrome107. If you happen to be using the precompiled binaries (version 0.5.3 was the last release of those), they only support up to chrome104. Passing chrome107 to that older version might cause a core dump. If you aren't sure which version you have, check whether the curl_chrome107 script is installed: whereis curl_chrome107

Does running the curl_chrome104 script work to fetch a URL? Or does that also segfault?
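
The same check can be done from Node. This is a minimal sketch, assuming the wrapper scripts are named curl_<target> as in this thread; findOnPath is a hypothetical helper written for illustration and is not part of curl-impersonate or node-libcurl. It is written in CommonJS so it runs with plain node.

```javascript
// Hypothetical helper: look for curl-impersonate wrapper scripts on PATH,
// similar in spirit to running `whereis curl_chrome107`.
const fs = require('node:fs');
const path = require('node:path');

function findOnPath(name, pathEnv = process.env.PATH || '') {
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // not present or not executable in this directory; keep looking
    }
  }
  return null;
}

// Report which of the two targets discussed here have a wrapper installed.
for (const target of ['chrome104', 'chrome107']) {
  const found = findOnPath(`curl_${target}`);
  console.log(`${target}: ${found ?? 'wrapper script not found on PATH'}`);
}
```

If only the chrome104 wrapper is found, setting CURL_IMPERSONATE to chrome104 instead of chrome107 would be the first thing to try.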

Also I don't know if it matters, but in addition to setting the env vars in my program, I also set them in the shell beforehand:

export LD_PRELOAD=/usr/local/lib/libcurl.so
export CURL_IMPERSONATE=chrome107
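
On Linux the dynamic loader reads LD_PRELOAD when a process starts, so the shell export is probably the part that matters for preloading. If exporting in the shell is inconvenient, one alternative is a small launcher that starts the real script in a child process with the variables already set. This is a sketch under assumptions: launch.js, impersonateEnv and launch are hypothetical names, and the library path and target are the ones used in this thread.

```javascript
// Hypothetical launcher (launch.js): run the real script in a child process
// whose environment already contains LD_PRELOAD and CURL_IMPERSONATE.
const { spawn } = require('node:child_process');

// Build the child's environment without mutating the parent's.
function impersonateEnv(baseEnv, { lib = '/usr/local/lib/libcurl.so', target = 'chrome107' } = {}) {
  return { ...baseEnv, LD_PRELOAD: lib, CURL_IMPERSONATE: target };
}

function launch(scriptPath) {
  const child = spawn(process.execPath, [scriptPath], {
    env: impersonateEnv(process.env),
    stdio: 'inherit',
  });
  // Mirror the child's exit code so shell scripts can detect failures.
  child.on('exit', (code) => process.exit(code ?? 1));
  return child;
}

// Usage: node launch.js ./scraper.mjs   (scraper.mjs is a placeholder name)
if (process.argv[2]) launch(process.argv[2]);
```

The child is started with process.execPath, so it uses the same Node binary as the launcher.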

Node version may also matter? Not sure, but I've read somewhere that for best compatibility and least core dumps, you want node/node-libcurl/curl-impersonate all to be using the same or similar OpenSSL version?

I'm using Node 18.12.1 on ubuntu 20.04 if that helps. Running the command "openssl version" reports: OpenSSL 1.1.1f 31 Mar 2020
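
To compare versions on your own machine, Node exposes the OpenSSL version it was built with as process.versions.openssl, which can be set next to the output of "openssl version". A minimal sketch; parseOpenSslVersion and sameSeries are hypothetical helpers, and comparing only the major.minor series is an assumption about what counts as "similar".

```javascript
const { execFileSync } = require('node:child_process');

// Extract "1.1.1f" from output such as "OpenSSL 1.1.1f  31 Mar 2020".
function parseOpenSslVersion(output) {
  const match = /^OpenSSL\s+(\S+)/.exec(output.trim());
  return match ? match[1] : null;
}

// Compare only the major.minor series (for example "1.1" against "3.0").
function sameSeries(a, b) {
  const series = (v) => v.split('.').slice(0, 2).join('.');
  return series(a) === series(b);
}

const nodeOpenSsl = process.versions.openssl; // OpenSSL this Node binary was built with
let systemOpenSsl = null;
try {
  systemOpenSsl = parseOpenSslVersion(execFileSync('openssl', ['version'], { encoding: 'utf8' }));
} catch {
  // the openssl CLI is not installed or not on PATH
}

console.log({ nodeOpenSsl, systemOpenSsl });
if (nodeOpenSsl && systemOpenSsl) {
  console.log('same series:', sameSeries(nodeOpenSsl, systemOpenSsl));
}
```

A mismatch is only a hint. Whether it explains a crash depends on how Node, node-libcurl and curl-impersonate were each linked.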

And of course be sure you have that symlink setup properly where /usr/local/lib/libcurl.so points to the latest curl-impersonate chrome library. On my system that points to: /usr/local/lib/libcurl-impersonate-chrome.so.4.8.0

If any of those ideas solves your issue, please report back.

ActiniumTO commented 1 year ago

No, this is not really using curl-impersonate. After researching it in depth, I found that it's not possible to bind curl-impersonate to Node.js.
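
One way to check on a given Linux machine which libcurl a Node process has actually loaded is to read /proc/self/maps after requiring node-libcurl. A minimal sketch; mappedCurlLibs and isImpersonateLoaded are hypothetical helpers, and the parsing assumes library paths without spaces.

```javascript
const fs = require('node:fs');

// Return the distinct libcurl file paths listed in /proc/<pid>/maps-style text.
function mappedCurlLibs(mapsText) {
  const libs = new Set();
  for (const line of mapsText.split('\n')) {
    const file = line.trim().split(/\s+/)[5]; // 6th column is the mapped file, if any
    if (file && file.includes('libcurl')) libs.add(file);
  }
  return [...libs];
}

function isImpersonateLoaded(mapsText) {
  return mappedCurlLibs(mapsText).some((p) => p.includes('libcurl-impersonate'));
}

// Linux only. In a real check, run this after require('node-libcurl')
// so that the addon and its libraries are already mapped.
if (fs.existsSync('/proc/self/maps')) {
  const maps = fs.readFileSync('/proc/self/maps', 'utf8');
  console.log('libcurl mappings:', mappedCurlLibs(maps));
  console.log('curl-impersonate loaded:', isImpersonateLoaded(maps));
}
```

An empty list after loading node-libcurl may mean that libcurl was linked statically into the addon, in which case LD_PRELOAD cannot replace it and a build from source against the shared library would be needed.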

jdrajodiya commented 1 year ago

> And of course be sure you have that symlink setup properly where /usr/local/lib/libcurl.so points to the latest curl-impersonate chrome library. On my system that points to: /usr/local/lib/libcurl-impersonate-chrome.so.4.8.0

@A-Posthuman can you please elaborate? Where can I actually find this particular libcurl-impersonate-chrome.so.4.8.0 file on Linux?

I've followed your steps above to install node-libcurl on Linux, but I'm running into an issue when swapping the curl binaries, possibly because the symlink is not configured properly.
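
A quick way to see which curl-impersonate libraries are installed and where the libcurl.so symlink points is sketched below. It assumes the /usr/local/lib prefix used earlier in this thread, which may differ on other systems; listImpersonateLibs and resolveLink are hypothetical helpers.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// List files in `dir` whose names start with "libcurl-impersonate".
function listImpersonateLibs(dir) {
  try {
    return fs.readdirSync(dir).filter((name) => name.startsWith('libcurl-impersonate')).sort();
  } catch {
    return []; // directory missing or unreadable
  }
}

// Follow a symlink chain to the real file; null if the link is missing or dangling.
function resolveLink(linkPath) {
  try {
    return fs.realpathSync(linkPath);
  } catch {
    return null;
  }
}

const libDir = '/usr/local/lib'; // prefix used in this thread; adjust if yours differs
console.log('candidates:', listImpersonateLibs(libDir));
console.log('libcurl.so ->', resolveLink(path.join(libDir, 'libcurl.so')));
```

If the candidate list is empty, the library was probably installed under a different prefix or not installed at all. If the link resolves to null, it is missing or points at a file that does not exist, such as an older .so version number.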

ibrah3m commented 8 months ago

Hi, I'm interested in this. Did anyone get it working, or is it not possible?