samc621 / SneakerBot

All-in-one bot, with auto captcha-solving and proxy management, using Node.js and Puppeteer.
MIT License
741 stars, 194 forks

"Not in cart, trying again" #28

Open Diablozone opened 3 years ago

Diablozone commented 3 years ago

I am running this on Windows without Docker, and I got the bot to work. It opens the browser, selects the size, and then gets stuck in an endless loop saying "Not in cart, trying again". I am trying a test run on the Nike website, and in the browser it gives me an error saying:

"We had an issue with your request. If you continue experiencing issues, try refreshing the page.

[ Code: 9F502A89 ]"

Thanks in advance...

samc621 commented 3 years ago

Hi @Diablozone, thanks for opening this issue. I have seen this error before; however, I've also seen the product add to cart (and the bot continue to checkout) despite this error. Are you seeing the same?

Diablozone commented 3 years ago

No, the product doesn't add to cart. Should I keep trying again and again?

Diablozone commented 3 years ago

What happens is that the page keeps refreshing and this error keeps looping. Eventually the Chromium tab closes automatically.
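The endless "Not in cart, trying again" loop described above could at least be bounded, so the bot stops and reports a failure instead of refreshing indefinitely. This is a minimal sketch, assuming a hypothetical async `attempt` callback that resolves to true once the item is in the cart; `retryWithLimit`, `maxAttempts`, and `delayMs` are illustrative names, not SneakerBot's actual code, and capping the loop does not address the underlying 429 errors.

```javascript
// Sketch: cap a "check, then try again" loop instead of retrying forever.
// `attempt` is a hypothetical async callback (not SneakerBot code) that resolves
// to true once the goal is met, e.g. the item showing up in the cart.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithLimit(attempt, { maxAttempts = 5, delayMs = 2000, log = console.log } = {}) {
  for (let i = 1; i <= maxAttempts; i++) {
    if (await attempt(i)) {
      return { ok: true, attempts: i };
    }
    // Only wait if another attempt is coming.
    if (i < maxAttempts) {
      log(`Not in cart (attempt ${i}/${maxAttempts}), trying again in ${delayMs} ms`);
      await sleep(delayMs);
    }
  }
  // Give up and let the caller report the failure instead of looping forever.
  return { ok: false, attempts: maxAttempts };
}

// Example: an attempt that only succeeds on its third try.
retryWithLimit(async (i) => i === 3, { delayMs: 10 }).then((result) => console.log(result));
```

Returning a result object rather than throwing keeps the "gave up" case explicit for whatever code decides to close the page or notify the user.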

Diablozone commented 3 years ago

Can you give some insight into why this error occurs, or which part of the code it's coming from, so that I can look into it further?

samc621 commented 3 years ago

@Diablozone it seems I am now seeing the same behaviour. There is a 429 error coming from the API when clicking any of the DOM elements for style, size, or the ATC (add-to-cart) button. I am going to need to look into this further and will circle back on it ASAP.

Diablozone commented 3 years ago

Sure, Thanks a lot!

(Sent by email in reply to Samuel Corso's comment of Wed, Jun 9, 2021, 7:04 PM.)

samc621 commented 3 years ago

@Diablozone just updating this here: the error on the Nike integration appears to be a CORS issue. You will see some of the requests failing in the Chrome DevTools Network tab. I've started looking into it but haven't gotten to the bottom of it yet. It just started happening, AFAIK.

Diablozone commented 3 years ago

Does this mean that if I use a Firefox webdriver the issue might be solved?

(Sent by email in reply to Samuel Corso's comment of Thu, Jun 10, 2021, 21:07.)

samc621 commented 3 years ago

@Diablozone it's definitely worth a try.

Diablozone commented 3 years ago

Doesn't work. Also, I tried a couple of Shopify sites, but they just don't open. The new Chromium tab opens and closes automatically.

(Sent by email in reply to Samuel Corso's comment of Tue, Jun 15, 2021, 17:20.)

samc621 commented 3 years ago

@Diablozone can you provide a sample Task object for me to test with? I was recently testing on Kith and some others and it was working fine.

labboy0276 commented 3 years ago

@samc621 I am seeing this with your API example:

{
    "site_id": 1,
    "url": "https://www.nike.com/t/lebron-18-black-white-basketball-shoe-M6DgN2/CQ9283-100",
    "style_index": 1,
    "size": "8.5",
    "shipping_speed_index": 0,
    "billing_address_id": 1,
    "shipping_address_id": 1,
    "notification_email_address": "myemail"
}

Side question: what is style_index? Is it just the style, like Best 10-18 or Best 1-9, in the example URL above?

samc621 commented 3 years ago

Hi @labboy0276, yes, I noticed this issue a couple of weeks ago. I believe it is a CORS error, and I haven't gotten around to addressing it yet.

samc621 commented 2 years ago

@labboy0276 just copying this over here for reference: https://user-images.githubusercontent.com/38767335/125101214-fa64c380-e0a7-11eb-92dd-cbdb50f2e127.png

labboy0276 commented 2 years ago

I am going to throw this here, but haven't tried it yet.

Have you thought about using headers like this, @samc621?

headers: {
    'user-agent': userAgent,
    'sec-fetch-dest': 'none',
    'accept': '*/*',
    'sec-fetch-site': 'cross-site',
    'sec-fetch-mode': 'cors',
    'accept-language': 'en-US'
}

I have seen this in other bots for scraping the interwebz.

samc621 commented 2 years ago

@labboy0276 have you tested this with the Nike issue? Curious if it might help.

labboy0276 commented 2 years ago

Negative, @samc621, just putting it there as I haven't had time to test it out yet. If someone else can, that would be helpful.

samc621 commented 2 years ago

@labboy0276 I added this:

await page.setExtraHTTPHeaders({
    'user-agent': `${userAgent}`,
    'sec-fetch-dest': 'none',
    accept: '*/*',
    'sec-fetch-site': 'cross-site',
    'sec-fetch-mode': 'cors',
    'accept-language': 'en-US'
});

But it doesn't seem to be doing the trick. I took a look in the console, and it looks like a 429 error (blank response) on this API. [Screenshots: 2021-07-18, 9:20:26 AM and 9:20:43 AM]

samc621 commented 2 years ago

There is a new captcha on the footsites. I've never repro'd it in the browser, but I'm curious whether these headers might help reduce bot detection so that we don't hit it there. I'll give it a go in a bit.

samc621 commented 2 years ago

So, testing with these headers on the footsites, I got blocked: it took me to a Terms of Service page. Testing without them, I hit the captcha I was expecting.

All in all, I don't think setting these headers is helping much. I will need another approach to both the Nike issue and the footsites captcha.

Kohlsen commented 2 years ago

Any update on this? @samc621

Kohlsen commented 2 years ago

I did some research on the 429 status code, and from what I found, a 429 (Too Many Requests) is a rate-limiting error, meaning the client sent too many requests in a given amount of time. Could this be fixed by adding some timeouts to Puppeteer?
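As a general HTTP practice (not SneakerBot code), a client that receives a 429 can wait before retrying: use the Retry-After header when the server sends one (either a number of seconds or an HTTP-date), and otherwise back off exponentially. This is a minimal sketch; `retryDelayMs`, `baseMs`, and `maxMs` are hypothetical names with illustrative defaults. As samc621 notes below, adding delays alone has not resolved the Nike blocking, so treat this as rate-limit hygiene rather than a fix.

```javascript
// Sketch: how long to wait before retrying a request that got HTTP 429.
// Assumptions (not SneakerBot code): the caller passes the raw Retry-After
// header value (or null), and baseMs/maxMs are illustrative defaults.
function retryDelayMs(attempt, retryAfter, { baseMs = 1000, maxMs = 60000, nowMs = Date.now() } = {}) {
  const value = retryAfter == null ? '' : String(retryAfter).trim();

  // Retry-After given as delay-seconds, e.g. "120".
  if (/^\d+$/.test(value)) {
    return Math.min(Number(value) * 1000, maxMs);
  }

  // Retry-After given as an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  if (value !== '') {
    const retryAt = Date.parse(value);
    if (!Number.isNaN(retryAt)) {
      return Math.min(Math.max(retryAt - nowMs, 0), maxMs);
    }
  }

  // No usable header: exponential backoff (baseMs * 2^attempt), capped at maxMs.
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Example: third retry (attempt index 2) with no Retry-After header.
console.log(retryDelayMs(2, null)); // 4000
```

Adding random jitter to the backoff delay is a common refinement when many clients retry at once; it is left out here to keep the sketch deterministic.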

samc621 commented 2 years ago

@Kohlsen yes, that is indeed the common meaning for a 429 response code. I have tried that, among numerous other things, but to no avail. I keep looking at it when I get a chance, and I think others are too. Let me know if you get anything to work!

jhgeluk commented 2 years ago

I'm also experiencing this issue on nike.com. When I replicate the click-through/search behaviour in my own browser, it doesn't occur.

samc621 commented 2 years ago

@jhgeluk yes, I think it has something to do with the browser fingerprint and/or the speed of the navigations. I have some potential solutions for this, including:

  1. Randomizing all of the window.navigator properties.
  2. Using Puppeteer to emit mouse movement events when clicking a DOM node.
  3. Using more delay in between certain actions.

I just haven't had time to implement this, but if someone is willing to give it a try, please feel free!

samc621 commented 2 years ago

@13ROY could you take a look at this when you get a chance?

jhgeluk commented 2 years ago

I feel like they mostly check the browser fingerprint. I've simulated mouse movements and keyboard input, and even added a randomized number of links it visits before entering Nike, to create a sort of "browser history".

samc621 commented 2 years ago

@jhgeluk yes, the browsing history was a good idea for creating a cookie profile. Have you checked window.navigator in the console? It exposes many of the properties used by fingerprinting libraries. puppeteer-extra does a good job of randomizing some of them, but we can do more on top of that. We can also look through Nike's FE code for the fingerprinting code that might be stopping us. I am pretty confident that it is indeed a fingerprinting issue because this happens even when I don't use any proxies (just a regular residential IP) and, as you said, the mouse movements aren't helping.

jhgeluk commented 2 years ago

Looks like Puppeteer is also missing an event listener for Network.responseReceivedExtraInfo; this might trigger some libraries like FingerprintJS.

samc621 commented 2 years ago

@jhgeluk yes that makes a lot of sense, can you try to identify what libraries Nike might be using? That way we can reverse engineer to a solution. This is commonly how I solve issues like this.

abhingupta commented 2 years ago

Hi, I've been trying to resolve this but to no avail. I've tried inputting custom headers and disabling web security for now. Sam, any thoughts on how we can fix this?

samc621 commented 2 years ago

@abhingupta there's a lot of things we can try here:

We are already randomizing the user agent, but I would try Playwright with another browser like Firefox. I would try adding delay between the interactions. I would try to make sure that the browser emits mouse movement and mouse-down events. I would check the Network tab to see if there are any requests that are failing (other than the 429 error). I would evaluate their JS code to look for any kind of fingerprinting library (if there is one, we must figure out how to emulate or reverse engineer it).

These are starting points. There's more we can do from here.

bklynate commented 2 years ago
[Screenshot: 2022-04-16, 12:15:31 PM]

Is this error related to this bug as well?

Here is the task I am using to get this bug...

{
    "site_id": 1,
    "url": "https://www.nike.com/t/air-max-97-se-mens-shoes-3l919x/DN1893-001",
    "style_index": 1,
    "size": "9.5",
    "shipping_speed_index": 0,
    "billing_address_id": 2,
    "shipping_address_id": 1,
    "notification_email_address": "nathaniel@econify.com"
}
samc621 commented 2 years ago

@bklynate yes it's the same issue. If you look in the Network tab, you should see some 429 errors.

bklynate commented 2 years ago

Random observation: I've noticed that even with PARALLEL_TASKS=1, Chrome opens two windows when beginning a task. Why is that? And could that be the source of the issue?

samc621 commented 2 years ago

@bklynate I'm pretty sure that is because of how puppeteer-cluster works. It always has an extra page/browser (depending on the concurrency context you choose). It's similar to how puppeteer opens a blank page when it starts up.

I think the issue is more likely to be related to the browser fingerprint. There are a lot of things that anti-bot software uses to detect bots, but one of the easiest I've seen is the use of Chromium or another developer-friendly browser. The fix might be as simple as switching it out for Firefox. See more of my suggestions above.

bklynate commented 2 years ago

I've tried changing browsers (Firefox) and forcing the use of Chrome instead of Chromium, but none of that has worked thus far.

ethanlaj commented 2 years ago

Is this still an active problem?

samc621 commented 2 years ago

@ethanlaj Yes I believe so, I haven't seen a PR to fix it. If you get around to it, feel free to open one and I'll happily review it.

elManto commented 2 years ago

Question: have you experienced this issue with ALL websites or just with the Nike website? That way I'll at least know where to start my investigation into which component performs the actual fingerprinting.

samc621 commented 2 years ago

@elManto only on Nike.

elManto commented 1 year ago

Update, so that whoever works on this doesn't have to reinvent the wheel:

  1. I used https://amiunique.org/fp to compare my browser's fingerprint with one managed by Puppeteer, both headless and with a GUI. In the GUI case, I'd say the only difference is probably a missing plugin, which I don't think is enough to block our requests.
  2. I had a quick look at the Nike page, and this library looks interesting: https://github.com/bluesmoon/boomerang. It does user profiling, officially for data monitoring, but it could also be used to block bots. I haven't reversed it yet; I'm not a web guy and it will take some time, so I prefer to rule out other avenues first.
  3. Interestingly, if I start a naive Puppeteer session (headless: false) to a product URL on the Nike website from a separate script, without any customization or user agent randomization, it lets me connect without blocking anything. Maybe one of the flags that @samc621 enabled when launching Chrome is a bad one; I'll investigate in the next few days.

aditya-pushkar commented 1 year ago

@samc621 I managed to get around this problem with Puppeteer, but the solution was not scalable, so I decided to try Playwright, and it worked. Here is some basic code I wrote.

const { webkit } = require('playwright');

(async () =>  {

    const browser = await webkit.launch({headless: false});
    const page = await browser.newPage();

    const url = "https://www.nike.com/in/t/air-force-1-07-lv8-shoes-V6SkWv/DR9866-100"
    await page.goto(url, {
        waitUntil: 'networkidle'
    });

    console.log("Selecting the size !")
    await page.locator('text=UK 9').first().click();

    console.log("Clicking on Add to cart !")
    await page.locator('text=Add to Bag').click();

    await page.waitForTimeout(3000)
    await page.goto("https://www.nike.com/in/cart")

})();

However, Playwright has some major issues with Nike.com.

The Chromium browser is not working on Nike.com, so I have to use Firefox and Safari for testing.

According to the documentation of "puppeteer-extra", we can use "puppeteer-extra-plugin-stealth" with "playwright-extra"; here is the doc. But the moment I used the "playwright-extra" plugin, the program started throwing errors.

 typeError: Cannot read properties of undefined (reading 'userAgent')
 at Proxy.<anonymous> (/Users/{username}/Desktop/playground/bot/node_modules/playwright-extra/dist/index.cjs.js:270:33)
 at async Plugin.onPageCreated (/Users/{username}/Desktop/playground/bot/node_modules/puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js:69:8)

But the main point is that Nike.com started throwing 429 errors again after using extra plugins. Here is the code.

const { firefox } = require('playwright-extra');

const stealth = require('puppeteer-extra-plugin-stealth')();
firefox.use(stealth);

const UserAgent = require('user-agents');
const userAgent = new UserAgent();

(async () =>  {
    const browser = await firefox.launch({
        headless: false
    });
    const context = await browser.newContext({
        userAgent: `${userAgent.userAgent}`
    })
    const page = await context.newPage();

    const url = "https://www.nike.com/in/t/air-force-1-07-lv8-shoes-V6SkWv/DR9866-100"
    await page.goto(url, {
        waitUntil: 'networkidle'
    });

    await page.waitForTimeout(5000)

    console.log("Selecting the size !")
    await page.locator('text=UK 9').first().click();

    console.log("Clicking on Add to cart !")
    await page.locator('text=Add to Bag').click();

    await page.waitForTimeout(3000)
    await page.goto("https://www.nike.com/in/cart")
})();

Here is my thought.

Nike is somehow able to detect the "Puppeteer" and "puppeteer-extra" plugins.

For Nike.com, "Playwright" without any plugins should be sufficient, but we must modify the User Agent and Fingerprints if we want to scale the bot.

Article about tracking of Puppeteer.

Open for feedback.

samc621 commented 1 year ago

@elManto @aditya-pushkar thank you both for your work here, and sorry for the delay in the response as I've gotten very busy. It seems to me that there are many potential solutions so I think the best step forward is to identify the constraints and then agree on the best solution within those constraints.

  1. I have recently seen more websites that are blocking just on the basis of detecting Chromium. I'm 100% fine with switching this out with another browser. Playwright can do this trivially, but Puppeteer can do this too: you can launch Puppeteer with a custom executablePath to Chrome or even another browser. I also think you can set the product launch option to either chrome or firefox.
  2. It has also come to my attention that some of the launch args I added for Docker "headful" support (e.g. --no-sandbox) might be interfering here. I'm fine with removing them as long as the support remains. The goal here was the ability to run the headful mode on a server and then access it via VNC client.
  3. In theory, this bot should work regardless of whether we are using headless true or false. Running in "headful" mode should be an option, not a requirement.
  4. I'd also like to continue to use puppeteer-cluster if possible. AFAIK, there isn't anything Playwright can do that Puppeteer can't.

We are already randomizing the UA but we can also look into loading a custom profile with the userDataDir argument.

I'd be very surprised if the above isn't enough to get unblocked. I'll need to find time for a closer look if it does.

aditya-pushkar commented 1 year ago

@samc621 Just an update: I tried Puppeteer with Chrome via a custom executablePath and saved the browser session with userDataDir, but the bot is still detectable.

The bot is not able to pass the detection test on Creepjs.

I think we have to modify the fingerprint.

samc621 commented 1 year ago

@aditya-pushkar what version of Chrome did you use? Did you try removing any of the launch args? And did you launch in headless or headful? Some info will help me verify from my end.

aditya-pushkar commented 1 year ago

@samc621

aditya-pushkar commented 1 year ago

@samc621

Can you help me with this? I have a use case in which multiple users can request different tasks at the same time, and for each task, a new puppeteer instance should start running immediately without getting queued.

However, the single-threaded nature of Node is the issue. Whenever we want to run a CPU-intensive task, it processes a single request at a time, and other tasks get queued.

For example, I set up an Express server with Puppeteer, and when I send more than one request/task at a time, each task gets queued until the previous request is completed. Is there a way around this?

Is there something I'm missing?

Or can this problem be solved by serverless?

If there is any good resource you can point me to, it would be very helpful.

samc621 commented 1 year ago

@aditya-pushkar not sure I understand your issue. I also think this might belong on a separate issue, but I'll try to help anyways.

So is your issue starting multiple tasks from the API in parallel? Is it a Puppeteer issue or a Node issue?

On the Puppeteer side, this shouldn't be a problem as long as you set an appropriate maxConcurrency for Puppeteer Cluster (you can use the PARALLEL_TASKS env var for this). Keep in mind the resource constraints of your machine.
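To illustrate the idea of capping parallelism (the role PARALLEL_TASKS plays through puppeteer-cluster's maxConcurrency), here is a minimal bounded-concurrency pool in plain Node.js. `runWithConcurrency` is a hypothetical helper, not puppeteer-cluster's API or implementation; each task is an async function, and at most `limit` of them run at once while the rest wait for a free slot.

```javascript
// Hypothetical helper: run async task functions with at most `limit` in flight.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task nobody has started yet

  // Each worker keeps pulling the next unstarted task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `tasks`
}

// Example: five fake tasks, at most two running at the same time.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let running = 0;
let peak = 0;
const tasks = [1, 2, 3, 4, 5].map((n) => async () => {
  running++;
  peak = Math.max(peak, running);
  await sleep(20); // stands in for awaiting a browser page
  running--;
  return n * 2;
});

runWithConcurrency(tasks, 2).then((results) => console.log(results, 'peak:', peak));
// [ 2, 4, 6, 8, 10 ] peak: 2
```

Because `next++` runs synchronously on the single JavaScript thread, two workers can never claim the same task index.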

On the Node side, I don't see what Node/Express has to do with this. They're async requests so they will not block the thread thanks to the Node.js event loop. Your Express server should be able to handle 1000 concurrent requests, or more, without issue. So I'm not sure what's causing the blocking from your end.
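The event loop only helps when the work is asynchronous I/O, such as awaiting Puppeteer. If a request handler does synchronous CPU-heavy work, that work runs on the main thread and does block other requests. One standard Node.js option for that case is the built-in worker_threads module. This is a minimal sketch, where `runCpuJob` and the sum-of-squares job are hypothetical stand-ins for whatever CPU-bound work is involved.

```javascript
const { Worker } = require('worker_threads');

// Code for the worker thread, passed as a string via { eval: true }.
// Hypothetical CPU-bound job: sum of squares from 1 to workerData.n.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData.n; i++) total += i * i;
  parentPort.postMessage(total);
`;

// Runs one job on its own thread and resolves with the worker's result,
// leaving the main event loop free to keep serving requests meanwhile.
function runCpuJob(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

// Two jobs run in parallel on separate threads.
Promise.all([runCpuJob(10), runCpuJob(1000)]).then((totals) => console.log(totals));
// [ 385, 333833500 ]
```

Spawning a worker per job has startup overhead; for many small jobs, a fixed pool of reused workers is the usual design.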

I might be missing some context. Can you explain your scenario (how you are testing/implementing this) in more detail?

aditya-pushkar commented 1 year ago

@samc621 Thank you very much.