reacherhq / check-if-email-exists

Check if an email address exists without sending any email, written in Rust. Comes with a ⚙️ HTTP backend.
https://reacher.email

HaveIBeenPawned? #289

Closed by amaury1093 12 months ago

amaury1093 commented 4 years ago

Add a field misc.have_i_been_pawned: true/false which makes an API call to https://haveibeenpwned.com/

NChechulin commented 3 years ago

There is a small problem: haveibeenpwned's API costs $3.50/month. Maybe consider scraping or a similar free API?

amaury1093 commented 3 years ago

Ah, I wasn't aware it was paid. So maybe not, I don't think it's super high priority (and people can always make a separate API call for that).

I recall the author was open-sourcing it. Will it still be paid after?

NChechulin commented 3 years ago

On the API key page, they link to a blog post which says:

Clearly not everyone will be happy with this so let me spend a bit of time here explaining the rationale. This fee is first and foremost to stop abuse of the API.

So I don't think we should expect the API to become free anytime soon.

DigitalGreyHat commented 2 years ago

Hello, I made my own API. It's free forever, and it works the same as haveibeenpwned.com. I'll try to make a PR soon. Edit: I am not a Rust dev 😅

LeMoussel commented 2 years ago

@DigitalGreyHat Can you give more information about your API?

olivermontes commented 2 years ago

hi, any news? @DigitalGreyHat

sylvain-reynaud commented 1 year ago

Hello, I am currently working on this.

Here are my thoughts:

The problem with bypassing Cloudflare is that we have to rely on a stealth browser; otherwise Cloudflare's protection is triggered. https://github.com/ultrafunkamsterdam/undetected-chromedriver seems to be the one with the biggest community. I did a PoC and the results are not reliable: it works ~70% of the time (the other ~30% are crashes or no response). Another problem with a stealth browser is that it brings in a lot of new dependencies, each with its own maintenance needs.

In my opinion, implementing the paid API is the way to go. Otherwise we can find another reliable and free API.
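
As a rough illustration of what the paid route could involve, here is a minimal sketch, assuming the public HIBP v3 `breachedaccount` endpoint, which expects an `hibp-api-key` header plus a user agent and answers 404 when the account is in no known breach. The helper names `buildHibpRequest` and `interpretHibpStatus` and the user-agent value are hypothetical, and no network call is made.

```javascript
// Hypothetical helpers. Only the URL shape, the `hibp-api-key` header and the
// status-code meanings are taken from the public HIBP v3 API.
function buildHibpRequest(email, apiKey) {
    return {
        // The account must be URL-encoded ('@' becomes '%40').
        url: `https://haveibeenpwned.com/api/v3/breachedaccount/${encodeURIComponent(email)}`,
        headers: {
            'hibp-api-key': apiKey,
            'user-agent': 'check-if-email-exists-sketch', // assumed value
        },
    };
}

// Map an HTTP status to the proposed true/false field; null means "unknown".
function interpretHibpStatus(status) {
    if (status === 200) return true;  // found in at least one breach
    if (status === 404) return false; // not found in any breach
    return null;                      // e.g. 401 or 429: do not guess
}

const req = buildHibpRequest('user@example.org', 'YOUR_API_KEY');
console.log(req.url);
console.log(interpretHibpStatus(404)); // false
```

Returning `null` for anything other than 200 or 404 keeps an invalid key or a rate limit from being reported as "not pwned".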

amaury1093 commented 1 year ago

Let's go with the paid API. @sylvain-reynaud would you like to create a PR?

I think the way to go is:

amaury1093 commented 1 year ago

Otherwise we can find another reliable and free API.

Do people know of other free APIs? Ideally open-source. We can always add misc.<other_api> = true/false, and make those extra API calls configurable.
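
One way such configurable extra calls could be wired up, as a sketch only: a hypothetical registry maps each API name to an async function returning true/false, and a config object decides which ones run. The names `runMiscChecks` and `checkers` and the stub checkers are illustrative and do not come from the repo.

```javascript
// Run only the checks enabled in `config`. A check that throws yields null
// instead of failing the whole result.
async function runMiscChecks(email, checkers, config) {
    const misc = {};
    for (const [name, check] of Object.entries(checkers)) {
        if (!config[name]) continue; // disabled or not configured: skip
        try {
            misc[name] = await check(email);
        } catch {
            misc[name] = null;
        }
    }
    return misc;
}

// Stub checkers standing in for real API calls (hypothetical).
const checkers = {
    haveibeenpwned: async (email) => email.endsWith('@example.org'),
    other_api: async () => { throw new Error('service unreachable'); },
    disabled_api: async () => true,
};

const demo = runMiscChecks('user@example.org', checkers, {
    haveibeenpwned: true,
    other_api: true,
});
demo.then((misc) => console.log(misc));
```

Checks that are not enabled never run and do not appear in the result, so users who skip the extra calls pay no latency for them.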

sylvain-reynaud commented 1 year ago

According to https://github.com/khast3x/h8mail#apis there are 3 free(mium) APIs:

LeMoussel commented 1 year ago

For information, there is Fingerprint Suite with Playwright. It's OK with Antibot. I didn't test with Cloudflare.

sylvain-reynaud commented 1 year ago

It's OK with Antibot. I didn't test with Cloudflare.

const { chromium } = require('playwright');
const { FingerprintGenerator } = require('fingerprint-generator');
const { FingerprintInjector }  = require('fingerprint-injector');

(async () => {
    const fingerprintGenerator = new FingerprintGenerator();

    const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
        devices: ['desktop'],
        browsers: ['chrome'],
    });

    const fingerprintInjector = new FingerprintInjector();
    const { fingerprint } = browserFingerprintWithHeaders;

    const browser = await chromium.launch({ headless: false });

    // With certain properties, we need to inject the props into the context initialization
    const context = await browser.newContext({
        userAgent: fingerprint.userAgent,
        locale: fingerprint.navigator.language,
        viewport: fingerprint.screen,
    });

    // Attach the rest of the fingerprint
    await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

    const page = await context.newPage();

    await page.goto('https://haveibeenpwned.com/unifiedsearch/user@example.org');

    // wait for the page to load
    await page.waitForTimeout(20000);
    // log the page content
    console.log(await page.content());
    // screenshot the page
    await page.screenshot({ path: 'proof.png' });
    // close the browser so the script can exit
    await browser.close();
})();

If it runs headless, it is blocked; if it runs with a visible browser window, it is not. You can check this with the code above.

I'll implement the paid API first.

LeMoussel commented 1 year ago

It seems OK in Firefox headless mode with this:

import path from 'path';
import { fileURLToPath } from 'url';

import { firefox } from 'playwright';
import { FingerprintGenerator } from 'fingerprint-generator';
import { FingerprintInjector } from 'fingerprint-injector';

(async () => {
    const fingerprintGenerator = new FingerprintGenerator();

    const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
        devices: ['desktop'],
        browsers: ['firefox'],
    });

    const fingerprintInjector = new FingerprintInjector();
    const { fingerprint } = browserFingerprintWithHeaders;

    const browser = await firefox.launch({
        headless: true
    });

    // With certain properties, we need to inject the props into the context initialization
    const context = await browser.newContext({
        userAgent: fingerprint.userAgent,
        locale: fingerprint.navigator.language,
        viewport: fingerprint.screen,
    });

    // Attach the rest of the fingerprint
    await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

    const page = await context.newPage();

    await page.goto('https://haveibeenpwned.com/unifiedsearch/user@example.org');

    await page.screenshot({ path: path.join(path.dirname(fileURLToPath(import.meta.url)), 'playwright_test_headless.png') });

    await browser.close();
})();

LeMoussel commented 1 year ago

Yep! It's OK with got-scraping. The got-scraping library usually has better success than other libraries thanks to its header generation, HTTP/2 support, and browser-like TLS ciphers.

import { gotScraping } from 'got-scraping';

(async () => {
    const response = await gotScraping({
        url: 'https://haveibeenpwned.com/unifiedsearch/user@example.org',
        headerGeneratorOptions: {
            browsers: ['firefox'],
            devices: ['desktop'],
        },
    });
    console.log(`Response headers: ${JSON.stringify(response.headers)}`);
    // Parse the JSON body and log the parsed object
    const result = JSON.parse(response.body);
    console.log(result);
})();

sylvain-reynaud commented 1 year ago

@LeMoussel wow, I didn't know about this package, thanks :100:

So I'm working on adding the feature by calling this URL: https://haveibeenpwned.com/unifiedsearch/user@example.org

sylvain-reynaud commented 1 year ago

Hello, my PR is ready to be reviewed :)

beshoo commented 1 year ago

What is this API?

On Wed, Jan 11, 2023, 9:19 PM Sylvain Reynaud wrote:

Hello, my PR is ready to be reviewed :)

sylvain-reynaud commented 1 year ago

I fixed the formatting and removed code that might break if a field is added to the API response.

@beshoo it uses the haveibeenpwned API. The endpoint is the one used by the haveibeenpwned.com front end.
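
To illustrate the kind of robustness mentioned above, here is a sketch that reads only the one field that matters from a response body, so fields added later are simply ignored. The `Breaches` array name is an assumption about the response shape, and `isPwned` is a hypothetical helper; it is written in JavaScript to match the earlier snippets in this thread and is not the Rust code from the PR.

```javascript
// Return true/false when the body can be interpreted, null otherwise.
function isPwned(body) {
    let parsed;
    try {
        parsed = JSON.parse(body);
    } catch {
        return null; // not JSON, e.g. an empty body or a challenge page
    }
    // Look at a single field only; unknown extra fields are never touched.
    const breaches = parsed && parsed.Breaches; // assumed field name
    if (!Array.isArray(breaches)) return null;
    return breaches.length > 0;
}

console.log(isPwned('{"Breaches":[{"Name":"Example"}],"NewField":1}')); // true
console.log(isPwned('{"Breaches":[]}'));                                // false
console.log(isPwned('<html>blocked</html>'));                           // null
```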

amaury1093 commented 1 year ago

The Node.js libraries are probably more battle-tested, but I would like to keep this repo as pure Rust.

Also, I'm reluctant to use a headless browser for HIBP. There's a risk that it will become flaky or blocked one day, and the maintenance burden will likely fall on me. I propose to start with the paid API, as described in https://github.com/reacherhq/check-if-email-exists/issues/289#issuecomment-1346149442. I'll gladly purchase the paid API and make it available on https://reacher.email's SaaS plan.

amaury1093 commented 12 months ago

Implemented in #1253, closing, thanks @sylvain-reynaud