nacimgoura / instagram-profilecrawl

:computer: Quickly crawl the information (e.g. followers, tags, etc...) of an instagram profile. No login required!
MIT License
121 stars 30 forks source link

crawl only shared_data #31

Open Stiveknx opened 6 years ago

Stiveknx commented 6 years ago

Well, more like an suggestion, than a "bug" report.

I think you shouldn't load the full instagram page. First, the element classes (inside elements.json), change with some frequency (have no ideia wich frequency is that).

So, my sugestion it 's just load _sharedData inside the profile. Don't load javascripts, images, styles.. It's way faster.

Something like this:

        this.page = await this.browser.newPage();
        await this.page.setRequestInterception(true);
        this.page.on('request', (request) => {
            if (['image', 'stylesheet', 'font', 'script'].indexOf(request.resourceType()) !== -1) {
                request.abort();
            } else {
                request.continue();
            }
        });
        await this.page.setExtraHTTPHeaders({
            'Accept-Language': 'pt-BR'
        });
        await this.page.goto('https://instagram.com/' + username, {
            waitUntil: 'networkidle0'
        });
        const sharedData = document.querySelector('script').innerText;
        const html = /window._sharedData = (.*);/.exec(sharedData)[1];
        const profileData = JSON.parse(html);

/* Maybe here you could use your version 1.0 "parseData" function from here ? */
Stiveknx commented 6 years ago

I ended up doing this to my project here, based on what you wrote.

If you want I can fork and send a PR. Just let me know.

https://gist.githubusercontent.com/Stiveknx/86342c6588371010a30d8239e07df0ad/raw/a11780c36f28b51958501e11285f05bb2ed1b2e0/profilecrawl.ts

nacimgoura commented 6 years ago

thanks for your comment, it's interesting and I hadn't thought about it. Do a PR and then I'll watch.