simpleton / APKMirro_crawl


Request for adding more metadata #1

Open juliankrz opened 6 years ago

juliankrz commented 6 years ago

Hi, I love this script.

I have a request for adding some additional metadata to the output to help with some data aggregation I'd like to do.

  1. Could we have the full application name? Some Instagram releases are marked as "alpha": https://www.apkmirror.com/apk/instagram/instagram-instagram/instagram-instagram-37-0-0-0-67-96271-release/

You already include the "Beta" string for Snapchat, which is great. Could you do this for other apps too? https://www.apkmirror.com/apk/snap-inc/snapchat/snapchat-10-27-5-0-beta-release/

  2. Some APK versions come in different flavors, such as "arm ...". Could we also add this as an additional field, "flavor"? https://www.apkmirror.com/apk/instagram/instagram-instagram/instagram-instagram-35-0-0-20-96-95414-release/instagram-35-0-0-20-96-2-android-apk-download/

The same goes for Facebook, which has "alpha" and "beta" releases as well as flavors: https://www.apkmirror.com/apk/facebook-2/facebook/facebook-164-0-0-9-95-release/facebook-164-0-0-9-95-android-apk-download/
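To make the two requests above concrete, here is a rough sketch of the record I have in mind. It is not based on the script's actual code: the function name `describe_release`, the field names, and the assumed title format ("App name, dotted version, optional alpha/beta") are all hypothetical.

```python
import re

# Assumption: a release title looks like "Instagram 37.0.0.0.67 alpha".
# The real page titles may differ; adjust the patterns accordingly.
_CHANNEL_RE = re.compile(r"\b(alpha|beta)\b", re.IGNORECASE)
_VERSION_RE = re.compile(r"\b\d+(?:\.\d+)+\b")


def describe_release(title, flavor=None):
    """Split a release title into the extra fields requested above.

    `flavor` is passed in separately (e.g. "arm"), since it would come
    from the variant page rather than from the title.
    """
    channel = _CHANNEL_RE.search(title)
    version = _VERSION_RE.search(title)
    return {
        "full_name": title.strip(),
        "version": version.group(0) if version else None,
        # None means no alpha/beta marker was found in the title.
        "channel": channel.group(1).lower() if channel else None,
        "flavor": flavor,
    }
```

For example, `describe_release("Snapchat 10.27.5.0 Beta", flavor="arm")` would yield a record whose `channel` is `"beta"` and whose `flavor` is `"arm"`.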

Thanks!

simpleton commented 6 years ago

This feature is almost done, but it easily triggers HTTP 429 (Too Many Requests) errors. Because we access two pages for every single variant, the request volume may be too high.

I found some proxy solutions to avoid the 429 issue: https://github.com/aivarsk/scrapy-proxies https://github.com/fabienvauchelles/scrapoxy
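Besides proxies, slowing down on 429 responses might also help. Below is a minimal sketch of retrying with exponential backoff, not what the crawler currently does. The names `backoff_delay` and `fetch_with_backoff` are made up for illustration, and `fetch` is assumed to be any callable that returns a `(status, headers, body)` tuple.

```python
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            # Honor the server's Retry-After header when it is a number.
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back below.
    return min(base * (2 ** attempt), cap)


def fetch_with_backoff(fetch, url, max_attempts=5, sleep=time.sleep):
    """Call fetch(url) and retry while the server answers 429.

    Returns the last (status, body) pair, which may still be a 429
    if every attempt was rejected.
    """
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

If the crawler is built on Scrapy (the scrapy-proxies link suggests it is), the built-in `DOWNLOAD_DELAY` and `AUTOTHROTTLE_ENABLED` settings serve a similar purpose and may be simpler to try first.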

juliankrz commented 6 years ago

Thanks a lot! I will look into those proxy solutions this week.

fabienvauchelles commented 5 months ago

Hi, Scrapoxy 4 is out!

Scrapoxy is an open-source proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading them across multiple scrapers 🕸️.

Smartly designed for efficient traffic routing 🔀, Scrapoxy minimizes bans and boosts success rates 🚀.

The tech stack is built on the latest Node.js and TypeScript, using the NestJS and Angular frameworks.

Here are the key features:

Check it out: https://scrapoxy.io/