The parser doesn't work well on many sites, especially forums

openproxyspace / unfx-proxy-parser

Unfx Proxy Parser - Nextgen proxy parser with a deep link crawler. Follows internal and third-party links. Sorts results by country.

https://openproxy.space/software/proxy-parser

MIT License

50 stars, 20 forks

The parser doesn't work well on many sites, especially forums #1

Open SpearRipper opened 5 years ago

SpearRipper commented 5 years ago

Hello @assnctr,

I tried your proxy parser, and I can say it's the best proxy parser I have ever found.

There is a problem, though: the parser fails to scrape proxies from many sites. If a site lists its proxies in a format like the following, the parser does not pick them up:

113.120.189.184 | 9999 | 高匿名 | HTTP | 山东省济宁市电信 | 3秒 | 2019-01-05 23:30:59
124.94.196.188 | 9999 | 高匿名 | HTTP | 辽宁省阜新市联通 | 1秒 | 2019-01-05 22:30:59
110.52.235.76 | 9999 | 高匿名 | HTTP | 湖南省岳阳市联通 | 0.3秒 | 2019-01-05 21:30:56
39.137.107.98 | 80 | 高匿名 | HTTP | 中国移动 | 2秒 | 2019-01-05 20:30:58

Also, is there a way to make the parser collect all the proxies from http://proxydb.net/ without being so heavy or taking ages? I tried every option and it doesn't work there either.

I hope you can fix this and make the parser handle these formats.

Overall, great work, and thank you in advance.

SpearRipper commented 5 years ago

Here are some sites the parser doesn't get any proxies from:

http://nntime.com/
https://hidemyna.me/
https://www.my-proxy.com/
https://www.proxynova.com/
https://premproxy.com/

relloccate commented 5 years ago

Hi, thanks.

The old v1.3.0 BETA parsed sites such as https://hidemyna.me, http://proxydb.net, etc.

All of these sites require JS. Version 1.3.0 used headless Chrome with Cloudflare bypassing. Maybe I will add this back in upcoming patches.

About: 113.120.189.184 | 9999 | 高匿名 | HTTP | 山东省济宁市电信 | 3秒 | 2019-01-05 23:30:59

Proxies are currently parsed with a simple regex (ip:port), but I will add support for this format in the next patch.
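
To illustrate the difference between the two formats, here is a minimal Python sketch of a regex that accepts both `ip:port` and table rows like `ip | port`. It is not the parser's actual code: the pattern, the `PROXY_RE` name, and the `extract_proxies` helper are all hypothetical. It can also produce false positives, for example when an IP is followed by whitespace and an unrelated number.

```python
import re

# Hypothetical pattern (an assumption, not taken from unfx-proxy-parser):
# an IPv4 address, a separator, then a port.
PROXY_RE = re.compile(
    r"\b((?:\d{1,3}\.){3}\d{1,3})"  # IPv4 address
    r"(?:\s*[:|]\s*|\s+)"           # ":" or "|" (optionally padded), or plain whitespace
    r"(\d{2,5})\b"                  # port, 2 to 5 digits
)


def extract_proxies(text):
    """Return unique "ip:port" strings in order of first appearance."""
    found = []
    for ip, port in PROXY_RE.findall(text):
        # Discard matches whose octets or port are out of range.
        octets_ok = all(0 <= int(octet) <= 255 for octet in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        proxy = f"{ip}:{port}"
        if octets_ok and port_ok and proxy not in found:
            found.append(proxy)
    return found


row = "113.120.189.184 | 9999 | 高匿名 | HTTP | 3秒 | 2019-01-05 23:30:59"
print(extract_proxies(row))  # ['113.120.189.184:9999']
```

The range checks are there because the regex alone would also accept strings such as `999.1.1.1:80`.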

SpearRipper commented 5 years ago

**To be honest, this version is great, but now that you mention the old version was able to parse sites like these, it would be great if the current version could parse all sites, including proxies whose port is not separated by ":", since most of the popular sites list their proxies as an IP and a port without ":".

Also, I don't see a download link for the previous versions.

Can't wait to see the next update. Thank you.**

SpearRipper commented 5 years ago

Hello @assnctr, can you update the Proxy Parser so it can scrape proxies listed as an IP and a port without ":", like on the sites I sent you above?

http://nntime.com/ https://hidemyna.me/ https://www.my-proxy.com/ https://www.proxynova.com/ https://premproxy.com/ http://proxydb.net

I searched for v1.3.0 BETA and I couldn't find a download link :(