rockdaboot / mget

Multithreaded metalink/file/website downloader (like Wget) and C library
GNU Lesser General Public License v3.0

domains & exclude domains wildcards #27

Closed by ghost 9 years ago

ghost commented 9 years ago

-D --domains Comma-separated list of domains to follow.
--exclude-domains Comma-separated list of domains NOT to follow.

Please add wildcard or regex support to these options. It is needed with the -r option, e.g. for cdn1.domain.com, cdn2.domain.com, cdn3.domain.com, etc.

rockdaboot commented 9 years ago

It is there. If in doubt, try the 'develop' branch. I'll have to make a new release soon, I guess ;-)

$ src/mget --help|grep -i domains
-D --domains Comma-separated list of domains to follow.
--exclude-domains Comma-separated list of domains NOT to follow.

But yes, wildcards are not possible here (thanks to Wget compatibility).

You can try --accept/--reject with patterns. The check is not limited to the path; it also covers the domain. The URLs will still be scanned (using a HEAD request first), but they will not be saved.

But you are right, --domains/--exclude-domains should work with wildcards. I'll implement it in the next few days.

rockdaboot commented 9 years ago

Wildcard support is now in branch 'develop' (implemented with fnmatch). International domain names (IDN) are also supported. The input may be percent-encoded, and case does not matter. Example: --domain="x.example.com,was..übel.de,ex?mple.com"
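
For readers who want to see what fnmatch-style domain matching looks like, here is a minimal sketch in Python. It is not mget's code (mget is written in C and calls fnmatch there). The function names `normalize`, `parse_domain_list`, `host_matches` and `should_follow` are hypothetical, and the rule that an exclude match wins over an include match is an assumption, not something confirmed in this thread. IDN conversion is left out.

```python
from fnmatch import fnmatchcase
from urllib.parse import unquote


def normalize(name):
    """Percent-decode, trim and lowercase a host name or pattern."""
    return unquote(name.strip()).lower()


def parse_domain_list(option_value):
    """Split a comma-separated option value into normalized patterns."""
    return [normalize(p) for p in option_value.split(",") if p.strip()]


def host_matches(host, patterns):
    """Return True if host matches any shell-style wildcard pattern."""
    host = normalize(host)
    # fnmatchcase compares exactly; case was already folded by normalize().
    return any(fnmatchcase(host, pattern) for pattern in patterns)


def should_follow(host, domains, exclude_domains):
    """Decide whether to follow a host.

    Assumption: an exclude match always wins, and an empty include
    list means "follow everything that is not excluded".
    """
    if host_matches(host, exclude_domains):
        return False
    return not domains or host_matches(host, domains)


if __name__ == "__main__":
    include = parse_domain_list("cdn*.domain.com,ex?mple.com")
    exclude = parse_domain_list("cdn9.domain.com")
    for host in ("cdn1.domain.com", "CDN2.Domain.COM",
                 "cdn9.domain.com", "www.domain.com"):
        print(host, should_follow(host, include, exclude))
```

Here `*` matches any run of characters and `?` matches exactly one, so a single pattern such as `cdn*.domain.com` covers the cdn1/cdn2/cdn3 hosts from the original request.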