Closed Zabrane closed 5 months ago
@Zabrane could you try v1.95.19 and see if the issues persist? Pre-built binaries are possible. This crate has a lot of features, so a pre-built binary would require enabling every feature and making sure there is a call for each feature's workflow difference across runs. This would need a lot of documentation. Feel free to push a PR.
https://github.com/axodotdev/cargo-dist should be able to handle the reqs for the deployment.
@j-mendez testing now.
@j-mendez could you help, please?
❯ git checkout v1.95.19
HEAD is now at 4b8a604 chore(deps): bump chromiumoxide@0.6.0
❯ cargo install spider_cli
Updating crates.io index
Ignored package `spider_cli v1.95.22` is already installed, use --force to override
❯ ~/.cargo/bin/spider --version
spider_cli 1.95.22
❯ ~/.cargo/bin/spider -a --verbose --url "https://rsseau.fr"
❯
Why is a checkout of v1.95.19 building a newer version, 1.95.22?
As you can see, it still doesn't work.
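A note on the transcript above: `cargo install spider_cli` resolves the crate from crates.io rather than from the local git checkout, and the "Ignored package ... is already installed" line means nothing was rebuilt, so the existing 1.95.22 binary stayed in place. A minimal sketch of two ways to get a specific version follows. It assumes 1.95.19 was published to crates.io and that the CLI crate lives in a `spider_cli` directory at the repository root; neither is confirmed in this thread. The `run` helper is hypothetical and only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command by default,
# executes it only when DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Option 1: install an exact published version from crates.io.
# --force overwrites the binary that is already installed.
# Assumption: spider_cli 1.95.19 exists on crates.io.
run cargo install spider_cli --version 1.95.19 --force

# Option 2: build the source that is currently checked out.
# Assumption: the CLI crate is in ./spider_cli relative to the repo root.
run cargo install --path spider_cli --force

# Confirm which version ended up on disk.
run "$HOME/.cargo/bin/spider" --version
```

Only one of the two options is needed. Run the script from the repository root after `git checkout v1.95.19` if you use the `--path` variant.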
You need a command after the URL. Here is the list, which can be found using spider --help.
The fastest web crawler CLI written in Rust.
Usage: spider [OPTIONS] --url <URL> [COMMAND]
Commands:
crawl Crawl the website extracting links
scrape Scrape the website extracting html and links
download Download html markup to destination
help Print this message or the help of the given subcommand(s)
Accidental close.
@j-mendez why isn't --limit <LIMIT> respected?
If, for example, I run a crawl with --limit 5, it keeps running forever.
Similarly, --depth 1 gets me the first page, but --depth 2 runs forever.
fixed in v1.95.23
Thanks again.
@j-mendez
I don't know which version you're using, but neither --limit nor --depth is working as expected :-/
> spider --version
spider_cli 1.95.22
> ❯ spider --verbose --limit 3 --url "https://choosealicense.com/" crawl
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/terms-of-service/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/about/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/mit/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/non-software/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/community/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/no-permission/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/unlicense/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/isc/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/appendix/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/isc
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/appendix
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/unlicense
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/mit-0
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/zlib
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/ms-pl
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/bsd-2-clause/
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/bsd-3-clause-clear
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/ncsa
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/bsd-2-clause
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/mit
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/bsd-2-clause-patent
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/bsd-4-clause
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/vim
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/ms-rl
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/postgresql
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/bsd-3-clause
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/wtfpl
[2024-05-30T05:19:09Z INFO spider::utils] fetch - https://choosealicense.com/licenses/0bsd
[2024-05-30T05:19:10Z INFO spider::utils] fetch - https://choosealicense.com/licenses/ms-pl/
[2024-05-30T05:19:10Z INFO spider::utils] fetch - https://choosealicense.com/licenses/bsd-3-clause/
[...] continues forever !!!!
Working now
$ spider --version
spider_cli 1.95.23
$ spider --verbose --limit 3 --url "https://choosealicense.com/" crawl
[2024-05-30T07:46:59Z INFO spider::utils] fetch - https://choosealicense.com/
[2024-05-30T07:46:59Z INFO spider::utils] fetch - https://choosealicense.com/community/
[2024-05-30T07:46:59Z INFO spider::utils] fetch - https://choosealicense.com/non-software/
Some artifacts are available here https://github.com/spider-rs/spider/releases/tag/v1.95.27. The next release will have musl support.
Hi guys,
This project is simply awesome. I've tried a bunch of spiders in Go and NodeJS, but they are slow. However, I'm getting an error when trying to compile it on my macOS Intel:
Would it be possible to prepare pre-built binaries to simplify its usage?
I succeeded in building it from source on Ubuntu 22.04. But it doesn't seem to work:
Many thanks