mraza007 / blog

Currently hosted on vercel
https://blog-vert-iota.now.sh
MIT License

2023/webscraping-in-bash/ #6

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

WebScraping in Bash | Muhammad

Explore web scraping using Bash and CLI tools for efficient data extraction

https://muhammadraza.me/2023/webscraping-in-bash/

Anvil commented 1 year ago

Hi.

On the principle that all expansions should be quoted, to prevent word splitting and globbing from interfering, this line is a problem.

link_array=($(curl -s "$url" | awk -F 'href="' '/<a/{gsub(/".*/, "", $2); print $2}'))

I assume there is one item per line, so we should use mapfile instead.

mapfile -t link_array < <(curl ...)
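
To make the difference concrete, here is a minimal sketch comparing the two approaches. It assumes bash 4 or newer (for mapfile), and `fetch_links` is a hypothetical stand-in for the curl/awk pipeline so the example runs without network access; the sample lines are made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the curl | awk pipeline: prints one item per line.
# The sample data is invented to include a space and a glob character.
fetch_links() {
  printf '%s\n' 'https://example.com/a b' '*' 'https://example.com/c'
}

# Unquoted command substitution: the output is word-split on whitespace
# and each resulting word is subject to glob expansion.
split_array=($(fetch_links))

# mapfile: one array element per input line, no splitting, no globbing.
# -t strips the trailing newline from each element.
mapfile -t link_array < <(fetch_links)

printf 'unquoted substitution: %d items\n' "${#split_array[@]}"
printf 'mapfile:               %d items\n' "${#link_array[@]}"
```

The mapfile version always yields exactly one element per line, while the count from the unquoted version depends on the whitespace in the data and on the files in the current directory.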

Also, :

Hope this helps.

mraza007 commented 1 year ago

@Anvil That's really useful, thank you for sharing the insights. I still consider myself a beginner when it comes to bash, awk and sed, and I just learned about mapfile today through your comment.