Open utterances-bot opened 1 year ago
Hi.
On the principle that all expansions should be quoted, to keep word splitting and globbing from interfering, this line is a problem:
link_array=($(curl -s "$url" | awk -F 'href="' '/<a/{gsub(/".*/, "", $2); print $2}'))
I assume there is one item per line, so we should use mapfile instead:
mapfile -t link_array < <(curl ...)
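To illustrate the difference, here is a minimal sketch that needs no network access. The fake_links function and its sample links are made up for this example and stand in for the curl | awk pipeline; one link contains a space and one contains a glob character. Note that mapfile needs bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the curl | awk pipeline: prints one link per line.
# The sample links are invented; one has a space, one has a glob character.
fake_links() {
  printf '%s\n' '/2023/a story' '/2023/*' '/2023/c'
}

# Unquoted command substitution: the output goes through word splitting
# and pathname expansion before it is stored in the array.
split_array=($(fake_links))

# mapfile stores one array element per input line; -t drops the trailing newline.
mapfile -t link_array < <(fake_links)

echo "unquoted:  ${#split_array[@]} items"
echo "mapfile:   ${#link_array[@]} items"
printf 'mapfile item: [%s]\n' "${link_array[@]}"
```

With mapfile, the link containing a space stays a single element and the * is kept literally, whatever the current directory contains.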
Also, there is a syntax error around done and echo.
>> cnn_links.csv
can be moved right after the done, on the same line; this keeps bash from re-opening the file for each item in the array.
sed -e 's/<title>//g' -e 's/<\/title>//g'
can be simplified to
sed -e 's/<\/\?title>//g'
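A small sketch of both suggestions together, under some assumptions: the pages array holds invented sample data in place of the curl output, and out_file is a temporary file standing in for cnn_links.csv. It is not the post's actual script, only the shape of the loop.

```bash
#!/usr/bin/env bash
out_file=$(mktemp)   # hypothetical stand-in for cnn_links.csv

# Made-up sample pages in place of what curl would return.
pages=('<title>First story</title>' '<title>Second story</title>')

for page in "${pages[@]}"; do
  # One expression strips both <title> and </title>.
  # \? in a basic regular expression is a GNU sed extension;
  # sed -E 's/<\/?title>//g' is an alternative for other sed versions.
  title=$(printf '%s\n' "$page" | sed -e 's/<\/\?title>//g')
  echo "\"$title\""
done >> "$out_file"   # redirection on the loop itself: the file is opened once

cat "$out_file"
```

Because the redirection is attached to done, every echo inside the loop writes to the same already-open file instead of opening it again on each iteration.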
Hope this helps.
@Anvil That's really useful, thank you for sharing the insights. I still consider myself a beginner when it comes to bash, awk, and sed, and I just learned about mapfile today through your comment.
WebScraping in Bash | Muhammad
Explore web scraping using Bash and CLI tools for efficient data extraction
https://muhammadraza.me/2023/webscraping-in-bash/