Closed Montana closed 6 months ago
Overall I like the patch but we need to clarify bunch of issues:
Could you explain why is tr -d '/\r\n | sed "s/\// /g
more effective than whatever we have now?
Other points:
echo -n
was important to fix #4. Why was it removed?wait
needs to be part of every while loop as a form of throttling, otherwise script can easily spawn thousands simultaneous processes and connectionsprint_usage
function and instead inline it like it was before, I prefer less indirectionprintf "$i/$limit : $newname\n"
is more readable than printf "%d/%d : %s\n" "$i" "$limit" "$newname"
. Or replace with echo
Hi @ostrolucky,
The tr -d '/\r\n'
command ensures that any potentially problematic characters (slashes, carriage returns, newlines) are removed from the title, making the filenames safer and less likely to cause errors when saving, secondly combining tr
and sed
streamlines the sanitization process, reducing the risk of missed problematic characters that might be overlooked by sed
alone.
In other words filenames without slashes, carriage returns, and newlines are more likely to be accepted by the file system, avoiding issues with invalid characters.
So the current approach in this script is:
newname=`echo $name | sed "s/^\///;s/\// /g"`_"$subreddit"_$id.$ext
The approach I took was the following:
newname=`echo $name | tr -d '/\r\n' | sed 's/\// /g'`_"$subreddit"_$id.$ext
As you requested I've removed print_usage
:
#!/bin/bash
#cfg
subreddit=$1
sort=$2
top_time=$3
limit=$4
if [ "$1" == "-h" ] || [ -z $subreddit ]; then
echo "Usage: $0 SUBREDDIT [hot|new|rising|top|controversial] [all|year|month|week|day] [limit]
Examples: $0 starterpacks new week 10
$0 funny top all 50"
exit 0;
fi
if [ -z $sort ]; then
sort="hot"
fi
if [ -z $top_time ];then
top_time=""
fi
if [ -z $limit ]; then
limit=0
fi
url="https://www.reddit.com/r/$subreddit/$sort/.json?raw_json=1&t=$top_time"
content=`curl $url`
mkdir -p $subreddit
i=1
while : ; do
urls=$(echo -n "$content"| jq -r '.data.children[]|select(.data.post_hint|test("image")?) | .data.preview.images[0].source.url')
names=$(echo -n "$content"| jq -r '.data.children[]|select(.data.post_hint|test("image")?) | .data.title')
ids=$(echo -n "$content"| jq -r '.data.children[]|select(.data.post_hint|test("image")?) | .data.id')
a=1
for url in $urls; do
name=`echo -n "$names"|sed -n "$a"p`
id=`echo -n "$ids"|sed -n "$a"p`
ext=`echo -n "${url##*.}"|cut -d '?' -f 1 | sed 's/gif/png/' `
newname=`echo $name | tr -d '/\r\n' | sed 's/\// /g'`_"$subreddit"_$id.$ext
printf "$i/$limit : $newname\n"
curl --retry 3 --no-clobber --output "$subreddit/$newname" $url &>/dev/null &
((a=a+1))
((i=i+1))
if [ $i -gt $limit ] ; then
wait
exit 0
fi
done
wait
after=$(echo -n "$content"| jq -r '.data.after//empty')
if [ -z $after ]; then
break
fi
url="https://www.reddit.com/r/$subreddit/$sort/.json?count=200&after=$after&raw_json=1&t=$top_time"
content=`curl --retry 3 $url`
done
So inside the for loop
I've placed each image download command (curl
) that runs in the background (&
). The wait
command ensures that all background processes finish before the script exits if the limit is reached.
Cheers, Michael Mendy
The tr -d '/\r\n' command ensures that any potentially problematic characters (slashes, carriage returns, newlines) are removed from the title
As far as I know, so does | sed "s/^\///;s/\// /g"
. Or am I missing something?
The tr -d '/\r\n' command ensures that any potentially problematic characters (slashes, carriage returns, newlines) are removed from the title
As far as I know, so does
| sed "s/^\///;s/\// /g"
. Or am I missing something?
This command is useful when you want to remove specific characters from the text, so the following sed
command would be just that:
So let's start with tr -d '/\r\n'
:
tr -d '/\r\n':
/, \r,
and \n
characters.To move to sed "s/^\///;s/\// /g"
:
/
and replaces all other /
characters with spaces.\r
and \n characters.I think the confusion is, these commands serve different purposes and are not interchangeable. Use tr
when you need to delete specific characters, and use sed
when you need to perform pattern-based text replacements. So you'll notice when I added to the pull request:
newname=`echo $name | tr -d '/\r\n' | sed 's/\// /g'`_"$subreddit"_$id.$ext
This was a dual use case, to make sure all these functions were being ran.
Cheers, Michael Mendy
Ok thanks! I've incorporated these improvements with ab7df66e8b4aa6ac238ba28227ac0ea0fa154b20
yo thanks, script runs a lot smoother @Montana
Replaced
while : ; do
withwhile true; do:
This makes the infinite loop more readable.Enforced title sanitization more effectively: Using
tr -d '/\r\n'
andsed "s/\// /g"
to remove any potential newlines and slashes from titles.Added parentheses around
i
and limit comparison: This is more readable and in line with typical Bash script practices.Improved script functionality altogether.