nianeyna / ao3downloader

Utility for downloading fanfiction in bulk from the Archive of Our Own
GNU General Public License v3.0

"download links from file" keeps looking for more pages of series without stopping if the link has a space at the end #74

Closed. WhyDidIHaveToDoThis closed this 1 year ago

WhyDidIHaveToDoThis commented 1 year ago

Ignore everything below the line, I figured it out!!!

When a link in the file has a space at the end (so the line ends in " \n"), the script keeps requesting more pages (?page=2, ?page=3, and so on) while fetching the other works in a series. This is user error rather than a bug, caused by the way I generated the list of links. It does not need to be fixed if a warning is added for users.
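A warning like that could look something like the sketch below. It is only an illustration, not code from ao3downloader: the function names `find_untrimmed_links` and `warn_about_untrimmed_links` are made up, and it assumes the links file has one link per line.

```python
import warnings


def find_untrimmed_links(lines):
    """Return (line_number, entry) pairs for entries with stray whitespace.

    Hypothetical helper, not part of ao3downloader.
    """
    flagged = []
    for number, raw in enumerate(lines, start=1):
        entry = raw.rstrip("\r\n")  # drop only the line terminator
        if entry and entry != entry.strip():
            flagged.append((number, entry))
    return flagged


def warn_about_untrimmed_links(lines):
    """Emit one warning per entry that has leading or trailing whitespace."""
    for number, entry in find_untrimmed_links(lines):
        warnings.warn(f"line {number}: link has stray whitespace: {entry!r}")
```

Passing the open links file (or a list of its lines) to `warn_about_untrimmed_links` would flag an entry such as "https://archiveofourown.org/works/3204764 " before any request is sent.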

But I do have a suggestion: a setting for a page limit. I keep sending AO3 100 requests for empty pages and getting temp-banned all the time.
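Roughly what I have in mind, as a sketch only. I don't know how ao3downloader actually pages through results, so `collect_pages`, `fetch_page`, and `max_pages` are all hypothetical names, and `fetch_page` stands in for whatever function turns a URL into a list of work links.

```python
def collect_pages(fetch_page, base_url, max_pages=None):
    """Request ?page=N until a page comes back empty or the cap is reached.

    fetch_page is a hypothetical callable: URL -> list of work links.
    max_pages=None means no cap.
    """
    works = []
    page = 1
    while max_pages is None or page <= max_pages:
        found = fetch_page(f"{base_url}?page={page}")
        if not found:
            break  # an empty page means there is nothing further to request
        works.extend(found)
        page += 1
    return works
```

With the default of `None` nothing changes for people downloading a lot of pages. The cap only applies when someone sets it, and the empty-page check stops the loop either way.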


I encountered an error when downloading a list of links. Here is what my log file said:

%%[idk where the log starts, this is mid execution]%%
{"starting": "https://archiveofourown.org/works/7269544 \n", "timestamp": "02/05/2023, 22:34:43"}
{"starting": "https://archiveofourown.org/works/3204764 \n", "timestamp": "02/05/2023, 22:34:43"}
{"link": "https://archiveofourown.org/works/3126128", "title": "3126128 Way Better Than Flowers - orphan_account", "workskin": false, "success": true, "timestamp": "02/05/2023, 22:34:45"}
{"series": "Sterek Tumblr Prompts", "link": "https://archiveofourown.org/works/2537945", "title": "2537945 French Silk Pie, Baby - orphan_account", "workskin": false, "success": true, "timestamp": "02/05/2023, 22:34:49"}
%%[the rest of the series]%%
{"link": "https://archiveofourown.org/works/3469925", "title": "3469925 Down the Rabbit Hole - orphan_account", "workskin": false, "success": true, "timestamp": "02/05/2023, 22:35:10"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=2", "timestamp": "02/05/2023, 22:35:10"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=3", "timestamp": "02/05/2023, 22:35:10"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=4", "timestamp": "02/05/2023, 22:35:11"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=5", "timestamp": "02/05/2023, 22:35:11"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=6", "timestamp": "02/05/2023, 22:35:12"}
%%[i had to quit the program]%%

um don't judge the works, i just had random links in my OneTab for some reason. i just exported them and fixed the links with csv so i could merge them all in calibre

This happened twice with these two links before I gave up and manually downloaded all the works. ;(

I tried to look through the code to fix this myself, but having never seen Python code before, I gave up after an hour of trying to figure it out... and I still don't really understand why it did this. My test with two random works seemed to be fine: the second work was second in its series, and the first one got downloaded too. Maybe it was simply because the series was too long?

{"link": "https://archiveofourown.org/works/4371455\n", "title": "4371455\n hello goodbye ('twas nice to know you) - tamerofdarkstars", "workskin": false, "success": true, "timestamp": "02/05/2023, 23:17:33"}
{"link": "https://archiveofourown.org/works/234222", "title": "234222 Then Comes a Mist and a Weeping Rain - Faith Wood (faithwood)", "workskin": false, "success": true, "timestamp": "02/05/2023, 23:17:36"}
{"link": "https://archiveofourown.org/works/12507420", "title": "12507420 right hand to god - rohkeutta", "workskin": false, "success": true, "timestamp": "02/05/2023, 23:51:58"}

What I tried, to fix the issue, was to get the prompt from "download from ao3 link" to show up so I could set a limit on the pages downloaded, but that would be a janky solution anyway. Downloading a series shouldn't try to download pages in the first place.... wait, what's that newline character doing there...

nianeyna commented 1 year ago

Thank you for this bug report! Here is your prize:

yay

Congratulations. It should be fixed now! Little bit of a silly mistake on my part - I should know better than to forget to strip whitespace on user input, really. No need to set a page limit (that would annoy the people who are trying to download thousands of pages, I am sure).
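For anyone curious, stripping whitespace on input amounts to something like the sketch below. This is an illustration rather than the actual change in the repository, and `clean_links` is a made-up name.

```python
def clean_links(lines):
    """Strip surrounding whitespace from each entry and skip blank lines.

    Illustrative only; the name and shape are assumptions, not the
    project's real code.
    """
    cleaned = []
    for line in lines:
        link = line.strip()  # removes spaces, tabs, and the trailing newline
        if link:
            cleaned.append(link)
    return cleaned
```

An open text file can be passed in directly, since iterating over it yields one line at a time.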

WhyDidIHaveToDoThis commented 1 year ago

note to self: clean up your links files. anything extraneous will cause this issue, e.g. a trailing / or #something or ?view_adult=true. some works in the format https://archiveofourown.org/works/number/chapters/number will cause the issue too (idk why), but the chapters/number part isn't necessary to download the work
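A cleanup pass for a links file could be sketched like this. It only uses the standard library, `normalize_work_link` is a made-up name, and it assumes every entry is a plain work or series link (it throws away the query string and fragment, so it would be wrong for links that really need a ?page= value).

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches /works/<id>, optionally followed by /chapters/<id> and a trailing slash.
WORK_PATH = re.compile(r"^(/works/\d+)(?:/chapters/\d+)?/?$")


def normalize_work_link(link):
    """Reduce a link to scheme://host/path with the extras removed.

    Hypothetical helper: drops surrounding whitespace, the query string,
    the fragment, a trailing slash, and any /chapters/<id> suffix.
    """
    parts = urlsplit(link.strip())
    match = WORK_PATH.match(parts.path)
    path = match.group(1) if match else parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Running every line of the file through `normalize_work_link` before downloading should leave only the bare /works/number form.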