nianeyna / ao3downloader

Utility for downloading fanfiction in bulk from the Archive of Our Own
GNU General Public License v3.0

Work links in summaries should not be included in the download #50

Closed nianeyna closed 1 year ago

nianeyna commented 2 years ago

Regression caused by #43. Previously, the downloader only recognized internal/relative work links (of the form /works/12345). Now, work links of the form [...]/works/12345 are also included. This has the unintended side effect of including absolute work links (https://archiveofourown.org/works/12345) in the list, which can happen when an author manually links to another work on AO3 in their summary. Aside from this not being intended behavior, the AO3 root url is still prepended to the absolute links (https://archiveofourown.orghttps://archiveofourown.org/works/12345), causing mayhem.

tl;dr amend the work and series patterns to recognize internal (starting with "/") links only.
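A rough reproduction of the failure mode described above, as a sketch only. `LOOSE_WORK`, `AO3_ROOT`, and `collect_links` are hypothetical names made up for illustration; the real pattern introduced by #43 and the real link-building code may look different.

```python
import re

# Assumed root url, taken from the example links in this issue.
AO3_ROOT = "https://archiveofourown.org"

# Hypothetical unanchored pattern standing in for the post-#43 behavior:
# it accepts "/works/<digits>" anywhere in the href.
LOOSE_WORK = r"/works/\d+"


def collect_links(hrefs):
    """Keep hrefs that look like work links, then prepend the root url."""
    return [AO3_ROOT + href for href in hrefs if re.search(LOOSE_WORK, href)]


hrefs = [
    "/works/12345",                             # relative link from a listing
    "https://archiveofourown.org/works/12345",  # absolute link from a summary
]
links = collect_links(hrefs)
for link in links:
    print(link)
```

The second entry comes out with the root url doubled, which is the broken link shown in the description.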

nianeyna commented 2 years ago

Expressions are in strings.py and are named AO3_WORK and AO3_SERIES

Work pattern should match:

/works/12345
/collections/CollectionName/works/12345

Should not match:

https://archiveofourown.org/works/12345
works/12345
/fooworks/12345

Series expression should follow the same rules but with "series" instead of "works"
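One possible pair of expressions that satisfies the cases listed above, offered as a sketch rather than the actual fix. The names AO3_WORK and AO3_SERIES come from this issue, but the values below are assumptions, and `is_internal_link` is a hypothetical helper that assumes the patterns are applied to the raw href with `re.match`.

```python
import re

# Assumed values, not the actual contents of strings.py.
# "^/" requires a leading slash; "(?:.*/)?" allows optional path segments
# such as "collections/CollectionName/" before the final "works/<digits>".
AO3_WORK = r"^/(?:.*/)?works/\d+"
AO3_SERIES = r"^/(?:.*/)?series/\d+"


def is_internal_link(pattern, href):
    """Return True if href matches the pattern from its first character."""
    return re.match(pattern, href) is not None


should_match = ["/works/12345", "/collections/CollectionName/works/12345"]
should_not_match = [
    "https://archiveofourown.org/works/12345",
    "works/12345",
    "/fooworks/12345",
]

for href in should_match + should_not_match:
    print(href, is_internal_link(AO3_WORK, href))
```

Because "works" must come either directly after the leading slash or directly after another slash, /fooworks/12345 is rejected while the collection form is still accepted.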