Open AmyJoseph opened 1 year ago
Underscore fixed. I added the character replacement function after getting these types of names (?url=https%3A%2F%2Fcovid19.govt.nz%2Fassets%2FProactive-Releases%2Fproactive-release%2FWorking-for-Families-Tax-credits-entitlement-for-Emergency-Benefit-recipients.pdf&data=05%7C01%7Csamantha.putt%40ird.govt.nz%7C3e2280a057784af95c9008db2fc4e134%7Cfb39e3e923a9404e93a2b42a87d94f35%7C1%7C0%7C638156294857350068%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=BqnprTsiumL1S7tfQU03lDPn1HJ%2F1hQNMB2rJefoq8A%3D&reserved=0)
When a URL contains query parameters after the '.pdf', the final filename ends up being 'example.pdf'. I can see that underscores are being swapped in for invalid chars in line 81 of bulk_pdf_downloader.py, but I'm not sure which invalid characters they are replacing in the first place.
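For reference, here is a minimal sketch of one way to get a filename out of a URL with trailing query parameters. It is not the code at line 81 of the script. The names `filename_from_url` and `INVALID_CHARS` are hypothetical, and the character set is an assumption based on the characters Windows rejects in filenames. The idea is to drop the query string first, then replace only a known, explicit set of characters.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Assumed set of invalid characters: the ones Windows rejects in filenames,
# plus ASCII control characters. The real script may use a different set.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def filename_from_url(url: str) -> str:
    """Return a filesystem-safe filename taken from the path part of a URL."""
    # urlsplit separates the path from the query string and fragment,
    # so everything after '?' is discarded here.
    path = urlsplit(url).path
    # Take the last path segment, then decode percent-escapes such as %20.
    name = unquote(PurePosixPath(path).name)
    # Replace each remaining invalid character with an underscore.
    return INVALID_CHARS.sub("_", name)


print(filename_from_url("https://example.com/docs/example.pdf?download=1&x=a%2Fb"))
```

Listing the replaced characters in one named pattern also makes it clear which characters the underscores stand in for.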