luiseduardobr1 / hathitrustPDF

Download an entire book (or publication) in PDF file from Hathi Trust Digital Library without "partner login" requirement
MIT License

the code is not working #10

Closed: Git-Vasanth closed this issue 1 month ago

Git-Vasanth commented 11 months ago

C:\Users\Vasanth\AppData\Local\Programs\Python\Python312\python.exe C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py
C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py:18: SyntaxWarning: invalid escape sequence '\w'
  id_book = re.findall('id=(\w.\d)|$', link)[0]
C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py:62: SyntaxWarning: invalid escape sequence '\D'
  key=lambda x: (int(re.sub('\D', '', x)), x))
Traceback (most recent call last):
  File "C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py", line 23, in <module>
    pages_book = int(soup.find("section", {'class': 'd--reader--viewer'})['data-total-seq'])
TypeError: 'NoneType' object is not subscriptable

Process finished with exit code 1
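
The two SyntaxWarning lines are unrelated to the crash: Python 3.12 warns when an ordinary string literal contains a backslash sequence such as '\D' that is not a recognized escape. A minimal sketch of the usual remedy, a raw-string pattern, applied to the sort key quoted in the warning. The page_files list and the numeric_key name are hypothetical, chosen only for illustration, and are not taken from the script.

```python
import re

# Hypothetical file names standing in for the downloaded page files.
page_files = ["page10.pdf", "page2.pdf", "page1.pdf"]


def numeric_key(name):
    # r'\D' is a raw string, so the backslash reaches the regex engine
    # unchanged and Python 3.12 does not emit a SyntaxWarning for it.
    # Stripping every non-digit leaves the page number for numeric sorting.
    return (int(re.sub(r'\D', '', name)), name)


ordered = sorted(page_files, key=numeric_key)
print(ordered)
```

The key assumes every name contains at least one digit; a name without digits would make int('') raise ValueError.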
smorello87 commented 9 months ago

Same here, it looks like the structure of the HathiTrust rendering page has changed

Traceback (most recent call last):
  File "hathitrustPDF.py", line 23, in <module>
    pages_book = int(soup.find("section", {'class': 'd--reader--viewer'})['data-total-seq'])
TypeError: 'NoneType' object is not subscriptable
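
The TypeError is raised because soup.find(...) returns None when no element matches, and None cannot be indexed with ['data-total-seq']. A sketch of a guard that fails with a clearer message, assuming soup is the BeautifulSoup object the script already builds. The helper name read_total_pages is hypothetical. This only improves the error report; it does not locate the page count in the changed page layout.

```python
def read_total_pages(soup):
    """Return the page count from the reader section, or raise a clear error."""
    # Same lookup as the line quoted in the traceback.
    section = soup.find("section", {"class": "d--reader--viewer"})
    if section is None:
        # find() returns None when nothing matches, for example after
        # the site changes how it renders the reader page.
        raise RuntimeError(
            "Could not find <section class='d--reader--viewer'>; "
            "the HathiTrust page structure may have changed."
        )
    return int(section["data-total-seq"])
```

Calling read_total_pages(soup) in place of the one-line lookup keeps the behavior when the element exists and replaces the bare TypeError with an explanation when it does not.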

ryanbugden commented 3 months ago

It looks like @Midnight145 made a bunch of progress in updating the code, but HathiTrust has updated how they render the pages. See the README info: https://github.com/Midnight145/hathitrustPDF/tree/master

Midnight145 commented 3 months ago

Yup. I'm not familiar enough with web development to continue it myself, but feel free to contribute to it if anyone has the knowhow to fix it.

Midnight145 commented 2 months ago

Actually, looking into this, I might be able to work around it. I'll try and figure something out and update my repo if I can find anything.

Midnight145 commented 2 months ago

Just pushed!! Realized I was wayyy overcomplicating things and I was trying to make things way harder than they needed to be. Only thing that was actually different was where the page count was located on the site.

ryanbugden commented 2 months ago

@Midnight145 Woo, nice, I'll check it out, thanks for looking into this!

Midnight145 commented 2 months ago

No problem! Please open an issue on my end if you run into any troubles!

ryanbugden commented 2 months ago

@Midnight145 Looks to be working well on my end! For the future, you may need to enable issues on your fork to keep this one clean though.

Just learned how:

Midnight145 commented 2 months ago

Whoops, didn't realize that wasn't enabled. Will fix that real quick!