Closed Git-Vasanth closed 1 month ago
Same here. It looks like the structure of the HathiTrust rendering page has changed:
Traceback (most recent call last):
File "hathitrustPDF.py", line 23, in
It looks like @Midnight145 made a bunch of progress in updating the code, but HathiTrust has updated how they render the pages. See the README info: https://github.com/Midnight145/hathitrustPDF/tree/master
Yup. I'm not familiar enough with web development to continue it myself, but feel free to contribute if anyone has the know-how to fix it.
Actually, looking into this, I might be able to work around it. I'll try and figure something out and update my repo if I can find anything.
Just pushed! Realized I was way overcomplicating things. The only thing that was actually different was where the page count was located on the site.
@Midnight145 Woo, nice, I'll check it out, thanks for looking into this!
No problem! Please open an issue on my end if you run into any troubles!
@Midnight145 Looks to be working well on my end! For the future, you may need to enable issues on your fork to keep this one clean though.
Just learned how:
Whoops, didn't realize that wasn't enabled. Will fix that real quick!
C:\Users\Vasanth\AppData\Local\Programs\Python\Python312\python.exe C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py
C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py:18: SyntaxWarning: invalid escape sequence '\w'
  id_book = re.findall('id=(\w.\d)|$', link)[0]
C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py:62: SyntaxWarning: invalid escape sequence '\D'
  key=lambda x: (int(re.sub('\D', '', x)), x))
Traceback (most recent call last):
  File "C:\Users\Vasanth\Hathi_downs\hathitrustPDF.py", line 23, in
pages_book = int(soup.find("section", {'class': 'd--reader--viewer'})['data-total-seq'])
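For anyone landing here with the same console output: the two SyntaxWarning lines are separate from the traceback. Python 3.12 reports invalid escape sequences such as '\w' in ordinary string literals as SyntaxWarning, and writing the patterns as raw strings avoids that. The failure on line 23 is the page-count lookup, and `soup.find` returns None when nothing matches, so a guarded lookup can at least fail with a clearer message. Below is a rough sketch of both ideas, not the actual fix in either repo. The regex patterns, selector, and attribute name are copied as they appear in the output above (the patterns may have been altered when pasted into the issue); the helper names `extract_id`, `numeric_key`, and `total_pages` and the sample file names are made up for illustration.

```python
import re

# Raw strings keep the backslashes literal, so \w and \D reach the regex
# engine without triggering "invalid escape sequence" warnings.
# Patterns are copied as shown in the console output in this thread.
ID_PATTERN = re.compile(r'id=(\w.\d)|$')
NON_DIGITS = re.compile(r'\D')


def extract_id(link):
    """Hypothetical helper: first captured id, or '' when no id= is present."""
    return ID_PATTERN.findall(link)[0]


def numeric_key(name):
    """Hypothetical helper: sort by the digits in a name, then by the name."""
    # Raises ValueError if the name contains no digits at all.
    return (int(NON_DIGITS.sub('', name)), name)


def total_pages(soup):
    """Hypothetical helper: read the page count or raise a readable error.

    `soup` is expected to be a parsed BeautifulSoup document. The selector
    and attribute come from the traceback above and are assumptions about
    the site's markup, which has already changed at least once.
    """
    viewer = soup.find("section", {"class": "d--reader--viewer"})
    if viewer is None:
        raise RuntimeError("reader section not found; page layout may have changed")
    raw = viewer.get("data-total-seq")
    if raw is None:
        raise RuntimeError("data-total-seq missing; page layout may have changed")
    return int(raw)


if __name__ == "__main__":
    # Made-up file names, just to show the numeric ordering.
    print(sorted(["page10.png", "page2.png", "page1.png"], key=numeric_key))
```

With the guard in place, a layout change surfaces as a RuntimeError that names the missing element, which should make the next breakage quicker to diagnose.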