lorenzodifuccia / safaribooks

Download and generate EPUB of your favorite books from O'Reilly Learning (aka Safari Books Online) library.
Do What The F*ck You Want To Public License

Truncated ( ... ) download still happening even after #133 fix #150

Closed mukuntharajaa closed 4 years ago

mukuntharajaa commented 5 years ago

I am on the master branch, currently updated to the Oct 14 2019 commit, and I am still seeing truncated chapter downloads.

Book id: 9781491908419

Chapter 2: Item 5: second page shows "..." and Item 6 is altogether missing.

Please let me know if any further information is required.

brookscl commented 5 years ago

Same here.

vavdoshka commented 5 years ago

+1

spac-valentin commented 5 years ago

+1

phamhoangtuan commented 5 years ago

+1

sorinescu commented 5 years ago

+1

manfredlotz commented 5 years ago

I have the same issue

Azarakhsh commented 5 years ago

+1

ghistes commented 5 years ago

I have the same problem.

It seems to me that the problem is the login. Even though you get a 200 response code when logging in, you never get a sessionid cookie, so when you request the chapters you are treated as if you are not logged in, which results in the truncation. At least that's how it looked to me when I was trying to understand what is going on (not sure if it helps...).
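
The check described above can be sketched in a few lines. This is a minimal illustration, not code from safaribooks: it assumes the `Set-Cookie` header values of the login response are available as a list of strings, and `has_session_cookie` is a hypothetical helper name.

```python
from http.cookies import SimpleCookie


def has_session_cookie(set_cookie_headers, name="sessionid"):
    """Return True if any Set-Cookie header sets a non-empty cookie called `name`."""
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parses strings such as "name=value; Path=/; HttpOnly"
        if name in jar and jar[name].value:
            return True
    return False
```

If this returned False after a login that answered 200, later chapter requests would presumably be treated as anonymous, which would match the truncation reported in this issue.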

elrob commented 5 years ago

The above PR fixes this issue for me

manfredlotz commented 5 years ago

Your fix worked fine for me too. Thanks a lot for your work!

manfredlotz commented 5 years ago

I tried a couple of downloads, and mostly the epubs are not really usable. @elrob: However, this is not the fault of your fix.

milktea02 commented 5 years ago

Still having issues :( even with #152

elrob commented 5 years ago

@manfredlotz This issue is about the truncation of output as if you're not logged in. The PR I created doesn't change anything in the epub creation. I have had no issues with three books I've since tested with. Definitely usable for me. Can you give me an example of a book you've had issues with? And what those issues are?

elrob commented 5 years ago

@milktea02 What issues are you having? Are they related to truncation (this github issue tracks the truncation problem)?

mukuntharajaa commented 5 years ago

I have tried the same book again (9781491908419). I am able to see the contents now without ellipses. But when I click chapter 6, it correctly takes me to the last page of chapter 6 but shows chapter 5 as highlighted in the left-hand side layout.

Guess this is some minor stuff.

elrob commented 5 years ago

@mukuntharajaa Thanks for the response. If it is an issue you would like to raise and get fixed, then I recommend creating a new GitHub issue for it. This GitHub issue is about the truncation of chapters due to authentication issues. So for now, if/when @lorenzodifuccia accepts #152, this GitHub issue would be fixed.

varta2014 commented 5 years ago

Can we try this code, please? Thank you.

elrob commented 5 years ago

@varta2014 If you want to try my change before it is merged into this repository then you can just pull it from https://github.com/elrob/safaribooks

manfredlotz commented 5 years ago

@elrob Unfortunately, I don't remember which book download I tried. I know that FBReader crashed when opening the epub. The last downloads I did were ok.

milktea02 commented 5 years ago

@elrob Tried Clean Code (9780136083238) and still get truncation. I'm logging in via SSO if that might be the issue.

AsimShakour commented 5 years ago

I am having truncation with book: 9781119449270 in this area: https://learning.oreilly.com/library/view/professional-c-7/9781119449270/fintro.xhtml

Thanks

elrob commented 5 years ago

@milktea02 I have updated my change to restore the code that I thought was unnecessary. It was unnecessary for me but I'm not using SSO. Maybe you can try the latest version of my branch and see if it works for you now. I don't have SSO so I can't test it myself.

@AsimShakour Are you using SSO too? Maybe that's the problem. Can you also try with the latest change I have made (updated just now)?

varta2014 commented 5 years ago

elrob, thank you, the code works perfectly!

vikdean commented 5 years ago

Just tested it with 9780135262047; SSO works, but it still downloads the books partially.

brookscl commented 5 years ago

For those of you still having trouble: delete the Books directory that is created for the downloads, then retry your download. I found that the tool will not re-download chapters it thinks are already there. I was able to download book 9781119558439 without any problems. I'm not familiar with the book, but it seemed complete.
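
A minimal sketch of that cleanup step, assuming the downloads live in a directory named `Books` next to the script as described above. `clear_books_dir` is a hypothetical helper, not part of safaribooks.

```python
import shutil
from pathlib import Path


def clear_books_dir(books_dir="Books"):
    """Delete the download directory so no cached chapter is reused.

    Returns True if the directory existed and was removed, False otherwise.
    """
    path = Path(books_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)  # removes the directory and everything below it
    return True
```

Note that this removes every previously downloaded book in that directory, not only the one being retried.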

vikdean commented 5 years ago

Tried it 3 times in a row, issue is still the same for 9780135262047

mukuntharajaa commented 4 years ago

I have also tried downloading this ebook and accessed random pages; @elrob's fix is working fine.

vikdean commented 4 years ago

Check the chapter beginnings... it only captures a couple of lines, the rest is truncated... Also, what's the epub size for you? Mine is 3MB

mukuntharajaa commented 4 years ago

It's 114 MB. I have checked the beginnings and ends of random chapters and I do not see any truncation. Your case could be different. While downloading, add "--preserve-log" and then check whether anything is reported in the log. Create a new issue if required.

vikdean commented 4 years ago

I don't know what's going on, but I did a fresh install and the issue still exists for me... the generated .epub is still 3.4MB. I checked the log and it's completely error-free. You can find it attached: log.txt

Update: I've been running this on macOS, so I tried it on Ubuntu as well; same exact issue. Are you sure we are talking about the same book? 9780135262047 -- CCNP and CCIE Enterprise Core ENCOR 350-401 Official Cert Guide

elrob commented 4 years ago

I think there is still an issue for some people, perhaps only those who use SSO. I guess one or more other cookies are missing. I can't test this: since it works for me, I don't know which cookies are missing.

Another PR has been created which might solve the issue for some people: https://github.com/lorenzodifuccia/safaribooks/pull/153 @McPatate seems to have found another cookie that might be getting lost. Maybe try that version if you're still having issues.

vikdean commented 4 years ago

Tried that just now... same thing.

McPatate commented 4 years ago

Do you have the epub's id so we can give it a go ourselves?

vikdean commented 4 years ago

You mean the book id? It's 9780135262047

McPatate commented 4 years ago

Indeed, it doesn't work. I'm looking into what could be the problem. @elrob have you tried with that book id?

varta2014 commented 4 years ago

elrob, the code does not work. Please fix, thank you.

McPatate commented 4 years ago

It works with my code: https://github.com/McPatate/orly_book_extractor. The only problem is that it's pretty ugly 😓

varta2014 commented 4 years ago

McPatate, yes, the code works, thank you, but you need to fix some bugs, like: bookmark not exist? and some errors... We will wait for the final code, thanks.

elrob commented 4 years ago

@varta2014 @vikdean I have investigated further and made some more changes that resolve another couple of issues with login. Can you try again with a fresh version of this: https://github.com/elrob/safaribooks Make sure to delete anything in the Books directory before trying.

vikdean commented 4 years ago

I've just tried it; SSO authentication is completely broken... it does not work at all. The only thing I get is this:

[18/Nov/2019 11:43:28] ** Welcome to SafariBooks! **
[18/Nov/2019 11:43:30] Authentication issue: unable to access profile page.
[18/Nov/2019 11:43:30] Last request done:
    URL: https://learning.oreilly.com/profile/
    DATA: None
    OTHERS: {}

    307
    server: istio-envoy
    cache-control: max-age=0
    content-type: text/plain; charset=utf-8
    location: /accounts/login/?next=%2Fprofile%2F
    x-envoy-upstream-service-time: 814
    x-powered-by: Express
    Accept-Ranges: bytes, bytes
    Content-Length: 70
    Date: Mon, 18 Nov 2019 10:43:30 GMT
    Via: 1.1 varnish
    Connection: keep-alive
    X-Client-IP: 188.143.125.75
    X-Served-By: cache-lcy19235-LCY
    X-Cache: MISS
    X-Cache-Hits: 0
    X-Timer: S1574073809.109248,VS0,VE945
    Vary: Accept,Accept, Accept-Encoding, Authorization, Cookie

Temporary Redirect. Redirecting to /accounts/login/?next=%2Fprofile%2F

elrob commented 4 years ago

@vikdean How are you attempting to use the script with SSO? I don't think this script supports SSO, or ever has, unless you provide your own cookies.json. One of the recent changes I have made is to confirm the login before continuing with processing the book. Previously, the book processing would continue, but then you'd just get a book with truncated chapters because the login had failed. Now it fails faster if there is an issue.
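
The fail-fast check can be illustrated as follows. This is a sketch rather than the code in the PR: it assumes the profile page is requested without following redirects and that a failed login shows up as a redirect to `/accounts/login/`, as in the log above. `profile_means_logged_in` is a hypothetical name.

```python
def profile_means_logged_in(status_code, headers):
    """Interpret the response to a GET of /profile/ made without following redirects."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    location = lowered.get("location", "")
    if 300 <= status_code < 400 and "/accounts/login/" in location:
        return False  # bounced to the login page: the session was not accepted
    return status_code == 200
```

With the response from the log above (status 307, `location: /accounts/login/?next=%2Fprofile%2F`) this returns False, so a script could stop before downloading truncated chapters.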

If you want to use the script with SSO, I think you need to do the following (I'll provide Firefox instructions, but it should also be possible with other browsers):

  1. Log in with the browser as you normally would: https://learning.oreilly.com
  2. Access the profile page: https://learning.oreilly.com/profile/
  3. Open the developer tools: press `F12` in Firefox
  4. At the bottom there is a console where you can type commands. Paste the following in there (the first time you do this it may ask you to `allow pasting`): `var output = {};document.cookie.split(/\s*;\s*/).forEach(function(pair) {pair = pair.split(/\s*=\s*/);output[pair[0]] = pair.splice(1).join('=');});console.log(JSON.stringify(output));` (Credit to https://github.com/lorenzodifuccia/safaribooks/issues/2#issuecomment-429343521)
  5. Copy the JSON output and save it in a file called `cookies.json` in the same directory as the safaribooks code.
  6. Run the script without passing credentials: `python3 safaribooks.py 9780135262047`

vikdean commented 4 years ago

Yes, that's exactly how I'm using it, right to the dot. I managed to start the script with sudo; however, the result is still truncated.

elrob commented 4 years ago

@vikdean I think I've found the problem. Using document.cookie from the console does not include the HttpOnly cookies and they are definitely required. I can't work out how to access these via the console but I was able to find a way to get them that isn't too painful.

  1. Log in as usual to https://learning.oreilly.com/
  2. Open the developer tools with `F12`
  3. Go to the `Network` tab in the developer tools
  4. Access the profile page in the browser: https://learning.oreilly.com/profile/
  5. In the `Network` tab, click on the request to `/profile/` (it should be the first one)
  6. Click on the `Cookies` tab in the request information
  7. Right-click on the `Request cookies` text and choose `Copy All`
  8. Paste this into the `cookies.json` file and then remove the outer section of the JSON document
  9. Run the script without passing credentials: `python3 safaribooks.py 9780135262047`

p.s. sudo is not necessary.
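
Step 8 can also be done with a short script instead of by hand. The sketch below assumes the copied text is a JSON object with a single outer key wrapping the cookie name/value pairs (the exact key name may differ between browser versions). `unwrap_copied_cookies` is a hypothetical helper, not part of safaribooks, and the cookie names in the usage example are placeholders.

```python
import json


def unwrap_copied_cookies(copied_text):
    """Return the inner name -> value mapping from a copied cookie JSON document."""
    data = json.loads(copied_text)
    # Assumption: the copy wraps the cookies in one outer key,
    # e.g. {"Request Cookies": {"name": "value", ...}}.
    if isinstance(data, dict) and len(data) == 1:
        (inner,) = data.values()
        if isinstance(inner, dict):
            return inner
    return data  # already flat: nothing to remove


if __name__ == "__main__":
    copied = '{"Request Cookies": {"sessionid": "abc", "example": "xyz"}}'
    with open("cookies.json", "w") as fh:
        json.dump(unwrap_copied_cookies(copied), fh)
```

The result is the same flat name/value object that the console snippet earlier in this thread produced, but it can include the HttpOnly cookies because they come from the request rather than from `document.cookie`.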

vikdean commented 4 years ago

Yes!!! It's working now, thanks a lot!

lorenzodifuccia commented 4 years ago

I pushed some changes, try with the latest commit...

Thank you @elrob for your great job. 🎉 Cheers 🍺

lorenzodifuccia commented 4 years ago

News???

nkkarthik commented 4 years ago

The latest commit is working for me (verified with a couple of books that were being truncated before this push). Thank you @elrob @lorenzodifuccia

mukuntharajaa commented 4 years ago

Checked with the original book id and some new books. Working perfectly. Thanks.

OllieRobinson commented 4 years ago

I have the most recent commit. It seems to get most books fine and I haven't encountered many errors, but there is a grey background on all books that never seemed to happen before. Also, for coding books, --no-kindle used to remove the scrollbar most of the time.

villancikos commented 4 years ago

I think something is not right or has just changed. I tried both this repo's master and yours, @elrob, with no success. The Cookies section of the Developer Tools Network tab (profile page) won't show any HttpOnly cookie. I don't know if this is just me.