shihabmridha / educative.io-downloader

Free Palestine. 📖 This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.

MHTML fails to show svg+xml images while PDF format works fine #28

Closed: the-redback closed this issue 4 years ago

the-redback commented 4 years ago

Thanks for this useful tool.

As the title says, when downloading a course, the MHTML file fails to show an image (possibly svg+xml), but the PDF works fine.

When I ran in headless=false mode, the image loaded successfully, but after downloading it was blank in the MHTML file.

Here is an example page: https://www.educative.io/courses/advanced-kubernetes-techniques/3YzvvGX3RxA

Thanks.

devKshitijJain commented 4 years ago

@the-redback, it's a known issue: images are not rendered in MHTML. For now, you can inspect the element to find the image URL and open it manually. Meanwhile, we will try to fix it.

devKshitijJain commented 4 years ago

One idea to fix this issue is to take the SVG images, convert them to data URLs, and then use an img element instead of the svg.

const svgElement = document.getElementById('svgId');
// Create your own image
const img = document.createElement('img');
// Serialize the SVG to a string
const svgString = new XMLSerializer().serializeToString(svgElement);
// Re-encode as UTF-8 bytes, one per character, so btoa() does not throw on characters outside Latin1
const decoded = unescape(encodeURIComponent(svgString));
// Now we can use btoa to convert the SVG to base64
const base64 = btoa(decoded);
const imgSource = `data:image/svg+xml;base64,${base64}`;
img.setAttribute('src', imgSource);

At the end, just replace that svg with the img (for example with svgElement.replaceWith(img)).
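A sketch of the same encoding step without the deprecated unescape(), assuming TextEncoder and btoa are available as globals (true in browsers and in Node 16+). svgToDataUrl is a hypothetical helper name, not something that exists in the downloader:

```javascript
// Build a base64 data URL from serialized SVG markup.
// TextEncoder yields the UTF-8 bytes; each byte becomes one Latin1
// character, which is exactly the input btoa() accepts.
function svgToDataUrl(svgString) {
  const bytes = new TextEncoder().encode(svgString);
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:image/svg+xml;base64,${btoa(binary)}`;
}

// Hypothetical browser-side usage, reusing svgElement from the snippet above:
// const img = document.createElement('img');
// img.src = svgToDataUrl(new XMLSerializer().serializeToString(svgElement));
// svgElement.replaceWith(img);
```

For well-formed strings the output matches btoa(unescape(encodeURIComponent(svgString))); the loop just makes the UTF-8 byte-to-character step explicit.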

the-redback commented 4 years ago

I want to point out another issue with PDF saves. When slides are unfolded, most of the images are saved as blank in the PDF. After investigating, I found that pages are considered done before the new images load, so they stay blank. I worked around it by adding a sleep before saving the PDF, but that is a hacky solution: it doesn't guarantee that the images are loaded, and it waits on every page regardless of whether the page has a slide. I hope you can come up with a better solution.
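One alternative to a fixed sleep is to poll for a condition and stop as soon as it holds, with a timeout as a safety net. Below is a minimal sketch of a hypothetical waitUntil helper; the image check in the trailing comment is an assumption about what "loaded" means here, based on the standard HTMLImageElement.complete property. From the Node side, Puppeteer's page.waitForFunction() serves the same purpose.

```javascript
// Poll `condition` every `intervalMs` until it returns true, or reject
// after `timeoutMs`. Unlike a fixed sleep, this returns as soon as the
// condition holds, so pages without slides are not delayed.
function waitUntil(condition, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      if (condition()) {
        resolve();
      } else if (Date.now() - start >= timeoutMs) {
        reject(new Error(`condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Hypothetical use inside the page: wait until every <img> reports
// complete instead of sleeping for a fixed time.
// await waitUntil(() => Array.from(document.images).every((img) => img.complete));
```

Caveat: a lazy-loaded image that never enters the viewport may never report complete, so this would hit the timeout rather than fix the blank image; scrolling it into view first would still be needed.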

Thanks again. <3

devKshitijJain commented 4 years ago

The images are lazy-loaded, so a simple solution would be to click and then smooth-scroll to the element, which ensures the images are loaded. However, that would mess up the code blocks: the Monaco editor would load and cause problems with how the code blocks render. I have a script that blocks it, but I need to check. Does it work fine with MHTML?

the-redback commented 4 years ago

Last time I checked, it was also blank in MHTML, but I am not sure whether that is due to lazy loading or the SVG problem.

the-redback commented 4 years ago

BTW, I have no knowledge of JS or Node.js; I have never even printed hello world. :D Is there any way to add a wait in this block? https://github.com/shihabmridha/educative.io-downloader/blob/b048bf19e75328896fb5221bc737929e1ab629a5/src/download.ts#L150-L153

I have tried a few solutions from online, but they didn't work. Also, is there any way to waitForNavigation after element.click()?
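The usual Puppeteer idiom for "click, then wait for the navigation it triggers" is to start waiting before clicking, e.g. await Promise.all([page.waitForNavigation(), page.click(selector)]), so a fast navigation cannot finish before the wait begins. That only works from the Node side (page.click), not for element.click() inside pageEvaluation, and only if the click really navigates. The ordering trick itself, as a framework-free sketch with a hypothetical actAndWait helper and a plain EventTarget standing in for the page:

```javascript
// Start listening for the outcome *before* performing the action, then
// await both together. Starting the wait after the action could miss an
// event that fires synchronously or very quickly.
async function actAndWait(action, waitFor) {
  const [outcome] = await Promise.all([waitFor(), action()]);
  return outcome;
}

// Demo: a fake page whose "click" fires its event synchronously.
const fakePage = new EventTarget();
const waitForDone = () =>
  new Promise((resolve) => {
    fakePage.addEventListener('done', () => resolve('navigated'), { once: true });
  });
const click = () => fakePage.dispatchEvent(new Event('done'));
```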

devKshitijJain commented 4 years ago

@shihabmridha, for reference, we need help from here: https://stackoverflow.com/a/46639473

devKshitijJain commented 4 years ago

@the-redback you can use this after element.click();

// Scroll to the clicked element
element.scrollIntoView({ behavior: "smooth" });
// Wait for 1000 ms
await new Promise((resolve) => setTimeout(resolve, 1000));

shihabmridha commented 4 years ago

@the-redback, FYI: any function you pass to page.evaluate(cb, arg) (for example, our pageEvaluation()) is executed in the browser, not in our script's Node process. So if you write console.log to see what's going on inside that block, you will not see anything in the Node console.

What I am trying to say is that any code you write there must run in Chrome. I don't know client-side JavaScript that well, so I have no idea how to wait for a network call to resolve there. We cannot use any Puppeteer feature inside that function.
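For waiting on images from inside the browser, one option in plain client-side JavaScript is to listen for each image's load and error events. A minimal sketch, assuming pageEvaluation is made async so it can await the result; waitForImages is a hypothetical helper, not part of the downloader:

```javascript
// Resolve once every image in `images` has either loaded or failed.
// Images that are already finished (img.complete === true) are skipped;
// for the rest we wait for the first 'load' or 'error' event.
function waitForImages(images) {
  const pending = Array.from(images).filter((img) => !img.complete);
  return Promise.all(
    pending.map(
      (img) =>
        new Promise((resolve) => {
          img.addEventListener('load', resolve, { once: true });
          img.addEventListener('error', resolve, { once: true });
        }),
    ),
  );
}

// Hypothetical use inside an async pageEvaluation (browser context):
// await waitForImages(document.images);
```

Images that are lazy-loaded and never scrolled into view may never fire either event, so in practice this would likely need a timeout (for example via Promise.race) or a scroll step first.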

@devKshitijJain, your code to wait 1000 ms might not work here, because await can only be used inside an async function, and pageEvaluation is not an async function.

So, instead of this: https://github.com/shihabmridha/educative.io-downloader/blob/b048bf19e75328896fb5221bc737929e1ab629a5/src/download.ts#L84

@the-redback, you can try this: const languages = await page.evaluate(pageEvaluation, { SAVE_AS, SAVE_LESSON_AS });

and on line 147, add the async keyword in front of the function: async function pageEvaluation({ SAVE_AS, SAVE_LESSON_AS }) { ... }

Related issue in Puppeteer

THIS IS MY THEORY. I have not tested it!

shihabmridha commented 4 years ago

@devKshitijJain, I like the idea of converting the SVG to base64, replacing the link with the base64 data URL, and then saving the page.

devKshitijJain commented 4 years ago

@shihabmridha, there are still some pieces to figure out, because the data URL is still an SVG image. We need to convert it to PNG and use that instead.

devKshitijJain commented 4 years ago

@shihabmridha, that's true: whenever we use await in a function, we need to make that function async.

devKshitijJain commented 4 years ago

So I tried it, and even after converting it to an image, the same problem persists. I need some help to fix this.

devKshitijJain commented 4 years ago

@the-redback @shihabmridha, after brainstorming for about 5 days, working 10 crazy hours last night, and trying hundreds of solutions, I finally fixed the issue :)

devKshitijJain commented 4 years ago

Fixed