Closed the-redback closed 4 years ago
@the-redback this is a known issue: the images are not rendered in the saved MHTML file. For now, you can inspect the element to find the image URL and open it manually. Meanwhile, we will try to fix it.
The idea for fixing this is to take each SVG image, convert it to a data URL, and then use an img element in place of the svg.
var svgElement = document.getElementById('svgId');
// Create your own image
var img = document.createElement('img');
// Serialize the svg to string
var svgString = new XMLSerializer().serializeToString(svgElement);
// Re-encode the string as UTF-8 bytes so that every character falls inside the Latin1 range btoa accepts
var decoded = unescape(encodeURIComponent(svgString));
// Now we can use btoa to convert the svg to base64
var base64 = btoa(decoded);
var imgSource = `data:image/svg+xml;base64,${base64}`;
img.setAttribute('src', imgSource);
At the end, just replace that svg with the img (for example, svgElement.replaceWith(img)).
I want to point out another issue with PDF saves. When unfolding the slides, most of the images are saved as blank in the PDF. After investigating, I found that pages are considered done before the new images load, so they stay blank. I worked around it by adding a sleep before saving the PDF, but that is a hacky solution: it doesn't guarantee that the images are loaded, and it waits on every page regardless of whether the page has any slides. I hope you guys can come up with a better solution.
Thanks again. <3
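One possible alternative to a fixed sleep is to poll the page's images until they report that they have finished loading. The following is only a minimal sketch, not what the downloader does: the names pendingImages and waitForImages and the timeout values are hypothetical, and it assumes the code runs in the browser context. It relies on the standard `complete` property of img elements, so images that the page has not inserted or started loading yet (for example, lazy-loaded ones that are still off screen) would not be covered by it.

```javascript
// Returns the images that have not finished loading yet.
// An <img> reports `complete === true` once it is fully loaded or broken.
function pendingImages(images) {
  return Array.from(images).filter((img) => !img.complete);
}

// Polls until every image is complete or the timeout expires.
// Resolves to the number of images still pending (0 means all are done).
// The default timeout and interval are arbitrary illustrative values.
function waitForImages(images, timeoutMs = 10000, intervalMs = 100) {
  return new Promise((resolve) => {
    const startedAt = Date.now();
    const check = () => {
      const remaining = pendingImages(images).length;
      if (remaining === 0 || Date.now() - startedAt >= timeoutMs) {
        resolve(remaining);
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Hypothetical usage inside the browser context, before the page is saved:
//   const stillPending = await waitForImages(document.images);
```

Unlike a fixed sleep, this returns immediately on pages that have no pending images, and the timeout keeps it from hanging on an image that never finishes.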
Since the images are lazy-loaded, a simple solution would be to click and then smooth-scroll to the element, which ensures the images are loaded. But that will interfere with the code blocks: the Monaco editor will load, and that causes problems with how the code blocks are rendered. I have a script that blocks it, but I need to check it. Does it work fine with MHTML?
Last time I checked, it was also blank in MHTML. But I am not sure whether that is due to lazy loading or to the SVG problem.
BTW, I have no knowledge of JS or Node.js. I have never even tried printing hello world. :D I want to know, is there any way to add a waiting time in this block? https://github.com/shihabmridha/educative.io-downloader/blob/b048bf19e75328896fb5221bc737929e1ab629a5/src/download.ts#L150-L153
I have tried a few solutions from online, but they don't work. Also, is there any way to call waitForNavigation after element.click()?
@shihabmridha for reference we need help from here - https://stackoverflow.com/a/46639473
@the-redback you can use this after element.click();
// Scroll to the clicked element
element.scrollIntoView({ behavior: "smooth" });
// Wait for 1000 ms
await new Promise(function (resolve) { setTimeout(resolve, 1000); });
@the-redback, FYI: any function you pass to page.evaluate(cb, arg) (for example our pageEvaluation()) gets executed in the browser, not in our script/Node process. So if you write console.log to see what's going on inside that block, you will not see anything in the Node console.
What I am trying to say is that any code you write in there has to run in Chrome. I don't know client-side JavaScript that well, so I have no idea how to wait for a network call to resolve. We cannot use any Puppeteer feature there.
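For debugging that block, Puppeteer emits a 'console' event on the page whenever code in the browser logs something, and the message text can be read with text(). A minimal sketch of forwarding those messages to the Node side, where forwardBrowserConsole and the "[browser]" prefix are hypothetical names chosen for illustration:

```javascript
// Forwards messages logged inside the page to a logger on the Node side.
// `page` is assumed to be a Puppeteer Page; `logger` defaults to console.log.
function forwardBrowserConsole(page, logger = console.log) {
  page.on('console', (message) => {
    logger(`[browser] ${message.text()}`);
  });
}

// Hypothetical usage, before calling page.evaluate(pageEvaluation, ...):
//   forwardBrowserConsole(page);
```

With this registered, a console.log inside the evaluated function should show up in the terminal that runs the script.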
@devKshitijJain your code to wait for 1000ms might not work in this case, because you can only await inside an async function. In our case pageEvaluation is not an async function.
So, instead of this: https://github.com/shihabmridha/educative.io-downloader/blob/b048bf19e75328896fb5221bc737929e1ab629a5/src/download.ts#L84
@the-redback, you can try this:
const languages = await page.evaluate(pageEvaluation, { SAVE_AS, SAVE_LESSON_AS });
and on line 147 add the async keyword in front of the function:
async function pageEvaluation({ SAVE_AS, SAVE_LESSON_AS }) { ... }
THIS IS MY THEORY. Did not test it!
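For context, Puppeteer documents that when the function passed to page.evaluate returns a Promise, page.evaluate waits for it to resolve, which is what makes an async page function workable. A minimal sketch of that pattern follows; pageEvaluationSketch and its selector argument are hypothetical and much simpler than the real pageEvaluation. Because the function is serialized and run in the browser, it cannot call helpers defined in the Node script, so the delay is written inline.

```javascript
// Hypothetical, simplified page function; the real pageEvaluation does more.
// It runs in the browser, so it only uses DOM APIs and inline code.
async function pageEvaluationSketch(selector) {
  const element = document.querySelector(selector);
  if (!element) {
    return false; // nothing to click on this page
  }
  element.click();
  // Scroll to the clicked element
  element.scrollIntoView({ behavior: 'smooth' });
  // Wait for 1000 ms; allowed here because the function is async
  await new Promise((resolve) => setTimeout(resolve, 1000));
  return true;
}

// Hypothetical usage from the Node side (the selector is made up):
//   const clicked = await page.evaluate(pageEvaluationSketch, '.some-selector');
```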
@devKshitijJain, I like the idea of converting the svg to base64, replacing the link with the base64 data, and then saving the page.
@shihabmridha there are still some pieces left to figure out, because the data URL is still an SVG image. We need to convert it to PNG and then use that.
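One common way to do that conversion in the browser is to load the SVG data URL into an Image, draw it onto a canvas, and export the canvas as PNG. A minimal sketch under those assumptions follows; svgDataUrlToPng is a hypothetical name, the caller has to supply the target width and height, and this is not necessarily the approach that ended up in the tool.

```javascript
// Rasterizes an SVG data URL to a PNG data URL using a canvas.
// Must run in the browser context, since it needs Image and document.
function svgDataUrlToPng(svgDataUrl, width, height) {
  return new Promise((resolve, reject) => {
    const image = new Image();
    image.onload = () => {
      const canvas = document.createElement('canvas');
      canvas.width = width;
      canvas.height = height;
      canvas.getContext('2d').drawImage(image, 0, 0, width, height);
      // toDataURL returns a "data:image/png;base64,..." string
      resolve(canvas.toDataURL('image/png'));
    };
    image.onerror = () => reject(new Error('Could not load the SVG data URL'));
    image.src = svgDataUrl;
  });
}

// Hypothetical usage, with imgSource being the SVG data URL built earlier:
//   img.setAttribute('src', await svgDataUrlToPng(imgSource, 800, 600));
```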
@shihabmridha that's true: when we add await in any function, we need to make that function async.
So I tried it, and even after converting it to an image, the same problem persists. I need some help to fix this.
@the-redback @shihabmridha after brainstorming for like 5 days, working for like 10 crazy hours last night and trying almost hundreds of solutions, finally fixed the issue :)
Fixed
Thanks for this useful tool.
As the title says, while downloading a course, the MHTML fails to show an image (possibly svg+xml), but the PDF works fine.
I ran it in headless=false mode and the image loaded successfully. But after downloading, it was blank in the MHTML file.
This is an example page, https://www.educative.io/courses/advanced-kubernetes-techniques/3YzvvGX3RxA
Thanks.