automating screenshots for static preview

pdpinch commented 9 years ago

LORE users would like to be able to visually preview problems. We need a way to automate capture of screenshots.

[x] find a library for taking screenshots that can be automated from javascript or python
[ ] simple function that, given a URL, creates a screenshot
[x] first should work with an edX dev stack
[ ] later will need to work with lms.mitx.mit.edu (requires certs)
[ ] later will need to work with edx.org and edge.edx.org (requires credentials)
[ ] address storage later

ShawnMilo commented 9 years ago

The Selenium package for Python is easy to script and has the ability to take screenshots and save them as .png files, or base64 strings, which could be stored in a database.

Since it controls a browser, it should be able to handle certificates and can definitely work with pages which require JavaScript to work properly.
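
As a rough illustration of the Selenium approach described above, the sketch below loads a URL and returns the screenshot either as PNG bytes or as a base64 string suitable for a database column. The helper names (`make_driver`, `capture_png`, `png_to_base64`) are made up for this example, and the choice of Firefox is an assumption; only `driver.get`, `driver.get_screenshot_as_png`, and `driver.quit` are Selenium WebDriver calls.

```python
import base64


def make_driver():
    """Start a browser through Selenium.

    Selenium is imported lazily so the rest of this sketch can be loaded
    without it. Firefox is an arbitrary choice, not something the thread settled on.
    """
    from selenium import webdriver
    return webdriver.Firefox()


def capture_png(driver, url):
    """Load `url` in the given WebDriver and return the screenshot as PNG bytes."""
    driver.get(url)
    return driver.get_screenshot_as_png()


def png_to_base64(png_bytes):
    """Encode PNG bytes as an ASCII base64 string (e.g. for storing in a text field)."""
    return base64.b64encode(png_bytes).decode("ascii")
```

Typical use would be `driver = make_driver()`, then `capture_png(driver, url)` inside a `try`/`finally` that calls `driver.quit()`, so the browser process does not linger on the server.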

amir-qayyum-khan commented 9 years ago

Thanks @ShawnMilo, I was looking into different approaches.

Found an example: http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python

Currently I am looking into how to import a certificate into the Selenium driver.

amir-qayyum-khan commented 9 years ago

@pdpinch I have a few questions:

At what point will the screenshot be triggered? -- The user clicks a button to get a screenshot -- LORE takes screenshots in the background without a user trigger
I need to create a program in the LORE Django app that takes a URL as input and returns a screenshot.
What is the output format? -- Do I need to save the screenshot somewhere and add an entry in the DB or in the logs? -- Or will the app return a stream so that the browser downloads the image?
"We need a way to automate capture of screenshots." What is meant by this line? It seems the application could take a screenshot at any time during execution; for example, if we have a crash or bug, would the app take a screenshot and save it as a log?

Can you explain the use case? It will give me a better understanding of this feature. Thanks

pdpinch commented 9 years ago

@Ferdi here are my answers to these questions. Please correct me if I'm wrong.

updated responses below, in a later comment

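> At what point screenshot will be trigger

~~We need to be able to support both options -- on demand or batch. Batch is probably more likely though.~~
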
> What is the output format,

~~I think using django-storages would be the most scalable and flexible, but I'm open for discussion. @carsongee is in the process of adding/configuring it in LORE.~~

> Need to create a program in Lore django app, that will take url as input and give screenshot

~~That's an OK place to start, but Ferdi may have something else in mind.~~

> `We need a way to automate capture of screenshots`. What is mean by this line? 

~~The use case is that we have a collection of learning object XML in our database. Users want to be able to preview the rendered XML on edx, because it's easier for them to visualize the problem as rendered than as XML. In most cases, we probably want to automate the fetching of screenshots when the XML is imported into LORE~~

~~This issue would only deliver on part of the larger story: fetching one screenshot based on a URL.~~

carsongee commented 9 years ago

I still haven't heard back from Nimisha on using an actual API to pull the real problem out. I'll ping her again. That said, it may still make sense to do this as screenshots even if we use the API to construct them. If so, I'd really rather not have a full headed X11 web browser on our server, as it will kill memory, be hard to install, and create potential security problems. While implementing this, we should look at using PhantomJS, which is headless but still memory heavy, and which Selenium has recently started supporting. When writing unit tests for this, I'd really like to see some memory checking, if at all possible, so we know how much overhead this will add.

On Thu, Jun 11, 2015 at 8:53 AM, Peter Pinch notifications@github.com wrote:

> @Ferdi here's my answers to these questions. Please correct me if I'm wrong.
>
> > At what point screenshot will be trigger
>
> We need to be able to support both options -- on demand or batch. Batch is probably more likely though.
>
> > What is the output format,
>
> I think using django-storages would be the most scalable and flexible, but I'm open for discussion. @carsongee is in the process of adding/configuring it in LORE.
>
> > Need to create a program in Lore django app, that will take url as input and give screenshot
>
> That's an OK place to start, but Ferdi may have something else in mind.
>
> > We need a way to automate capture of screenshots. What is mean by this line?
>
> The use case is that we have a collection of learning object XML in our database. Users want to be able to preview the rendered XML on edx, because it's easier for them to visualize the problem as rendered than as XML. In most cases, we probably want to automate the fetching of screenshots when the XML is imported into LORE
>
> This issue would only deliver on part of the larger story: fetching one screenshot based on a URL.


ShawnMilo commented 9 years ago

Is xvfb an option for this? I'm sure you know about it already; just bringing it up because it wasn't mentioned previously in the thread.

Based on this comment, it is slower than something like phantomjs, with the trade-off of using a real browser, so it will behave more like the real world and you have the option to run the same code without xvfb and see it while debugging.

That StackOverflow question was referring to testing specifically, but I think the observations are relevant to our need.

carsongee commented 9 years ago

Xvfb is just a way to run X11 on a headless server. We already use Chrome and Firefox with xvfb for constant monitoring of our edX environment. I know the performance profile of that setup is big and super slow, which is why I suggested avoiding X11 (xvfb) and a "real" browser if we can. Since we are taking screenshots without interaction, we aren't using anything but the rendering engine, so arguably there may be something like zombiejs that would be even faster and still do everything we need, though I haven't looked extensively at what's out there.

carsongee commented 9 years ago

Here is the API I mentioned: https://github.com/edx/edx-platform/pull/8240
It is currently behind this feature flag: ENABLE_RENDER_XBLOCK_API
Here is the URL: https://github.com/edx/edx-platform/blob/master/lms/urls.py#L464

pdpinch commented 9 years ago

@amir-qayyum-khan some updates on this:

> At what point screenshot will be trigger

Priority will be to handle a batch process. In the LORE scenario, we will want to create screenshots right after a course is imported.

> What is the output format,

PNGs saved to S3, I think.

> Need to create a program in Lore django app, that will take url as input and give screenshot

This shouldn't have any dependency on LORE.

> We need a way to automate capture of screenshots. What is mean by this line?

The use case is that we have a collection of learning object XML in our database. Users want to be able to preview the rendered XML on edx, because it's easier for them to visualize the problem as rendered than as XML. In most cases, we probably want to automate the fetching of screenshots when the XML is imported into LORE

This issue would only deliver on part of the larger story: fetching one screenshot based on a URL.
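
One way to picture the batch requirement above, without any dependency on LORE, is a small loop that takes a list of URLs, captures each one, and hands the PNG bytes to a storage callback. This is only a sketch under assumptions: `capture_batch` and `screenshot_name` are hypothetical names, `capture` stands in for whatever browser-backed function ends up being used, and `save` stands in for an S3 or django-storages writer that is not shown here.

```python
import hashlib


def screenshot_name(url):
    """Derive a stable file name from the URL (hypothetical naming scheme)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + ".png"


def capture_batch(urls, capture, save):
    """Capture each URL and store the result.

    capture(url) -> PNG bytes; save(name, png_bytes) writes them somewhere
    (for example an S3 bucket). Both are supplied by the caller.
    Returns (saved_names, failures), where failures is a list of (url, error text).
    One failing URL does not stop the rest of the batch.
    """
    saved, failed = [], []
    for url in urls:
        name = screenshot_name(url)
        try:
            save(name, capture(url))
        except Exception as exc:  # keep going; report the failure to the caller
            failed.append((url, str(exc)))
        else:
            saved.append(name)
    return saved, failed
```

Keeping the capture and storage steps as plain callables means the same loop could run right after a course import or from an on-demand request.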

pdpinch commented 9 years ago

[ ] find out how the RENDER XBLOCK API is protected & deal with authentication (probably OAuth)
[ ] Add a field to the learningresource model called screenshot; it should be a django file, which will handle the storage location. Should be a separate PR.
[ ] Need to determine what types of learningresources get screenshots: problem, video. Maybe that's it. Or a negative set, i.e. we don't want course, chapter, sequential, vertical. @Ferdi ?
[ ] Need configuration variable to specify the location of the edX preview server
[ ] Need to find out how to specify the XBlock we want to take a screenshot. Do we have all the necessary data in
[ ] Need to handle a 404 gracefully
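
Two of the items above, the configurable preview server location and graceful 404 handling, could look roughly like the following. Everything named here is an assumption for illustration: `build_preview_url`, `preview_exists`, and in particular the `/xblock/{usage_key}` path template, which should be checked against the edx-platform URL configuration linked earlier. Authentication is not addressed.

```python
import urllib.error
import urllib.request

# Assumed path layout for the render-XBlock endpoint; verify before relying on it.
XBLOCK_PATH_TEMPLATE = "/xblock/{usage_key}"


def build_preview_url(preview_base_url, usage_key, path_template=XBLOCK_PATH_TEMPLATE):
    """Join the configured preview server location with the XBlock identifier."""
    return preview_base_url.rstrip("/") + path_template.format(usage_key=usage_key)


def preview_exists(url, opener=urllib.request.urlopen):
    """Return True if the URL can be fetched, False on a 404.

    Other HTTP errors are re-raised so they are not silently mistaken
    for a missing XBlock.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
    response.close()
    return True
```

A caller could check `preview_exists(url)` before starting a browser, and skip the screenshot (or store a placeholder) when it returns `False`.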

Concerns:

[ ] Running a browser on the server introduces security risks, especially when a user controls the URLs that it requests
[ ] The "edX preview server" will contain sensitive course content (answers). We need to think about how to prevent users from getting screenshots from materials they don't own.
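
For the first concern, one possible mitigation is to refuse any URL whose scheme or host is not on an explicit allowlist before it ever reaches the browser. The sketch below uses only the standard library; `is_allowed_url` and the idea of a configured set of hosts are assumptions, and it does nothing about the second concern (checking that a user owns the material).

```python
from urllib.parse import urlsplit


def is_allowed_url(url, allowed_hosts):
    """Accept only http(s) URLs whose host is in `allowed_hosts`.

    `allowed_hosts` is a set of lowercase host names, for example the
    configured edX preview server. Anything else (file://, other hosts,
    URLs that hide the real host behind userinfo) is rejected.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname  # already lowercased, excludes userinfo and port
    return host is not None and host in allowed_hosts
```

For example, with `allowed_hosts = {"lms.mitx.mit.edu"}`, a request for `file:///etc/passwd` or for a look-alike such as `https://lms.mitx.mit.edu@evil.example/` is rejected.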

amir-qayyum-khan commented 9 years ago

The MVP is ready: https://github.com/amir-qayyum-khan/screenshot_xblock/tree/master. It uses the API you mentioned, @carsongee: https://github.com/mitodl/lore/issues/133#issuecomment-111592917

cc @pdpinch

amir-qayyum-khan commented 9 years ago

@carsongee I deployed the screenshot API at https://secret-dawn-6402.herokuapp.com/ and need a sandbox to test it.

cc @pdpinch

mitodl / lore

automating screenshots for static preview #133