soraxas / echo360

Commandline tool for automated downloads of echo360 videos hosted by university
https://cs.tinyiu.com/echo360
MIT License
281 stars 53 forks source link

Add headless support for echo360 cloud #21

Open Soneji opened 4 years ago

soraxas commented 4 years ago

Hi @overclockedllama thank you for your prompt PR. I've tried it but it didn't work for me, as it is a specific setup.

I wasn't sure if you noticed, but the reason --headless was not used in echo360cloud is because echo360cloud tends to use SSO to checking credentials. In contrast to a self-host echo360 where it has predictable username and password field (hence the --headless was present there), SSO is fully dependent on the individual system that it uses to authenticate.

For example, my university uses office365 to manage its login system. Therefore, after the

webdriver.find_element_by_xpath('//*[@id="email"]').send_keys(username)
webdriver.find_element_by_xpath('//*[@id="submitBtn"]').click()

it redirects the page to the office365 login system and subsequently causes the

webdriver.find_element_by_xpath('//*[@id="password"]')

fails as that element wasn't presents.

Soneji commented 4 years ago

Oh right, maybe we could add a new flag called headless or headless-cloud or something like that?

soraxas commented 4 years ago

But the proposed implementation does not work in general, because the PR expects the login page has certain elements (i.e. //*[@id="email"] and '//*[@id="submitBtn"] as you have used), but this is not always true. I think it is dependent on the organisation.

Soneji commented 4 years ago

image I think for the .org.uk one there's a standard login for all academic institutes in the uk. Or at least it looks like it isnt customised for my academic institute

Soneji commented 4 years ago

What does yours look like?

soraxas commented 4 years ago

Hi @overclockedllama I don't believe that is true, where it is not dependent on hostname (like echo360.org[.xx]) but only dependent on the organisation/university.

The echo360[.xx] login screen is always the same (e.g. like the one you shown or the one in readme.md in this repo) but the authenticate method will be different depending on the different institution; i.e., after you entered your email, it checks against an internal database and redirects you to the corresponding login page. E.g., for my system, it redirects me to a page with my institution's authenticate the system.

I found this old youtube video about the echo360 system, and (while the interface is the older version) it showcases the login system in a more explicit way. From the video, you can see that you can log in with username/password, or login via institution which is using institution's SSO to authenticate. It probably is the case where different institutions could choose to either

  1. allow students to use their email to authenticate with echo360's system (which is like setting up a new account in echo, and might create the possibility of the password being differed, as the commentator had said), or
  2. authenticate via the insitution's SSO system. And whether it is case 1 (which is trivial as the page is always the same) or case 2 (which has a different page for different institution) it is dependent on the institution.
Soneji commented 4 years ago

Ok that makes sense. So if I want to setup a headless mode for those academic institutions that use echo360 login, not academic institute login. What would the best way of doing that be? I feel like the only way is to add a new flag like —cloud-login-headless or something like —echo-account

What do you think the best way to add headless support for echo logins is? I don’t think it’s possible to add headless support for other academic institutions

soraxas commented 4 years ago

I guess the most suitable way is to first detect if login can be completed via terminal; and if not, proceed to spawn a browser GUI. This is similar to the approach previously for the non-cloud platform.

Previously, if the module detected that username/password field exists, then it will prompt to ask for username and password. If not, it will prints out the INFO: No need to login message you previously saw in your issue.

See https://github.com/soraxas/echo360/blob/a3a675328bef1b9f84d3f2aa45fbd2b23bcaeecb/echo360/downloader.py#L131 and https://github.com/soraxas/echo360/blob/a3a675328bef1b9f84d3f2aa45fbd2b23bcaeecb/echo360/downloader.py#L106

soraxas commented 4 years ago

Check the demonstration video at this timestamp https://youtu.be/Lv1wtjnCcwI?t=83 of prompting for username/password (note that in the video they are hidden)

It dynamically prompts for asking username and password when necessary (when those field presents) https://github.com/soraxas/echo360/blob/a3a675328bef1b9f84d3f2aa45fbd2b23bcaeecb/echo360/downloader.py#L143

This would makes your PR compatible for all institutions (where browser would only be needed as a fallback).

Soneji commented 4 years ago

I'm not really sure what to do then. All I can say is that to login I need to put the username in the email field, click submit, put the password in the password field, and click submit. If the user does not input username and password, it lets them use the fallback option of logging in by themselves

soraxas commented 4 years ago

Some comments are:

  1. It should fail gracefully. Currently, if the expected fields does not exists, it will throws

    selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="password"]"}
    (Session info: headless chrome=81.0.4044.138)
    (Driver info: chromedriver=2.38.552522 (------------------------),platform=Linux 4.14.180-1-SYSTEM x86_64)

    which would create confusion. Instead, it should fail gracefully with message that indicates if it is due to invalid username, password, or because it needs SSO from the institution. You can detect that by the field existences (see the files I previously suggested which had a similar approach). Then, you can choose to either (i) gracefully exit with error, or (ii) fallback automatically and spawn a GUI browser.

  2. Following from the previous point, the reason for the need to fail gracefully is because this would create incompatibilities to the non-echo360.org URLs with this proposed approach, as it also uses -u and -p.

  3. You might also want to add the --headless mode to firefox driver. And in fact, if the login can be accomplished without SSO you can directly use phantomjs driver, which is the default driver for non-echo360.org URLs. The reason for using chrome or firefox is only because of the SSO login issues.