openzim / openedx

Open edX (to zim) scraper
GNU General Public License v3.0

Missing /login_ajax endpoint on instance #170

Closed lexaSvarshik closed 1 year ago

lexaSvarshik commented 1 year ago

This should have been patched, as mentioned in #56, but I'm still getting the same output:
openedx2zim --course-url="https://apps.openedu.ru/learning/course/course-v1:urfu+PYAP+fall_2022/home" --publisher="edx201" --email="MY_EMAIL" --password "BEST_PASSWORD" --name="eba" --tmp-dir="output" --output="output" --debug --keep --format="mp4"

[openedx2zim::2023-01-01 20:13:40,806] INFO:Starting openedx2zim 1.0.1 with:
  Course URL: https://apps.openedu.ru/learning/course/course-v1:urfu+PYAP+fall_2022/home
  Email ID: MY_EMAIL
[openedx2zim::2023-01-01 20:13:40,806] DEBUG:Checking for missing binaries
[openedx2zim::2023-01-01 20:13:40,948] INFO:Testing openedx instance credentials ...
[openedx2zim::2023-01-01 20:13:46,776] ERROR:FAILED. An error occurred: Expecting value: line 7 column 1 (char 6)
[openedx2zim::2023-01-01 20:13:46,777] ERROR:Expecting value: line 7 column 1 (char 6)
Traceback (most recent call last):
  File "/home/svarka/.pyenv/versions/3.8.16/lib/python3.8/site-packages/openedx2zim/entrypoint.py", line 213, in main
    scraper.run()
  File "/home/svarka/.pyenv/versions/3.8.16/lib/python3.8/site-packages/openedx2zim/scraper.py", line 825, in run
    self.instance_connection.establish_connection()
  File "/home/svarka/.pyenv/versions/3.8.16/lib/python3.8/site-packages/openedx2zim/instance_connection.py", line 67, in establish_connection
    self.instance_connection = self.get_api_json(
  File "/home/svarka/.pyenv/versions/3.8.16/lib/python3.8/site-packages/openedx2zim/instance_connection.py", line 85, in get_api_json
    return json.loads(
  File "/home/svarka/.pyenv/versions/3.8.16/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/svarka/.pyenv/versions/3.8.16/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/svarka/.pyenv/versions/3.8.16/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)

rgaudin commented 1 year ago

@lexaSvarshik thank you for your report.

The scraper expects the /login_ajax endpoint to be present and working in order to log into the Open edX instance.

Using your URL, https://apps.openedu.ru/login_ajax redirects to https://courses.openedu.ru (the homepage). It's an HTML webpage, and not the expected JSON payload.

I tried https://courses.openedu.ru/login_ajax (after all, browser access does use https://courses.openedu.ru/login_refresh) but it redirects to https://openedu.ru/login_ajax, which is a 404.

Maybe the openedX version there is not supported or maybe some features used by the scraper have been disabled/removed.
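
To illustrate why the traceback ends in a JSONDecodeError: the response body is handed to json.loads, which fails as soon as it meets HTML. The sketch below is only an illustration, not the scraper's actual code. `parse_login_response` and `LoginEndpointError` are hypothetical names, and the `{"success": true}` payload in the usage note is an assumed sample.

```python
import json


class LoginEndpointError(Exception):
    """Hypothetical error for a login endpoint that does not answer with JSON."""


def parse_login_response(body: str, content_type: str = "") -> dict:
    """Parse a login response body, failing with a readable message.

    Hypothetical helper, not part of openedx2zim.
    """
    # An HTML Content-Type means we were served a web page (e.g. a homepage
    # reached through a redirect), not the JSON payload the scraper expects.
    if "html" in content_type.lower():
        raise LoginEndpointError(
            "login endpoint returned an HTML page instead of JSON"
        )
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # Same failure as in the traceback, wrapped in a clearer message.
        raise LoginEndpointError(
            f"login endpoint returned invalid JSON: {exc}"
        ) from exc
    if not isinstance(payload, dict):
        raise LoginEndpointError(
            "login endpoint returned JSON that is not an object"
        )
    return payload
```

Calling `parse_login_response('{"success": true}', "application/json")` returns a dict, while an HTML body raises `LoginEndpointError` rather than a bare JSONDecodeError.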

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

benoit74 commented 1 year ago

Looking into this issue, the problem is that an external identity provider (https://sso.openedu.ru) is configured on this instance. This identity provider's pages have been tweaked to "look like" the root Open edX website, but it is a separate tool; it looks like a Keycloak instance.

I'll let @rgaudin confirm, but I think we should rename this issue "Add support for openedx instance with external identity provider / SSO configured". Probably the minimal first step would be to find a way to detect it and display a nice error in the logs.
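
As a rough idea of what such a detection could look like, here is a minimal sketch. It is a heuristic only: `describe_login_problem` is a hypothetical helper, not existing scraper code, and comparing hostnames cannot tell an SSO redirect apart from any other cross-host redirect, hence the cautious wording of the message.

```python
from typing import Optional
from urllib.parse import urlsplit


def describe_login_problem(
    requested_url: str, final_url: str, content_type: str
) -> Optional[str]:
    """Return a readable problem description, or None if the response looks usable.

    Hypothetical helper: final_url is the URL reached after following
    redirects, content_type is the Content-Type header of the final response.
    """
    # A JSON response is what the scraper expects; nothing to report.
    if "json" in content_type.lower():
        return None
    requested_host = urlsplit(requested_url).hostname
    final_host = urlsplit(final_url).hostname
    if final_host != requested_host:
        # Heuristic: a cross-host redirect on the login endpoint suggests
        # that authentication is handled somewhere else.
        return (
            f"{requested_url} was redirected to another host ({final_host}); "
            "the instance may use an external identity provider / SSO, "
            "which is not supported"
        )
    return (
        f"{requested_url} did not return JSON "
        f"(Content-Type: {content_type or 'missing'})"
    )
```

With the URLs from this thread, `describe_login_problem("https://apps.openedu.ru/login_ajax", "https://courses.openedu.ru", "text/html")` returns a message naming courses.openedu.ru, which could be logged before exiting instead of letting the JSON decoding fail.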

rgaudin commented 1 year ago

I agree

benoit74 commented 1 year ago

@lexaSvarshik I'm closing this as it is not really a bug; the enhancement will be tracked in https://github.com/openzim/openedx/issues/172