raffaem / cs-dlp

Script for downloading Coursera.org videos and naming them.
GNU Lesser General Public License v3.0
313 stars, 49 forks

Unsupported typenames #3

Open abachdoc opened 1 year ago

abachdoc commented 1 year ago

Describe the bug: Jupyter notebooks and other assets are not downloading. cs-dlp reports unsupported typenames:

- Unsupported typename "ungradedWidget" in lecture "reading-getting-started-with-the-model-asset-exchange-and-the-data-asset"
- Unsupported typename "ungradedLti" in lecture "hands-on-lab-getting-started-with-jupyter-notebooks"
- Unsupported typename "staffGraded" in lecture "final-exam"

To Reproduce: run cs-dlp -ca YOURCAUTH open-source-tools-for-data-science with --download-notebooks set in the conf file.

Expected behavior: Jupyter notebooks and other assets download.
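To make the failure mode concrete, here is a minimal sketch of how a downloader might separate items it can handle from those it cannot, so the unsupported typenames can be reported together. This is not cs-dlp's actual code: the function name, the item layout, and the SUPPORTED_TYPENAMES set are all hypothetical. Only the three unsupported typenames come from the messages above.

```python
import logging

# Hypothetical set of typenames a downloader knows how to handle.
SUPPORTED_TYPENAMES = {"lecture", "supplement"}


def partition_by_typename(items):
    """Split (typename, lecture_slug) pairs into supported and skipped lists."""
    supported, skipped = [], []
    for typename, slug in items:
        if typename in SUPPORTED_TYPENAMES:
            supported.append((typename, slug))
        else:
            # Mirrors the shape of the message reported in this issue.
            logging.warning(
                'Unsupported typename "%s" in lecture "%s"', typename, slug
            )
            skipped.append((typename, slug))
    return supported, skipped


items = [
    ("lecture", "welcome"),  # hypothetical supported item
    ("ungradedLti", "hands-on-lab-getting-started-with-jupyter-notebooks"),
    ("staffGraded", "final-exam"),
]
supported, skipped = partition_by_typename(items)
```

Collecting the skipped items rather than only logging them would let a tool print a single summary of what was not downloaded at the end of a run.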

fsixto commented 1 year ago

Good description of the issue. I just experienced the same thing with the Jupyter notebooks!

arnaud-feldmann commented 1 year ago

Same for me.

theotherp commented 1 year ago

I got this far:

    def extract_links_from_lab(self, lecture_id, class_id):
        try:
            ONDEMAND_LAB = "https://www.coursera.org/api/onDemandLearnerWorkspaces.v1/?action=launch&id={user_id}~{class_id}~{lab_id}"

            headers = self._auth_headers_with_json()
            reply = get_page(
                self._session,
                ONDEMAND_LAB,
                json=True,
                post=True,
                data="{}",
                user_id=self._user_id,
                class_id=class_id,
                lab_id=lecture_id,
                headers=headers
            )
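            launch_url = reply["launchUrl"]
            workspace_id = reply["workspaceId"]
            # The notebook name follows the URL-encoded "notebooks/" segment of the launch URL
            match = re.search(r"notebooks%2F([^&]+)", launch_url)
            if match is None:
                logging.error('No notebook path in launch URL for %s', lecture_id)
                return None
            notebook_name = match.group(1)
            download_url = f"https://{workspace_id}.labs.coursera.org/nbconvert/notebook/{notebook_name}?download=true"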

            # At this point downloading the download_url will download an HTML page because a token is missing
            # I've found the cookie COURSERA_SUBMISSION_TOKEN but don't know if it's important and how to get it

            return {"ipynb": [(download_url, '')]}
        except requests.exceptions.HTTPError as exception:
            logging.error('Could not download notebook %s: %s',
                          lecture_id, exception)
            if is_debug_run():
                logging.exception(
                    'Could not download notebook %s: %s', lecture_id, exception)
            return None

But I'm stuck at the download URL. Opening it in the browser works but downloading in cs-dlp results in an HTML page that says "Generating token".
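Until the token problem is solved, it may help to at least detect when the response is the "Generating token" HTML page rather than a real notebook, so that a bad file is not saved with an .ipynb extension. The sketch below relies only on the fact that .ipynb files are JSON documents with a top-level "nbformat" key. The function name is hypothetical and this is not part of cs-dlp.

```python
import json


def is_notebook_payload(body: bytes) -> bool:
    """Return True if body looks like a Jupyter notebook (JSON with "nbformat")."""
    try:
        data = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON, e.g. an HTML page.
        return False
    return isinstance(data, dict) and "nbformat" in data


notebook = b'{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}'
html_page = b"<html><body>Generating token</body></html>"

print(is_notebook_payload(notebook))
print(is_notebook_payload(html_page))
```

A check like this does not obtain the missing token. It only lets the caller log a clear error and skip the file instead of writing HTML to disk.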

zeushera140 commented 9 months ago

Any solutions? I can download the notebooks manually, but cs-dlp no longer works.