sillsdev / docu-notion

Download Notion pages as markdown and image files, preserving hierarchy and enabling workflow properties. Works with Docusaurus.
MIT License

More resilient image names #82

Closed. hatton closed this issue 7 months ago.

hatton commented 1 year ago

Currently, we extract an image name from the S3 URL that Notion uses. This name "matters" if you're doing image localization.

My concern is: what if Notion changes their URL scheme in a way that either a) requires a new regex to extract the name (this has already happened once) or b) changes the id entirely?
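To make that fragility concrete, here is a minimal TypeScript sketch of URL-based naming. The pattern and the name `imageNameFromS3Url` are hypothetical, not docu-notion's actual code. It assumes the part worth keeping is the UUID path segment under `secure.notion-static.com`, as in the example URL in this issue. The second URL is invented purely to show what a scheme change does.

```typescript
// Hypothetical sketch of URL-based naming (not docu-notion's actual code).
// Assumes Notion-hosted files look like
//   https://s3.<region>.amazonaws.com/secure.notion-static.com/<uuid>/<file name>?<signature params>
// and that <uuid> is the stable part to use as the image name.
const NOTION_S3_PATTERN =
  /secure\.notion-static\.com\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\//;

function imageNameFromS3Url(url: string): string | null {
  const match = NOTION_S3_PATTERN.exec(url);
  // If Notion changes its URL scheme, the pattern silently stops matching.
  return match ? match[1] : null;
}

const current =
  "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/7c53cd0c-f6e6-43be-b1d0-e39124615294/Untitled.png?X-Amz-Date=20230926T154205Z";
// A made-up future URL shape, purely for illustration.
const changed =
  "https://files.example.com/notion/7c53cd0c-f6e6-43be-b1d0-e39124615294/Untitled.png";

console.log(imageNameFromS3Url(current)); // → 7c53cd0c-f6e6-43be-b1d0-e39124615294
console.log(imageNameFromS3Url(changed)); // → null
```

Every image name in a localized site then depends on this one regex continuing to match.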

Instead, we should use the id of the block, which is likely to be at least as stable, if not more so, and doesn't require a regex to extract. Here's a typical block with unnecessary fields removed; the top-level `id` is what we should use for the image name:

```json
{
  "object": "block",
  "id": "690583ea-a11a-479a-b66d-b566eb1a52aa",
  "type": "image",
  "image": {
    "type": "file",
    "file": {
      "url": "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/7c53cd0c-f6e6-43be-b1d0-e39124615294/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20230926%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20230926T154205Z&X-Amz-Expires=3600&X-Amz-Signature=82c0a4710cd112e0b3ff1b1fc60562747ad9b541891141ee3be0f97ac12f67bf&X-Amz-SignedHeaders=host&x-id=GetObject",
      "expiry_time": "2023-09-26T16:42:05.665Z"
    }
  }
}
```
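A minimal sketch of the proposed naming, assuming only the fields shown in the example block above. `NotionImageBlock` and `imageNameFromBlock` are hypothetical names, not docu-notion's API. The URL is consulted only to keep the file extension; the identifier itself comes straight from the block.

```typescript
// Hypothetical types/helper for illustration; models only the fields in the example block.
interface NotionImageBlock {
  object: "block";
  id: string;
  type: "image";
  image: {
    type: "file";
    file: { url: string; expiry_time: string };
  };
}

function imageNameFromBlock(block: NotionImageBlock): string {
  // Drop the presigned query string, then keep the extension (if any) of the last path segment.
  const path = block.image.file.url.split("?")[0];
  const lastSlash = path.lastIndexOf("/");
  const lastDot = path.lastIndexOf(".");
  const extension = lastDot > lastSlash ? path.slice(lastDot) : "";
  // The name itself is the block id: no parsing of the S3 URL needed to find it.
  return block.id + extension;
}

const block: NotionImageBlock = {
  object: "block",
  id: "690583ea-a11a-479a-b66d-b566eb1a52aa",
  type: "image",
  image: {
    type: "file",
    file: {
      url: "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/7c53cd0c-f6e6-43be-b1d0-e39124615294/Untitled.png?X-Amz-Date=20230926T154205Z",
      expiry_time: "2023-09-26T16:42:05.665Z",
    },
  },
};

console.log(imageNameFromBlock(block)); // → 690583ea-a11a-479a-b66d-b566eb1a52aa.png
```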

This is an easy fix; however, it would break things for existing users, so we need a way to opt in to or opt out of this change.
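One way to frame the opt-in/opt-out choice, as a hypothetical sketch: the option name `imageNaming` and its values are invented for illustration and are not docu-notion's actual CLI flags. With `"legacy"` as the default, the new behavior is opt-in and existing names are untouched; making `"block-id"` the default turns it into an opt-out.

```typescript
// Hypothetical opt-in switch; option name and values are illustrative only.
type ImageNaming = "legacy" | "block-id";

interface NamingOptions {
  imageNaming?: ImageNaming; // absent => keep the existing behavior
}

function chooseImageName(
  blockId: string,
  legacyName: string,
  extension: string,
  options: NamingOptions = {}
): string {
  const mode: ImageNaming = options.imageNaming ?? "legacy";
  if (mode === "block-id") {
    return blockId + extension;
  }
  // Existing users keep the names they have already committed or localized.
  return legacyName + extension;
}

const legacy = "7c53cd0c-f6e6-43be-b1d0-e39124615294";
const id = "690583ea-a11a-479a-b66d-b566eb1a52aa";

console.log(chooseImageName(id, legacy, ".png")); // → 7c53cd0c-f6e6-43be-b1d0-e39124615294.png
console.log(chooseImageName(id, legacy, ".png", { imageNaming: "block-id" })); // → 690583ea-a11a-479a-b66d-b566eb1a52aa.png
```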

Nateowami commented 9 months ago

This is currently a major issue for the Scripture Forge help site. Unfortunately I don't have time now to address this, but I wanted to register the fact that we will benefit once this is addressed.

andrew-polk commented 9 months ago

@Nateowami Bloom and Paratext are both using the system as is, with two different strategies for localization. If it feels like this is holding you back, it might be worth a short conversation to understand the hang-up and how these two sites are currently handling that tension.

Nateowami commented 8 months ago

@andrew-polk Originally, I planned to build our site by running a script that would fetch all documents from Notion, Crowdin, and elsewhere, and then build the site. None of the content would be checked in (no screenshots either).

However, I soon realized that this was kind of impractical and would make it difficult to understand what had actually changed. So I created a new repo, and the content (docs and images) is checked in.

Having the file names change every time we fetch them means new screenshots are created in the repo every time. Having the markdown change also added some challenges with updating Crowdin, since there were always changes to the files.
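A plausible mechanism for names that churn on every fetch, offered as an assumption rather than a diagnosis of any particular docu-notion version: Notion serves files through presigned S3 URLs whose `X-Amz-Date` and `X-Amz-Signature` parameters change with each request, so any name derived from the whole URL changes too. The two URLs below are illustrative (the query values are made up and shortened), and FNV-1a is just a stand-in for "hash the URL to get a name".

```typescript
// Illustrative only: the same image fetched twice, a few minutes apart.
// Query values are made up and shortened; only the presigned parameters differ.
const firstFetch =
  "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/7c53cd0c-f6e6-43be-b1d0-e39124615294/Untitled.png?X-Amz-Date=20230926T154205Z&X-Amz-Signature=82c0a471";
const secondFetch =
  "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/7c53cd0c-f6e6-43be-b1d0-e39124615294/Untitled.png?X-Amz-Date=20230926T155012Z&X-Amz-Signature=1f9e33b0";

// FNV-1a (32-bit) over UTF-16 code units: a tiny stand-in for hashing a URL into a file name.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Drops the presigned query string, leaving only the object path.
function withoutQuery(url: string): string {
  const q = url.indexOf("?");
  return q === -1 ? url : url.slice(0, q);
}

// Names hashed from the whole URL will (almost certainly) differ between fetches:
console.log(fnv1a32(firstFetch), fnv1a32(secondFetch));
// ...while the path, like the block id (which never comes from the URL), stays the same.
console.log(withoutQuery(firstFetch) === withoutQuery(secondFetch)); // → true
```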

I can't think of a good solution that doesn't involve keeping the file names consistent.

andrew-polk commented 8 months ago

Are the file names changing every time you run docu-notion? Or just when you change something with the image?

With our setup and PTX's, we only get changes when something has changed in Notion.

If you see a change with every pull, we will need to figure out what is causing that.

Nateowami commented 8 months ago

Yes, they change every time, even a few minutes later. I'm running it with `npx docu-notion -n $SF_HELP_NOTION_TOKEN -r $SF_HELP_NOTION_ROOT_PAGE_ID`.

andrew-polk commented 8 months ago

Hm. It looks like npx is running docu-notion 0.11.0. I'll look into why, but in the meantime, try using 0.15.0.

Nateowami commented 8 months ago

I thought npx downloads the latest version of docu-notion and runs it. Does npx not work how I think? How do I tell it to use a particular version?

Nateowami commented 8 months ago

Oh, it looks like I should be using `@sillsdev/docu-notion` instead of `docu-notion`.

andrew-polk commented 8 months ago

Ah. That would do it. Also, apparently running `npx @sillsdev/docu-notion@latest` guarantees no cache issues, etc.

andrew-polk commented 7 months ago

The default image file name template is now `{page-slug}.{notion-block-id}`. See https://github.com/sillsdev/docu-notion/releases/tag/v0.16.0.
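To illustrate what a `{page-slug}.{notion-block-id}` template produces, here is a hypothetical placeholder-filling helper. It is not docu-notion's implementation: `applyImageNameTemplate`, the slug `getting-started`, and appending the original file extension afterwards are all assumptions for the example, and how docu-notion formats the block id or handles pages without a slug is not covered here.

```typescript
// Hypothetical helper: fills {placeholder} tokens in a file-name template.
// Not docu-notion's implementation; appending the extension is an assumption.
function applyImageNameTemplate(
  template: string,
  values: Record<string, string>,
  extension: string
): string {
  const base = template.replace(/\{([a-z-]+)\}/g, (token: string, key: string) => {
    const value = values[key];
    if (value === undefined) {
      // Fail loudly rather than emit a literal "{...}" into a file name.
      throw new Error(`No value for placeholder ${token}`);
    }
    return value;
  });
  return base + extension;
}

const imageName = applyImageNameTemplate(
  "{page-slug}.{notion-block-id}",
  {
    "page-slug": "getting-started", // hypothetical page slug
    "notion-block-id": "690583ea-a11a-479a-b66d-b566eb1a52aa",
  },
  ".png"
);

console.log(imageName); // → getting-started.690583ea-a11a-479a-b66d-b566eb1a52aa.png
```

Because both parts come from Notion's own page and block data rather than from the signed S3 URL, the name stays the same across fetches as long as the page slug and the image block are unchanged.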