sillsdev / docu-notion

Download Notion pages as markdown and image files, preserving hierarchy and enabling workflow properties. Works with Docusaurus.
MIT License

Feature request: use image hash for name #76

Closed: dionjwa closed this issue 7 months ago

dionjwa commented 1 year ago

Every time you run the conversion process, images get new names, even when they are the same image (verified by hashing the content).

If the name were the image hash, unnecessary deployments would not be triggered (my update is automated).
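To make the request concrete, here is a minimal sketch of content-hash naming, assuming the image bytes are already downloaded. `contentHashFileName` is a hypothetical helper, not part of docu-notion; it only relies on Node's built-in `crypto` module. Identical bytes always yield the identical name, so an unchanged image keeps its file name across runs.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable file name from the image bytes.
// Same bytes -> same name on every run, so unchanged images are not renamed.
function contentHashFileName(bytes: Uint8Array, extension: string): string {
  const digest = createHash("sha256").update(bytes).digest("hex");
  // Keep the first 16 hex chars (64 bits) to keep names short;
  // strip a leading dot so both "png" and ".png" work.
  return `${digest.slice(0, 16)}.${extension.replace(/^\./, "")}`;
}

// Usage: the same bytes give the same name regardless of how the extension is passed.
const png = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
console.log(contentHashFileName(png, "png") === contentHashFileName(png, ".png")); // true
```

Truncating the digest is a trade-off for readable names; the full 64-character hex digest could be used instead if collisions are a concern.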

hatton commented 1 year ago

I don't have time to experiment at the moment, @dionjwa, but this is surprising to me. Images can get into Notion in various ways; maybe our code isn't handling one of them correctly. You can look here: https://github.com/sillsdev/docu-notion/blob/main/src/images.ts (see parseImageBlock()).

Edit: The above gets the URL, but makeImagePersistencePlan() is what determines the file name.

andrew-polk commented 1 year ago

Hm. There is a regression due to Notion changing the URLs for images. Previously they looked something like

https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20220516%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600&X-Amz-Signature=f215704094fcc884d37073b0b108cf6d1c9da9b7d57a898da38bc30c30b4c4b5&X-Amz-SignedHeaders=host&x-id=GetObject

and now they look something like

https://prod-files-secure.s3.us-west-2.amazonaws.com/d9a2b712-cf69-4bd6-9d65-87a4ceeacca2/d1bcdc8c-b065-4e40-9a11-392aabeb220e/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20230915%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20230915T161258Z&X-Amz-Expires=3600&X-Amz-Signature=28fca48e65fba86d539c3c4b7676fce1fa0857aa194f7b33dd4a468ecca6ab24&X-Amz-SignedHeaders=host&x-id=GetObject

Notably, the host changed and there is another hash in the path. Our strategy was to pull out the image block hash and then derive a file name from that. Looks like our strategy was brittle. I will make a quick fix to handle the new URL format, but we probably need to figure out something more robust going forward.
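The brittleness can be illustrated with a small sketch. `fileIdFromNotionUrl` is a hypothetical helper, not docu-notion's actual code, and it assumes (based only on the two example URLs in the previous comment) that the path segment just before the file name identifies the file, while the query string (signature, date, expiry) changes on every request and must be ignored. The query strings below are shortened for readability.

```typescript
// Hypothetical helper: pull a file identifier out of a Notion image URL.
// ASSUMPTION: in both known URL layouts, the segment before the file name
// is a per-file UUID. If Notion changes the layout again, this breaks.
function fileIdFromNotionUrl(rawUrl: string): string | undefined {
  const segments = new URL(rawUrl).pathname
    .split("/")
    .filter((s) => s.length > 0);
  // Need at least ".../<file-id>/<file-name>".
  if (segments.length < 2) return undefined;
  return segments[segments.length - 2];
}

// Old layout: host s3.us-west-2.amazonaws.com, path starts with the bucket name.
const oldUrl =
  "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Expires=3600";
// New layout: different host, and an extra UUID segment in the path.
const newUrl =
  "https://prod-files-secure.s3.us-west-2.amazonaws.com/d9a2b712-cf69-4bd6-9d65-87a4ceeacca2/d1bcdc8c-b065-4e40-9a11-392aabeb220e/Untitled.png?X-Amz-Expires=3600";

console.log(fileIdFromNotionUrl(oldUrl)); // d1058f46-4d2f-4292-8388-4ad393383439
console.log(fileIdFromNotionUrl(newUrl)); // d1bcdc8c-b065-4e40-9a11-392aabeb220e
```

Any scheme like this still depends on Notion's URL layout, which is the argument for deriving names from something Notion does not control, such as the image bytes themselves.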

github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version 0.13.4 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version 0.14.0-alpha.4 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

dionjwa commented 1 year ago

Awesome!!!!!

hatton commented 1 year ago

Should we instead use the id of the parent block, so that we keep a consistent image name even if Notion changes their URLs?

{
  "object": "block",
  "id": "caae873b-543b-4de2-86b8-e103397e7a3c",  <-------------- THIS
  "parent": {
    "type": "page_id",
    "page_id": "2f3ad4e2-7d98-47b8-858f-5c4436e9cc91"
  },
  "created_time": "2022-08-17T20:41:00.000Z",
  "last_edited_time": "2022-12-12T16:33:00.000Z",
  "created_by": {
    "object": "user",
    "id": "11fb7f16-0560-4aee-ab88-ed75a850cfc4"
  },
  "last_edited_by": {
    "object": "user",
    "id": "808b4af1-8eb8-44ca-8575-799359e982e7"
  },
  "has_children": false,
  "archived": false,
  "type": "image",
  "image": {
    "caption": [
      {
        "type": "text",
        "text": {
          "content": "Recording a Talking Book alt text box",
          "link": null
        },
        "plain_text": "Recording a Talking Book alt text box"
      }
    ],
    "type": "file",
    "file": {
      "url": "./1760295598.png",
      "expiry_time": "2023-09-23T23:02:24.599Z"
    }
  }
}
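A minimal sketch of the block-ID proposal, assuming only the `id` and `type` fields from the block object shown above. `ImageBlockLike` and `blockIdFileName` are hypothetical names for illustration. The name survives Notion URL changes, but it stays the same if the image inside the block is replaced, which is the objection raised in the next reply.

```typescript
// Minimal shape of the fields used here; the real Notion block object has many more.
interface ImageBlockLike {
  id: string;
  type: "image";
}

// Hypothetical helper: name the image file after its block ID.
// Stable across URL changes, but does NOT change when the image content changes.
function blockIdFileName(block: ImageBlockLike, extension: string): string {
  return `${block.id}.${extension.replace(/^\./, "")}`;
}

const block: ImageBlockLike = { id: "caae873b-543b-4de2-86b8-e103397e7a3c", type: "image" };
console.log(blockIdFileName(block, "png")); // caae873b-543b-4de2-86b8-e103397e7a3c.png
```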
github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version 0.14.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

dionjwa commented 1 year ago

I wouldn't, because you can replace the image while the parent block ID stays the same.

I want to be able to cache images downstream, so I would like the ID to change only when the actual image content changes.

Images can also be in "synced blocks", so efficiently reusing images would be nice.
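The caching and reuse requirements above can be sketched as a small content-addressed store. `ImageStore` is a hypothetical class, not docu-notion's implementation (the opt-in added later in this thread may work differently). It assumes image bytes are already in memory: the same bytes reached through two blocks (for example via a synced block) resolve to one file name and one write, and the name changes only when the bytes change.

```typescript
import { createHash } from "node:crypto";

// Hypothetical content-addressed store: identical bytes map to one file name,
// so duplicate images are written once and names only change with content.
class ImageStore {
  private readonly byHash = new Map<string, string>();
  public writes = 0;

  // Returns the file name to reference from markdown.
  save(bytes: Uint8Array, extension: string): string {
    const hash = createHash("sha256").update(bytes).digest("hex");
    const existing = this.byHash.get(hash);
    if (existing !== undefined) return existing; // already saved: reuse, no second write
    const name = `${hash.slice(0, 16)}.${extension}`;
    this.byHash.set(hash, name);
    this.writes++; // a real implementation would write the file to disk here
    return name;
  }
}

// Usage: the same image reached through two blocks is stored once.
const store = new ImageStore();
const first = store.save(new Uint8Array([1, 2, 3]), "png");
const second = store.save(new Uint8Array([1, 2, 3]), "png");
console.log(first === second, store.writes); // true 1
```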

andrew-polk commented 7 months ago

You can now opt in to using the content hash. See https://github.com/sillsdev/docu-notion/releases/tag/v0.16.0