yannick-cw / notion-ocr

Adding OCR support to Notion

Can't seem to find marked images in my notion pages #8

Closed xavbart closed 4 years ago

xavbart commented 4 years ago

As requested. I have images marked accordingly; see an example page here: https://www.notion.so/xavbart/Image-recog-test-32473ad27380455582c08b62f7009e81. Running verbose, I can see that the Notion API does find my Notion account and returns the large JSON covering my account details. But the query for the add_ocr tag returns nothing.

Request Url: https://www.notion.so/api/v3/searchBlocks
Request Body: {
    "query": "add_ocr",
    "id": "06c5d9d3-7fc0-4c23-91b2-fc4ab98308b0",
    "limit": 1000,
    "table": "space"
}
Response Status Code: Status {statusCode = 200, statusMessage = "OK"}
Response Body: {"results":[],"recordMap":{}}
Found 0 images to process

yannick-cw commented 4 years ago

Okay, this is going to be a bit more complicated. In my setup I can search by the space id and it finds all my content. It seems you have a different setup, which could be why it does not find anything. To find out what's going on, we'd need to try:

  1. doing the search in the browser and checking which id is used in the JSON body of the search request (developer tools -> network)
  2. checking, in the response of the loadUserContent request from this verbose output, which field this id is written in

I guess your search in the browser does not use the id from the request body you posted above, or maybe the "table": "space" is different...
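
Step 2 above can be done by hand, but a small script makes it easier on a large response. The following is only a sketch: `find_paths` is a hypothetical helper, and the `sample` payload is a made-up, heavily trimmed stand-in for a loadUserContent response (its field names are assumptions, not the documented Notion format). It lists every place where a given id appears as a key or as a value.

```python
from typing import Any, Iterator


def find_paths(node: Any, target: str, path: str = "$") -> Iterator[str]:
    """Yield a JSON-path-like string for every key or value equal to `target`."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == target:
                yield child
            yield from find_paths(value, target, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_paths(value, target, f"{path}[{index}]")
    elif node == target:
        yield path


# Made-up, trimmed payload for illustration only; the real response is far larger
# and its exact field names are an assumption here.
SPACE_ID = "5eef59c4-4417-4cae-a1a4-2be5cafd8a24"
sample = {
    "recordMap": {
        "space": {SPACE_ID: {"value": {"name": "Work"}}},
        "block": {"b1": {"value": {"space_id": SPACE_ID}}},
    }
}

paths = list(find_paths(sample, SPACE_ID))
for p in paths:
    print(p)
```

To use it on real output, the verbose response body would be loaded with `json.loads` and passed in place of `sample`, together with the id seen in the browser's search request.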

xavbart commented 4 years ago

Indeed, the browser seems to call a different ID than the one used by the notion-ocr executable: {query: "add_ocr", table: "space", id: "5eef59c4-4417-4cae-a1a4-2be5cafd8a24", limit: 20}. Is any part of the response to https://www.notion.so/api/v3/loadUserContent relevant? I actually have 4 different Notion spaces altogether, so could this be the issue (it searches inside only one, and the wrong one)?

yannick-cw commented 4 years ago

Perfect, it is the spaces. I am going to push a fix for that and then we can try.
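
For readers following along, here is a rough sketch of what "search every space" could look like. It is not the actual fix in notion-ocr. It assumes that the loadUserContent response lists spaces keyed by id under `recordMap.space` (an assumption), and it reuses the request body fields from the verbose log above. `space_ids` and `search_bodies` are hypothetical names, and nothing is sent over the network.

```python
from typing import Any, Dict, List


def space_ids(load_user_content: Dict[str, Any]) -> List[str]:
    """Collect all space ids.

    ASSUMPTION: spaces are keyed by id under recordMap.space.
    """
    return list(load_user_content.get("recordMap", {}).get("space", {}))


def search_bodies(ids: List[str], query: str = "add_ocr", limit: int = 1000) -> List[Dict[str, Any]]:
    """Build one searchBlocks request body per space, mirroring the logged body."""
    return [{"query": query, "id": sid, "limit": limit, "table": "space"} for sid in ids]


# Trimmed, made-up response containing the two space ids mentioned in this thread.
response = {
    "recordMap": {
        "space": {
            "06c5d9d3-7fc0-4c23-91b2-fc4ab98308b0": {},
            "5eef59c4-4417-4cae-a1a4-2be5cafd8a24": {},
        }
    }
}

bodies = search_bodies(space_ids(response))
for body in bodies:
    print(body)
```

Each body would then be posted to the searchBlocks endpoint and the results merged, so content in any of the user's spaces is found rather than only the first one.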

xavbart commented 4 years ago

Would you want a sample of the JSON describing my multi-space situation? (although I am not sure I can make sense of the structure as is)

yannick-cw commented 4 years ago

No, I think I found the problem: I created a multi-space setup locally and tested it. Maybe later, if my solution does not work :)

yannick-cw commented 4 years ago

@xavbart can you try with 0.1.4?

xavbart commented 4 years ago

Just did (saw your update). It does find things now, but it failed while retrieving a page that seems to annoy it:

notion-ocr: JSONError "Error in $.recordMap.block['53871674-0724-461f-990d-74e555371f32'].value.properties.title[1][1]: parsing Text failed, expected String, but encountered Array

For info, it seems to point to this page: https://www.notion.so/We-the-Doers-Fiverr-s-Entrepreneurial-Populism-and-a-3-Days-Workweek-THE-ENTREPRECARIAT-734d707063d04189a58c6a673cb3670d (maybe its title is hard to parse because of the pipe | ?). That stops the execution. I may change it to allow further execution, unless you want me to keep it as a test for 0.1.5.

UPDATE: ignore the above. It seems it fails on YOUR page, which I had saved in Notion (duh) and which has some issues. I'll try putting it in the trash to see whether it gets skipped.

UPDATE 2: OK, that was it. I had saved your description page in Notion, as you can see here: https://www.notion.so/Search-In-Your-Notion-Images-to-for-Notion-53c74ca643a344969517624fa56eb244, and it would not parse properly. With that page in the bin, the images on all pages are parsed; with the page visible, they are not. I'll leave it available for you to copy across if you need to test.
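
The "allow further execution" idea mentioned above could look roughly like this. It is only a sketch, not how notion-ocr is implemented: `strict_title` is a hypothetical stand-in for a parser that expects every title part to be a string, and `parse_all` records blocks that fail instead of aborting the whole run.

```python
from typing import Any, Dict, Tuple


def strict_title(block: Dict[str, Any]) -> str:
    """Hypothetical strict parser: every element of every title segment must be a string."""
    segments = block["value"]["properties"]["title"]
    for i, segment in enumerate(segments):
        for j, part in enumerate(segment):
            if not isinstance(part, str):
                raise ValueError(
                    f"title[{i}][{j}]: expected String, but encountered {type(part).__name__}"
                )
    return "".join(part for segment in segments for part in segment)


def parse_all(blocks: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Parse every block; collect failures instead of stopping at the first one."""
    parsed: Dict[str, str] = {}
    skipped: Dict[str, str] = {}
    for block_id, block in blocks.items():
        try:
            parsed[block_id] = strict_title(block)
        except (KeyError, TypeError, ValueError) as err:
            skipped[block_id] = str(err)
    return parsed, skipped


# Two made-up blocks: one plain title, one with a formatted segment.
blocks = {
    "plain": {"value": {"properties": {"title": [["add_ocr"]]}}},
    "formatted": {"value": {"properties": {"title": [["In the line "], ["right", [["b"]]]]}}},
}

parsed, skipped = parse_all(blocks)
print(parsed)
print(skipped)
```

The trade-off is that a skipped block is silently left unprocessed unless the skipped list is reported to the user at the end of the run.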

yannick-cw commented 4 years ago

I’d like to debug the problem, can you send me the response when you run it verbose? Especially the value.properties.title part would be interesting ;)

XavBart notifications@github.com wrote on Fri, Nov 29, 2019 at 18:09:

Just did (saw your update). It does find things, but failed on retrieving a page that seems to annoy it. notion-ocr: JSONError "Error in $.recordMap.block['53871674-0724-461f-990d-74e555371f32'].value.properties.title[1][1]: parsing Text failed, expected String, but encountered Array For info, it seems to point to that page https://www.notion.so/We-the-Doers-Fiverr-s-Entrepreneurial-Populism-and-a-3-Days-Workweek-THE-ENTREPRECARIAT-734d707063d04189a58c6a673cb3670d (whose title has maybe an issue in parsing because of the pipe | ?) So that stops your execution. I may change it to allow further execution unless you want me to see it as a test for 0.1.5.


xavbart commented 4 years ago

This is the one I quoted at the beginning of my feedback:

notion-ocr: JSONError "Error in $.recordMap.block['53871674-0724-461f-990d-74e555371f32'].value.properties.title[1][1]: parsing Text failed, expected String, but encountered Array

Oh sorry, you meant the JSON blob. Sure. I isolated the part of the response for that page which has an "array in title":

{
    "role": "editor",
    "value": {
        "id": "53871674-0724-461f-990d-74e555371f32",
        "version": 1,
        "type": "text",
        "properties": {
            "title": [
                [ "In the line " ],
                [ "right", [ [ "b" ] ] ],
                [ " below any image in notion write " ],
                [ "add_ocr", [ [ "c" ] ] ],
                [ ", the next time the tool runs, it replaces that with the text scanned from the image." ]
            ]
        },
        "created_by": "4ca61bfb-f1b8-409e-ba27-0fedb84839d6",
        "created_time": 1574508285559,
        "last_edited_by": "4ca61bfb-f1b8-409e-ba27-0fedb84839d6",
        "last_edited_time": 1574508285559,
        "parent_id": "38ad2b8f-ac27-46be-9e45-b139163da860",
        "parent_table": "block",
        "alive": true,
        "ignore_block_count": true,
        "created_by_table": "notion_user",
        "created_by_id": "4ca61bfb-f1b8-409e-ba27-0fedb84839d6",
        "last_edited_by_table": "notion_user",
        "last_edited_by_id": "4ca61bfb-f1b8-409e-ba27-0fedb84839d6"
    }
}

Tell me if you need more, but as you have the page itself, you might find out why it was treating the block title as an array. Hint: it seems to be because of formatting. Notion appears to split the title into array elements, one per differently formatted text part, and to attach the formatting inside each element. So yes, this sort of issue might come up often with any user content.
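
To illustrate the structure described above, here is a minimal sketch that flattens such a title into plain text. It assumes, based only on the blob in this comment, that each segment is a list whose first element is the text and whose optional second element carries formatting. `plain_title` is a hypothetical helper, not part of notion-ocr.

```python
from typing import Any, List


def plain_title(title: List[List[Any]]) -> str:
    """Join the text part (first element) of each segment, ignoring formatting."""
    parts = []
    for segment in title:
        # ASSUMPTION: segment[0] is the text; anything after it is formatting.
        if segment and isinstance(segment[0], str):
            parts.append(segment[0])
    return "".join(parts)


# The title array from the JSON blob quoted in this comment.
title = [
    ["In the line "],
    ["right", [["b"]]],
    [" below any image in notion write "],
    ["add_ocr", [["c"]]],
    [", the next time the tool runs, it replaces that with the text scanned from the image."],
]

text = plain_title(title)
print(text)
```

A parser that tolerates the optional formatting element in this way would accept both plain and formatted titles instead of failing at title[1][1].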

yannick-cw commented 4 years ago

I think with 0.1.5 this is fixed now

xavbart commented 4 years ago

brew upgrade notion-ocr is not seeing the latest 0.1.5. Am I doing something wrong?

Warning: yannick-cw/tap/notion-ocr 0.1.4 already installed

Likewise for a re-install:

Warning: yannick-cw/tap/notion-ocr 0.1.4 is already installed and up-to-date
To reinstall 0.1.4, run 'brew reinstall notion-ocr'

yannick-cw commented 4 years ago

Ah sorry, I did not update the brew package yet. My Mac is at work, will do on Monday.

xavbart commented 4 years ago

Ah sorry, I did not update the brew package yet. My Mac is at work, will do on Monday.

No prob, I'll use the other install methods.

yannick-cw commented 4 years ago

@xavbart I also released brew again!

xavbart commented 4 years ago

Yep, now this seems fixed. Guess the whole issue can be closed. Thanks.