bhchiang opened this issue 3 years ago (status: Open).
@bryanhpchiang it looks like the page you linked isn't publicly shared... unless you recently un-shared it, that would explain why loconotion cannot load the content.
There's definitely an issue here, though: shared pages use the new Notion URL format https://example.notion.site/Page-1F29BC48EA1A029FC481B, but sub-pages still use the old format https://notion.so/example/Page-1F29BC48EA1A029FC481B, which is only a redirect page.
For me, loconotion correctly loads the primary public Notion page URL, but times out on any sub-pages. I think the logic at lines 582-584 of loconotion/notionparser.py needs to be updated to rewrite page URLs from the old format to the new one before attempting to fetch them.
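A minimal sketch of that old-to-new rewrite, assuming the old format is always notion.so/<workspace>/<rest> and that the workspace name doubles as the notion.site subdomain (true for the examples in this thread). to_site_url() is a hypothetical helper, not part of loconotion.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOSTS = ("notion.so", "www.notion.so")

def to_site_url(url: str) -> str:
    """Rewrite https://notion.so/<workspace>/<rest> to https://<workspace>.notion.site/<rest>.

    URLs that are not in that old workspace-path format are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname not in OLD_HOSTS:
        return url
    segments = parts.path.lstrip("/").split("/", 1)
    if len(segments) != 2 or not segments[1]:
        # e.g. https://www.notion.so/<page-id> has no workspace segment to map
        return url
    workspace, rest = segments
    # Keep any query string (?v=...) and fragment untouched.
    return urlunsplit(("https", f"{workspace}.notion.site", "/" + rest, parts.query, parts.fragment))

print(to_site_url("https://notion.so/example/Page-1F29BC48EA1A029FC481B"))
# -> https://example.notion.site/Page-1F29BC48EA1A029FC481B
```

Links like https://www.notion.so/<page-id>, which carry no workspace segment, are left alone here; those need the site name from the root page instead.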
@tomreitz I merged a pull request from @bryanhpchiang earlier today that should address this. Want to pull it and check that it's all good?
@leoncvlt thanks for the quick response (and an awesome project!). Subpages still aren't working for me: see this public page, which converts fine, but the subpages in its table time out, per the logs below:
[21:47:02] INFO Initialising parser with configuration file
[21:47:02] INFO Setting output path to 'dist/wiwebsites.com'
[21:47:02] INFO Initialising chromedriver at /usr/bin/chromedriver
[21:47:03] INFO Parsing page 'https://tomreitz.notion.site/Wisconsin-Websites-ecdb3dc4cd1e40f280b7512a23ca2006'
[21:47:17] INFO Downloading 'https://www.notion.so/print.b31f28aa.css'
[21:47:17] INFO Downloading 'https://www.notion.so/app-7d82edb35207a8a8b776.css'
[21:47:18] INFO Downloading 'https://www.notion.so/lyon-text-regular-3be84b20b1d9ff1e3456b0a220ae449b.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/lyon-text-regular-italic-437d32a42fc5b8268bb4a1e0cc8b363f.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/lyon-text-semibold-acb7f110189034ff6a1afa4b730be0ed.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/lyon-text-semibold-italic-1f81a2f93060f05edd7f078ac91f25e6.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/iawriter-mono-regular-4b73d071988a4f1cd2283524716ad970.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/iawriter-mono-italic-d5d3224c1377168e261efc6aa0ce89c6.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/iawriter-mono-bold-eb96a5e539892d26cf8b0cb2367e3580.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/iawriter-mono-bold-italic-743b231fa82483406c79a00fa1f12fe8.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/inter-ui-regular-3ae6a7d3890c33d857fc00bd2e4c4820.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/inter-ui-medium-95b8a98959d1af9ab432d7ffe295ef94.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/inter-ui-semibold-19b57197b819695d334b9961ee41910e.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/inter-ui-bold-001893789f7f342b520f29ac8af7d6ca.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/permanent-marker-a6d62939e7c920a184ddddcf4149e62c.woff'
[21:47:18] INFO Downloading 'https://www.notion.so/katex/katex.88defe76.min.css'
[21:47:18] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_AMS-Regular.342a61e0.ttf'
[21:47:18] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Caligraphic-Bold.b27e354b.ttf'
[21:47:18] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Caligraphic-Regular.bd18bae2.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Fraktur-Bold.359e1e97.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Fraktur-Regular.6b53a2db.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Main-Bold.ed829b5f.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Main-BoldItalic.ca23ba4b.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Main-Italic.14ff9c98.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Main-Regular.c89c6436.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Math-BoldItalic.7b481bb8.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Math-Italic.f677173e.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_SansSerif-Bold.362d94c6.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_SansSerif-Italic.2c742978.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_SansSerif-Regular.6087fc04.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Script-Regular.781730b2.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Size1-Regular.54a80b37.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Size2-Regular.24cbe093.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Size3-Regular.ee3e5bf4.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Size4-Regular.b78c75bb.ttf'
[21:47:19] INFO Downloading 'https://www.notion.so/katex/fonts/KaTeX_Typewriter-Regular.90f78c10.ttf'
[21:47:19] INFO Exporting page 'https://tomreitz.notion.site/Wisconsin-Websites-ecdb3dc4cd1e40f280b7512a23ca2006' as 'index.html'
[21:47:19] INFO Parsing page 'https://www.notion.so/7514e88c4042418997665b5ecf11733b?v=703812ea01fe4ee6bc010fd72be278f8'
[21:48:20] CRITICAL Timeout waiting for page content to load, or no content found. Are you sure the page is set to public?
[21:48:20] INFO Parsing page 'https://www.notion.so/80f1c747841641e2a729fb0286390da2'
[21:49:21] CRITICAL Timeout waiting for page content to load, or no content found. Are you sure the page is set to public?
[21:49:21] INFO Parsing page 'https://www.notion.so/e861fdd6a0c247ca8bad342d2cdb05b6'
[21:50:22] CRITICAL Timeout waiting for page content to load, or no content found. Are you sure the page is set to public?
[21:50:22] INFO Parsing page 'https://www.notion.so/d5c95ef2e77349e98691b8925de7d119'
[21:51:23] CRITICAL Timeout waiting for page content to load, or no content found. Are you sure the page is set to public?
[21:51:23] INFO Finished!
Processed 1 pages in 00:04:19
If you open a subpage directly, you'll see that it is public, but it is a redirect page from Notion.
Thanks for pointing that out; my PR doesn't handle subpages. When parsing the subpages (sub_page_href), the www.notion.so part should be replaced with {site_name}.notion.site.
I tried a quick fix, but there are a few edge cases in the code that I am probably missing, so I'm not submitting a PR yet.
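A minimal sketch of the replacement described above, assuming the root page URL is a <site_name>.notion.site address and that sub-page links start with https://www.notion.so/. site_name_from() and rewrite_sub_page_href() are hypothetical helpers, not loconotion's code, and they skip the edge cases mentioned (relative and anchor links, for instance).

```python
from urllib.parse import urlsplit

OLD_HOST_PREFIX = "https://www.notion.so/"

def site_name_from(root_url: str) -> str:
    """Return 'tomreitz' for 'https://tomreitz.notion.site/...'."""
    host = urlsplit(root_url).hostname or ""
    suffix = ".notion.site"
    if not host.endswith(suffix):
        raise ValueError(f"not a notion.site URL: {root_url}")
    return host[: -len(suffix)]

def rewrite_sub_page_href(sub_page_href: str, site_name: str) -> str:
    """Swap the www.notion.so prefix for <site_name>.notion.site, keeping path and query."""
    if sub_page_href.startswith(OLD_HOST_PREFIX):
        return f"https://{site_name}.notion.site/" + sub_page_href[len(OLD_HOST_PREFIX):]
    return sub_page_href

root = "https://tomreitz.notion.site/Wisconsin-Websites-ecdb3dc4cd1e40f280b7512a23ca2006"
print(rewrite_sub_page_href("https://www.notion.so/80f1c747841641e2a729fb0286390da2", site_name_from(root)))
# -> https://tomreitz.notion.site/80f1c747841641e2a729fb0286390da2
```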
Hi,
I'm currently running into the same issue. Is there a fix available?
@bryanhpchiang can you post the partial fix here? Others can try it out and help in fixing the edge cases.
I'm having this issue as well. Would appreciate the partial fix if possible @bryanhpchiang
I am not a developer, but there is a quick way to make it work properly: simply edit the links at lines 582-584 of loconotion/notionparser.py.
Before editing:
if sub_page_href.startswith("/"):
    sub_page_href = "https://www.notion.so" + a["href"]
if sub_page_href.startswith("https://www.notion.so/"):
    if parse_links or not len(a.find_parents("div", class_="notion-scroller")):
        # ... rest of the original block unchanged
After editing:
if sub_page_href.startswith("/"):
    sub_page_href = "https://xxxx.notion.site" + a["href"]
if sub_page_href.startswith("https://xxxx.notion.site/"):
    if parse_links or not len(a.find_parents("div", class_="notion-scroller")):
        # ... rest of the original block unchanged
When running the program, I used:
python loconotion https://xxxx.notion.site/xxxx/{page-id}
Hope this helps.
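Rather than hard-coding xxxx in notionparser.py, the base could be derived from the page URL passed on the command line. A minimal sketch, assuming the same meaning for sub_page_href and a["href"] as in the snippet above; site_base() and resolve_sub_page() are hypothetical helpers, and the parse_links / notion-scroller check that follows in the parser is left out.

```python
from typing import Optional
from urllib.parse import urlsplit

def site_base(page_url: str) -> str:
    """'https://xxxx.notion.site/xxxx/<id>' -> 'https://xxxx.notion.site'."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}"

def resolve_sub_page(href: str, base: str) -> Optional[str]:
    """Mirror the edited checks without a hard-coded site name.

    Relative links get the site base prepended; the result is returned only
    if it points at the same site, otherwise None (treat it as external).
    """
    sub_page_href = base + href if href.startswith("/") else href
    if sub_page_href.startswith(base + "/"):
        return sub_page_href
    return None

base = site_base("https://xxxx.notion.site/xxxx/0123")
print(resolve_sub_page("/80f1c747841641e2a729fb0286390da2", base))
# -> https://xxxx.notion.site/80f1c747841641e2a729fb0286390da2
```

Computing the base once per run keeps the two startswith() checks in sync, which is the part the manual edit has to get right twice.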
I created a PR to use the custom new Notion URL format https://xxxx.notion.site instead of the default one. I also saw an issue where a subfolder is expected for links of the format https://xxxx.notion.site/xxxx (I ran into it while parsing my website), and fixed that as well.
@bryanhpchiang could you please pull the PR and verify whether it's working for you as well?
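The subfolder problem is easier to avoid if the page ID is read from the end of the URL path rather than from a fixed segment. A minimal sketch, assuming a page ID is the trailing 32 hex characters of the last path segment (true for every URL in this thread, but an assumption about Notion in general); page_id_from_url() is hypothetical and not the code in the PR.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed shape of a Notion page ID: 32 hex characters at the end of the slug.
_PAGE_ID = re.compile(r"([0-9a-f]{32})$", re.IGNORECASE)

def page_id_from_url(url: str) -> Optional[str]:
    """Return the page ID from the last path segment, whether or not a
    workspace 'subfolder' precedes it. Query strings (?v=...) are ignored."""
    path = urlsplit(url).path.rstrip("/")
    # Drop dashes so both 'Slug-<id>' and dashed UUID forms end in 32 hex chars.
    last_segment = path.rsplit("/", 1)[-1].replace("-", "")
    match = _PAGE_ID.search(last_segment)
    return match.group(1).lower() if match else None

print(page_id_from_url("https://www.notion.so/jamesdeluk/James-IT-Notes-9969909992c04b5ba3a734cdf0a74530"))
# -> 9969909992c04b5ba3a734cdf0a74530
```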
@meSunnySrivastava
Thanks for putting together this PR. Confirming that it did work for my website to parse subpages.
The only issue is that bullet points are now missing.
EDIT: I see that this was supposed to be fixed by https://github.com/leoncvlt/loconotion/pull/73, and that your PR merged those changes as well.
EDIT: Deleting my dist/ folder and regenerating fixed the issue. The PR looks good to me, thanks!
Sorry I had to close the old PR because I pushed to my master directly. :)
PR has been merged, thanks all!
I'm still getting the timeout issue. Exact same as the original post above.
The page is set to public:
The link is https://jamesdeluk.notion.site/James-IT-Notes-9969909992c04b5ba3a734cdf0a74530
(The Copy Link button gives https://www.notion.so/jamesdeluk/James-IT-Notes-9969909992c04b5ba3a734cdf0a74530, which forwards to the above).
Thought I'd try this again with the new Notion update. A couple things:
Trying to access the .site page itself fails:
And webdrive.log loops this:
[1632288655.436][INFO]: Waiting for pending navigations...
[1632288655.437][INFO]: Done waiting for pending navigations. Status: ok
[1632288655.445][INFO]: Waiting for pending navigations...
[1632288655.447][INFO]: Done waiting for pending navigations. Status: ok
[1632288655.447][INFO]: [edc259a3fc220da0c2d6ba0789803d04] RESPONSE FindElements [ ]
[1632288655.957][INFO]: [edc259a3fc220da0c2d6ba0789803d04] COMMAND FindElements {
"using": "css selector",
"value": ".notion-presence-container"
}
Well, that's not going to work regardless: you're not logged in, so the script is unable to find the notion-presence-container div, which is present on every Notion page. It only works with public pages.
That's my confusion though. I am logged in, and the page is public.
Just checking @leshchenko1979, is this fixed by #92?
I am using the current master version of loconotion with the new-style URLs and it seems to work fine: https://github.com/2m/nemunasring/blob/main/nemunasring.toml#L2
Since Notion updated all URLs for hosted pages (see: https://github.com/leoncvlt/loconotion/issues/134) this ticket is no longer an enhancement, but a permanent bug.
We resolved it in our fork here: https://github.com/sueszli/notionSnapshot/
Not 100% sure, but I believe the URL format for Notion shared pages recently changed. It's now notion.site instead of notion.so:
Editing view: https://www.notion.so/bryanchiang/Bryan-Chiang-fc01c67a1ed9402e83eb8efd5c99a216
Shared view: https://bryanchiang.notion.site/Bryan-Chiang-fc01c67a1ed9402e83eb8efd5c99a216
I get a parser error with the second one.
I will try modifying the check for a valid notion.so website.
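For that valid-URL check, a minimal sketch that accepts both hosts. is_notion_url() is a hypothetical helper, not loconotion's actual check, and it assumes any *.notion.site subdomain is a legitimate shared workspace.

```python
from urllib.parse import urlsplit

def is_notion_url(url: str) -> bool:
    """Accept legacy notion.so editing URLs and new *.notion.site shared URLs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Suffix match on the parsed hostname, so notion.site.example.com is rejected.
    return host in ("notion.so", "www.notion.so") or host.endswith(".notion.site")

print(is_notion_url("https://bryanchiang.notion.site/Bryan-Chiang-fc01c67a1ed9402e83eb8efd5c99a216"))
# -> True
```

Matching on the parsed hostname rather than a substring of the whole URL avoids accepting look-alike domains.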