obsidianmd / obsidian-importer

Obsidian Importer lets you import notes from other apps and file formats into your Obsidian vault.
https://help.obsidian.md/import
MIT License

[Notion] Importer is parsing code snippets into filenames on zip import #202

Open mbbroberg opened 8 months ago

mbbroberg commented 8 months ago

I've been following the Notion import process, using HTML as the export format, and I'm now combing through the results. I have hundreds of files that look something like this:

---
"1": 298
---

That front matter is the file's only content. It seems to trip on notes containing Python, Git, Ruby, Go, and other command-line scripting snippets, such as my "Make ZSH default shell.md" file. Once I saw the pattern, I found them pretty quickly with find:

$ find . -type f -exec grep -qE '"[^"]+": [0-9]+' {} \; -print
./class TestClass 1.md
./result.push(Element.toMarkdown(elementsi, parent, prev, next)); 1.md
./th, 1.md
./Install Zsh 1.md
./html(text) 1.md
./toMarkdown() { 3 1.md
./b, , 1.md
./w = csv.writer(f, delimiter=',') 1.md

And so on: there were 300+ such files out of my roughly 5k. Swapping -print for -delete took care of them. This seems like a bug in the parser, so I wanted to share it.

UPDATE: I missed quite a few misparsed files. The common pattern seems to be that note content got pushed into the YAML front matter. The filename isn't a reliable indicator, so I'm writing a script to find them all.
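One way to catch files the grep above misses is to look for the structural symptom rather than the "key": number text. This is a minimal sketch, assuming the misparsed files are the ones whose entire content is a front-matter block (the shape shown at the top of this issue). The helper name frontmatter_only is hypothetical, not part of Obsidian Importer, and the heuristic will also flag legitimate notes that have properties but no body, so review its output before swapping in -delete.

```shell
# Hypothetical helper (not part of Obsidian Importer): print every .md file
# under a directory whose whole content is a YAML front-matter block, i.e.
# it opens with ---, has a closing ---, and nothing but blank lines after.
frontmatter_only() {
  find "${1:-.}" -type f -name '*.md' -exec awk '
    NR == 1 && $0 != "---" { bad = 1; exit }    # must start with ---
    NR == 1 { next }
    !closed && $0 == "---" { closed = 1; next } # closing delimiter
    closed && NF > 0 { bad = 1; exit }          # real body text after it
    END { exit (bad || !closed) }               # also reject unterminated blocks
  ' {} \; -print
}
```

It follows the same find -exec ... -print pattern as the original command, with awk acting as the test instead of grep; awk's exit status decides whether find prints the path. For example, frontmatter_only ~/MyVault lists the candidates.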

mbbroberg commented 8 months ago

I was going through the index.html file that Notion's export produces and noticed that it contains a syntax error wherever a note has "++ " in its title.

[Screenshot 2024-01-22 at 6:40:02 PM]

This leaves a broken list item, which produces a bunch of junk text in that file. Digging a little deeper, I found another file that was parsed the same way (it was a regex cheatsheet 🤦). This looks like a bug in Notion's export first and foremost. That said, the importer has an opportunity to sanitize its inputs.
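Until that happens, a quick way to see whether an export is affected before importing it is to search index.html for the literal string "++ ". This is a rough sketch: the helper name find_plusplus_titles is hypothetical, and it reports every line containing that string, which is a superset of the broken titles rather than an exact list. The -F flag makes grep treat the pattern as a fixed string, so the plus signs are not read as regex operators.

```shell
# Hypothetical helper: print (with line numbers) every line of a Notion
# export's index.html that contains the literal string "++ ", the title
# pattern observed to break the list markup. -F disables regex parsing.
find_plusplus_titles() {
  grep -nF '++ ' "${1:-index.html}"
}
```

For example, find_plusplus_titles ~/Downloads/notion-export/index.html shows which pages to inspect after the import.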

Hope that's helpful.