omnivore-app / obsidian-omnivore

Obsidian plugin to fetch articles and highlights from Omnivore
MIT License

Unable to save file containing emoji's on Windows #77

Open mrmichaeladavis opened 1 year ago

mrmichaeladavis commented 1 year ago


When an article title contains an emoji, the file cannot be saved when running the plugin on Windows. Filenames should be filtered to remove emojis and other unsupported characters.

sywhb commented 1 year ago

Hey @mrmichaeladavis, thanks for reporting.

After some research, we found that control characters in the filename caused the error, and this has been fixed in the latest version, 1.4.1.

Could you please upgrade your plugin and let me know if it works for you?

Thank you!

mrmichaeladavis commented 1 year ago

It still does not sync, even after upgrading to the newer plugin version. Same error. I also removed the Omnivore plugin, deleted the folder, and reinstalled it so it would sync from scratch.


So that we have an MWE, this is the page that is saved in Omnivore and being synced:

https://twitter.com/CroissantEth/status/1508455422818230279

You can see that Obsidian only shows the cook/chef emoji, but the actual title has a Zero Width Joiner (ZWJ) control code between the chef and cooking emoji. Saving the HTML page directly from MS Edge shows that Edge replaces the ZWJ with an underscore.
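To confirm which invisible code points a title contains, it can help to dump them one by one. Below is a minimal TypeScript sketch; `describeCodePoints` is a hypothetical helper name, and the sample string is built by hand to mimic the kind of title described above (an emoji, a ZWJ, then a second emoji), not fetched from Omnivore.

```typescript
// Hypothetical helper: list every code point in a string as U+XXXX so that
// invisible characters such as the Zero Width Joiner (U+200D) become visible.
function describeCodePoints(text: string): string[] {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(text, (ch) => {
    const hex = (ch.codePointAt(0) ?? 0).toString(16).toUpperCase();
    // Left-pad to at least four hex digits.
    return "U+" + ("0000" + hex).slice(-Math.max(4, hex.length));
  });
}

// Assumed sample: U+1F468 (man), U+200D (ZWJ), U+1F373 (cooking).
const sampleTitle = "\u{1F468}\u200D\u{1F373}";
console.log(describeCodePoints(sampleTitle).join(" "));
// prints "U+1F468 U+200D U+1F373"
```

The middle entry is the joiner that is not visible when the title is rendered.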

I think the right approach is to replace or remove non-printable characters. A JavaScript library that does this is https://www.npmjs.com/package/out-of-character. I tried it, and it does find the ZWJ.

The regex you merged handles some of the Unicode control codes, but not all. You could expand it to include the non-printable characters; a list is at https://invisible-characters.com/
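One way to expand such a regex without listing characters by hand is to match on Unicode general categories. The sketch below is an assumption about how this could look, not the plugin's actual code: `sanitizeFilename`, `INVISIBLE`, and `WINDOWS_RESERVED` are hypothetical names. The categories `Cc` (control) and `Cf` (format) include the ZWJ (U+200D), but they may not cover every entry on the list linked above, since some invisible characters are classified as letters, symbols, or spaces.

```typescript
// Hypothetical sanitizer, not the plugin's actual code.
// \p{Cc} = control characters, \p{Cf} = format characters
// (Cf includes U+200D ZWJ, U+200B zero width space, and U+FEFF).
const INVISIBLE = new RegExp("[\\p{Cc}\\p{Cf}]", "gu");

// Characters that Windows does not allow in file names.
const WINDOWS_RESERVED = /[<>:"\/\\|?*]/g;

function sanitizeFilename(title: string, replacement: string = ""): string {
  return title
    .replace(INVISIBLE, replacement)
    .replace(WINDOWS_RESERVED, replacement)
    .trim();
}

// A title containing an emoji ZWJ sequence plus reserved punctuation.
console.log(sanitizeFilename("\u{1F468}\u200D\u{1F373} recipe: part 1?"));
```

Note that removing the ZWJ splits a joined emoji sequence into its separate component emoji, so the filename will no longer render exactly like the original title.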

I am surprised this is not something the Obsidian API provides as part of FileManager.


sywhb commented 1 year ago

Hey @mrmichaeladavis, sorry for the late reply.

This is really a good catch! I will try and test the library provided.

Thanks a lot

sywhb commented 1 year ago

Hey @mrmichaeladavis, 1.5.0 is released, and it now removes the invisible control characters in the filename.

sywhb commented 1 year ago

1.5.1 is released to fix a bug with saving Unicode characters in the filename.

mrmichaeladavis commented 1 year ago

I upgraded to 1.5.3 and re-downloaded the articles. Sadly, same error.

mrmichaeladavis commented 1 year ago

From looking at the source of out-of-character, it seems that when a Zero Width Joiner is used inside an emoji, the library keeps it rather than removing the invisible character. I think that is still the problem.

    // is this zero-width used in an emoji?
    if (isEmoji(text, offset)) {
      return ch //do nothing
    }
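To illustrate why an emoji-aware filter leaves the joiner in place, here is a small TypeScript sketch. `stripZwjOutsideEmoji` is a simplified stand-in written for this example, not the library's code, and `replaceZwj` is a hypothetical alternative that handles every ZWJ regardless of context, similar to the underscore replacement observed in Edge. Whether the ZWJ is really what breaks saving on Windows is an assumption that this thread has not confirmed.

```typescript
const ZWJ = "\u200D";
// Matches a single pictographic (emoji-like) character.
const PICTOGRAPHIC = new RegExp("\\p{Extended_Pictographic}", "u");

// Simplified stand-in for an emoji-aware filter: a ZWJ is dropped only
// when it is NOT sitting between two pictographic characters.
function stripZwjOutsideEmoji(text: string): string {
  const chars = Array.from(text); // split by code point
  return chars
    .filter((ch, i) => {
      if (ch !== ZWJ) return true;
      const insideEmoji =
        i > 0 &&
        i < chars.length - 1 &&
        PICTOGRAPHIC.test(chars[i - 1]) &&
        PICTOGRAPHIC.test(chars[i + 1]);
      return insideEmoji; // keep the joiner inside an emoji sequence
    })
    .join("");
}

// Hypothetical unconditional variant: every ZWJ is replaced.
function replaceZwj(text: string, replacement: string = "_"): string {
  return text.split(ZWJ).join(replacement);
}

const emojiTitle = "\u{1F468}\u200D\u{1F373}"; // assumed sample title
console.log(stripZwjOutsideEmoji(emojiTitle).includes(ZWJ)); // true
console.log(replaceZwj(emojiTitle).includes(ZWJ)); // false
```

With the emoji-aware variant the joiner survives into the filename, which matches the behavior described above; the unconditional variant trades the combined emoji glyph for a filename with no invisible characters.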

sywhb commented 1 year ago

Sorry, I couldn't find a good library that removes all the invisible characters for now. I will take a look at the source code.