qownnotes / web-companion

Browser extension to browse bookmarks and create notes in QOwnNotes
https://www.qownnotes.org/
GNU General Public License v3.0

web clipper not handling tables, other fine points well #35

Open pcause opened 2 years ago

pcause commented 2 years ago

I was in the browser at this page https://github.com/qownnotes/web-companion/issues/new and wanted to clip it. I right-clicked the page and selected "Send page to QOwnNotes". The page was sent, but the content was not translated to markdown well. In particular, a table of laptops was not captured/rendered well. Joplin's web clipper gives a pretty faithful translation. Their web clipper is open source, so you might look at their logic to see if there are ideas to improve your capture/translate logic to get better results.

I've attached a zip file with the web page as exported by Firefox, the markdown exported by QOwnNotes and the markdown as exported by Joplin.

Sorry, forgot the files.

clipper.zip

Expected behaviour

More faithful markdown translation, especially of the tables.

Actual behaviour

Steps to reproduce

Go to the URL provided, right-click the page and select the QOwnNotes/Send Page option.

Output from the debug section in the settings dialog in QOwnNotes

using 22.1.3

pbek commented 2 years ago

Where did you get tables on https://github.com/qownnotes/web-companion/issues/new? I'm open to pull requests; the HTML to markdown transformation is done by https://github.com/pbek/QOwnNotes/blob/1e61524efa60697f5cb4c79c726b6bdfee749934/src/mainwindow.cpp#L7524-L7580.

You can also do transformations directly in a script with a hook: https://www.qownnotes.org/scripting/hooks.html#websocketrawdatahook
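
To make the table part concrete, here is a minimal sketch of the kind of conversion being asked for: turning the rows of an HTML table into a pipe-style markdown table. It is written in Python with only the standard library and is not the QOwnNotes code (that lives in the C++ file linked above), nor Joplin's logic. The names `TableExtractor` and `html_table_to_markdown` are hypothetical, and the sketch assumes every cell has an explicit closing tag, no nested tables, and no `rowspan`/`colspan`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the cell text of every <tr> in the input as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace and escape pipes so the cell stays on one line.
            text = " ".join("".join(self._cell).split())
            self._row.append(text.replace("|", "\\|"))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_markdown(html):
    """Hypothetical helper: returns a markdown table, or "" if no rows are found."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    rows = parser.rows
    if not rows:
        return ""
    # Pad short rows so every line has the same number of columns.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]
    lines = [
        "| " + " | ".join(rows[0]) + " |",          # first row becomes the header
        "| " + " | ".join(["---"] * width) + " |",  # header separator
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)
```

The first row is treated as the header because pipe-style markdown tables need one; a real converter would also have to decide what to do with links, images and line breaks inside cells, which this sketch simply flattens to text.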

pcause commented 2 years ago

Sorry, pasted the wrong link. Here is the right one: https://www.ultrabookreview.com/42630-intel-evo-laptops/

This is the part of the page that gets converted to a table in Joplin:

image

this is joplin preview

image

this is the joplin markdown

image

In the QOwnNotes preview, the content from where the table starts through almost the rest of the article is missing. Not just the table, but the rest of the article.

image

and here is the markup

image

And here is a view of the HTML source at the start of the table on the web page

image