openlink / structured-data-sniffer

The Openlink Structured Data Sniffer (OSDS) is a plugin for the Chrome, Firefox and Opera browsers that detects and shows structured data embedded in web pages in either JSON-LD, Microdata, RDFa or Turtle format.
http://osds.openlinksw.com/
GNU General Public License v2.0

Stuck in "Processing data ..." #14

Closed: devurandom closed this issue 5 years ago

devurandom commented 5 years ago

When I open http://techascent.com/blog/tvm-for-the-win.html and press "Show Document Metadata", the popup appears and displays "Processing data ..." with a spinner. After that nothing happens, even if I wait for several minutes.

Structured Data Sniffer 2.16.10, Firefox 63.0.3, Fedora 29

TallTed commented 5 years ago

I see similar behavior with OSDS 2.16.13, Chrome 70.0.3538.102, macOS 10.10.5.

Looking at the source of the problem page, I see very broken HTML, as the W3C Validator report confirms.

It's possible that a future OSDS release will be tweaked to handle this kind of HTML breakage better, but that page should certainly be fixed, which may let the current OSDS handle it as expected.

devurandom commented 5 years ago

What I would expect from OSDS is that it shows "Processing data ..." for a second, then figures out that the HTML is broken and no data can be extracted, and then replaces "Processing data ..." with "Cannot extract data from broken page! Please contact the site administrator.". The bug here, IMO, is that nothing happens: no error message, no empty table, just a spinner that spins forever.
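A minimal sketch of the behavior requested above, assuming the extraction step can be wrapped in a Promise: race the extraction against a timer so the popup always leaves the "Processing data ..." state, either with results or with an error message. The names `processPage`, `withTimeout` and `PopupState`, and the 10-second default timeout, are hypothetical and chosen for illustration; this is not OSDS code.

```typescript
// Hypothetical popup states: every run must end in one of these.
type PopupState =
  | { kind: "done"; items: unknown[] }
  | { kind: "error"; message: string; detail: string };

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Run a (hypothetical) extractor and always return a terminal state,
// so the spinner can be replaced by results or by an error message.
async function processPage(
  extract: () => Promise<unknown[]>,
  timeoutMs = 10000,
): Promise<PopupState> {
  try {
    // extract() is called inside the try, so a synchronous throw is caught too.
    const items = await withTimeout(extract(), timeoutMs);
    return { kind: "done", items };
  } catch (err) {
    return {
      kind: "error",
      message: "Cannot extract data from broken page! Please contact the site administrator.",
      detail: String(err),
    };
  }
}
```

The timer covers an extractor that never settles (the "spins forever" case), and the `catch` covers a parser that throws on malformed HTML; in both cases the popup gets a final state to render.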

TallTed commented 5 years ago

@devurandom - Agreed.

@smalinin, please look into this. OSDS needn't analyze or report on the issue beyond saying "I can't digest this", but it definitely shouldn't spin forever.

ghost commented 5 years ago

https://www.braveclojure.com/multimethods-records-protocols/ also shows this behavior. The main page https://www.braveclojure.com/ quickly shows a result, but this sub-page stays at "Processing data ..." forever. That site does not seem to have the grave structural problems the other page had.

TallTed commented 5 years ago

@urzds - The braveclojure sub-page does have some significant problems, as the Validator reports, but again, these errors shouldn't cause OSDS to spin forever.

Comments, @smalinin?

devurandom commented 5 years ago

> That site does not seem to contain any grave structural problems as the other page did.

> @urzds - The braveclojure sub-page does have some significant problems, as the Validator reports, but again, these errors shouldn't cause OSDS to spin forever.

I am not an HTML expert, but to me it looks like the Brave Clojure page contains only minor issues that an HTML parser should be able to tolerate, e.g. missing attributes or a duplicate ID. So in addition to not spinning forever, I would expect OSDS to be able to extract information despite the errors, since most of the information it needs does not appear to be damaged.

smalinin commented 5 years ago

The issues above will be fixed in the next release (>= 2.16.15).

devurandom commented 5 years ago

> The issues above will be fixed in next release >= 2.16.15

Thanks!

ghost commented 5 years ago

> The issues above will be fixed in next release >= 2.16.15

I cannot find the code for this. Could you please give me a hint?

TallTed commented 5 years ago

@urzds - Please look to this fork for active development (including the relevant patches for this issue). We'll be doing something to re-align these forks in the near future.