jkphl / micrometa

A meta parser for extracting micro information out of web documents, currently supporting Microformats 1+2, HTML Microdata, RDFa Lite 1.1, JSON-LD and Link Types, written in PHP
http://micrometa.jkphl.is
MIT License

Recursive loop when ID is the same as URL #29

Closed: dochne closed this issue 4 years ago

dochne commented 6 years ago

Example code:

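<?php
// Load the Composer autoloader so the micrometa classes are available.
require "vendor/autoload.php";

// Minimal JSON-LD document whose "@id" and "url" are the same IRI.
$jsonld = <<<EOF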
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Website",
  "@id": "https://www.example.com/",
  "url": "https://www.example.com/"
}
</script>
EOF;

// Invoking the parser on this document triggers the recursive loop.
$parser = new \Jkphl\Micrometa\Ports\Parser();
$parser("https://www.example.com/", $jsonld);

If url is changed to omit the trailing slash, it works fine. I'm assuming this is related to issue https://github.com/jkphl/micrometa/issues/27. I'm attempting to put together a PR to fix it but I'm ending up a little lost; any advice you can give as to the most likely cause would be appreciated.

dochne commented 6 years ago

This looks like it's a bug in the underlying JSON-LD library.

jkphl commented 6 years ago

@Dolondro Sorry I didn't have the time to test the issue and reply yet. But yes, it looks like this is very likely — micrometa basically wraps around the JSON-LD parser, adds some functionality and unifies the output across the different formats. As soon as I find some time I'll try to further look into this ...

dochne commented 6 years ago

Hiya, thanks for getting back to me.

It looks like the issue is in Processor.php -> generateNodeMap. It looks like they're keying the node map by the values, hence the collision and the infinite loop.
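To picture the kind of collision I mean, here is a toy sketch only. It assumes a node map keyed by IRI and a "url" value that gets resolved through that same map. The names nodeFor, $nodeMap and the properties layout are hypothetical and are not taken from Processor.php.

```php
<?php
declare(strict_types=1);

// Toy node map keyed by IRI. This is NOT the code of lanthaler/JsonLD.
$nodeMap = [];

// Hypothetical helper: return the node registered for an IRI, creating it if needed.
function nodeFor(array &$nodeMap, string $iri): stdClass
{
    if (!isset($nodeMap[$iri])) {
        $node = new stdClass();
        $node->id = $iri;
        $node->properties = [];
        $nodeMap[$iri] = $node;
    }
    return $nodeMap[$iri];
}

// The node is registered under its "@id".
$website = nodeFor($nodeMap, 'https://www.example.com/');

// Assumption: the "url" value is treated as an IRI and looked up in the same map.
// Because it equals the "@id", the lookup returns the node itself.
$website->properties['url'] = nodeFor($nodeMap, 'https://www.example.com/');

$selfReferencing = $website->properties['url'] === $website;
$nodeCount = count($nodeMap);
```

With the trailing slash removed from "url", the two strings differ, the lookup creates a second node, and no cycle appears in this toy model.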

I'll write a test case and submit it as an issue with them, but it looks suspiciously like they may have ceased bothering with the project :(

jkphl commented 6 years ago

@Dolondro You might be right, unfortunately. AFAIK it's mostly one guy who wrote it, and I read a statement somewhere a while ago that he doesn't plan to put a lot more effort into it. And AFAIR there was one other library doing basically the same thing, but it had a comparatively ugly API / was harder to work with ... Please let me know if you're lucky with your issue or find a viable alternative. Thanks!

dochne commented 6 years ago

Bug report for the underlying issue can be found here: https://github.com/lanthaler/JsonLD/issues/87

rvanlaak commented 5 years ago

@jkphl https://github.com/digitalbazaar/php-json-ld is the other library you were referring to?

jkphl commented 5 years ago

@rvanlaak Exactly! Unfortunately, there's no news on the upstream error (https://github.com/lanthaler/JsonLD/issues/87) and the API of php-json-ld is still as ugly as 2 years ago. :(

When I decided to build on an external JSON-LD parser I briefly looked into rolling my own, but JSON-LD parsing is far from being trivial, so I decided to focus on what's more important for me (JSON-LD was never a high priority thing for me ...)

Sarke commented 5 years ago

Isn't this just a matter of implementation? If an entity wants to reference itself, or have a child reference the parent, it can do so. Any var dumper will have to solve this problem.

That is, what lanthaler/JsonLD returns is correct, you just have to be careful when you start looping through it not to end up in an endless loop.
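A minimal sketch of what "being careful" could look like, assuming the parsed result is a graph of plain PHP objects that may point back at themselves. The collectIds function and the id / url property names are made up for illustration and are not the actual lanthaler/JsonLD or micrometa types.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: walk an object graph depth-first and collect each
 * node's "id" property, visiting every object at most once.
 */
function collectIds(object $node, array &$seen = []): array
{
    $key = spl_object_id($node);
    if (isset($seen[$key])) {
        return []; // already visited: stop instead of recursing forever
    }
    $seen[$key] = true;

    $ids = isset($node->id) ? [$node->id] : [];
    foreach (get_object_vars($node) as $value) {
        if (is_object($value)) {
            $ids = array_merge($ids, collectIds($value, $seen));
        }
    }
    return $ids;
}

// A node whose "url" property points back at the node itself,
// mimicking the case where @id and url are identical.
$site = new stdClass();
$site->id = 'https://www.example.com/';
$site->url = $site;

$result = collectIds($site);
```

Without the $seen check, the call on $site would recurse until PHP runs out of stack or memory. With it, the self-reference is skipped on the second visit.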