Closed k00ni closed 2 years ago
The current regex is too restrictive as you point out. The proposed new one is too liberal. Would you mind updating it to match the spec?
I updated the regex and added a few tests to prove they work. Is that approach OK?
Any idea how to handle the following characters? U+00B7, U+0300 to U+036F and U+203F to U+2040
You should be able to match then with \x{00B7}
etc. If I read the regex in the change correctly, it would disallow IDs such as _:b
as it always requires at least two characters. You'll have to make the second oder optional.
_:b
is covered now.
To include U+00B7, U+0300 to U+036F and U+203F to U+2040, I had to use UTF-8 mode in regex. This lead to failing tests because at Processor.php line 2895 it tries to use $property
, which is an empty string sometimes. I have no idea why that is.
1) ML\JsonLD\Test\NQuadsTest::testEscaping
Error: Cannot access empty property
/var/www/html/Processor.php:2895
/var/www/html/Processor.php:2322
/var/www/html/JsonLD.php:465
/var/www/html/Test/NQuadsTest.php:45
phpvfscomposer:///var/www/html/vendor/phpunit/phpunit/phpunit:51
Without using UTF-8 mode in regex I get the following error:
preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 76
We have two options as far as I can tell:
Processor.php
and keep UTF-8 characters.WDYT?
How do we proceed here @lanthaler?
Thanks for the ping, I missed your previous comment. I'd suggest we remove the UTF-8 characters for now. We can split that of into a separate bug.
I removed my UTF-8 extensions.
After running PHPUnit (both PHP 5.6 and 7.4), I ran into a couple of errors and failures, such as:
1) ML\JsonLD\Test\W3CTestSuiteTest::testRemoteDocumentLoading with data set "http://json-ld.org/test-suite/tests/remote-doc-manifest.jsonld#t0012" ('Multiple context link headers', stdClass Object (...), stdClass Object (...))
Failed asserting that 'loading document failed' is equal to expected exception code 'multiple context link headers'.
phpvfscomposer:///var/www/html/vendor/phpunit/phpunit/phpunit:51
or
10) ML\JsonLD\Test\W3CTestSuiteTest::testRemoteDocumentLoading with data set "http://json-ld.org/test-suite/tests/remote-doc-manifest.jsonld#t0011" ('load JSON document with exten...h link', stdClass Object (...), stdClass Object (...))
ML\JsonLD\Exception\JsonLdException: Unable to load the remote document "https://json-ld.org:443/test-suite/tests/remote-doc-0011-in.jldt".
/var/www/html/FileGetContentsLoader.php:62
/var/www/html/JsonLD.php:142
/var/www/html/Test/W3CTestSuiteTest.php:182
phpvfscomposer:///var/www/html/vendor/phpunit/phpunit/phpunit:51
My changes are not the source of these errors because they persisted after reverting ALL of my changes. Unfortunately, there is no CI for pull requests to have a neutral view on the matter. I assume there were changes in http://json-ld.org/test-suite/tests/remote-doc-manifest.jsonld in the meantime which broke your tests.
I leave it to you how to proceed.
Oh, I didn't realize that CI is broken. Travis was running on PR before. Your code LGTM. I'll have a look at the tests and the CI setup (likely on the weekend) and merge your code.
Thanks a lot for your work on this Konrad.
Could you please allow edits to this PR? I've found the bug and would like to push it back to the PR before merging it to have the full history here
Btw. I've also set up CI using GitHub Actions and added a devcontainer for easier development
Could you please allow edits to this PR?
It is allowed already. "Allow edits by maintainers" is checked.
Btw. you mean the bug which stopped us from using UTF-8 symbols in the regex?
No, the bug that caused tests to fail. I didn't look into the UTF-8 symbols yet.
Thanks for working on this!
As far as I understand https://www.w3.org/TR/n-quads/#BNodes, blank node labels in NQuads can include
-
,.
and_
too. Specification requires these characters not being at the beginning of the label, but this regex is not structured to check that anyway.