uogbuji / versa

Versa model for Web resources & relationships. Similar to Resource Description Framework (RDF) but with a goal to be at once simpler and more expressive. Includes processing tools.
Apache License 2.0

Markdown rel parsing regex and messy input #17

Closed distobj closed 3 years ago

distobj commented 3 years ago

I'm running into issues with Versa choking on irregular and malformed input during Markdown parsing. The property values contain embedded HTML. It appears that REL_PAT in markdown_parse.py takes a leading less-than symbol to mean that the property value is an IRIref, so it tries to resolve the value against a base. That usually fails, and the record is discarded.
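To make the failure mode concrete, here is a minimal sketch of the kind of check described above. The pattern `NAIVE_IRIREF` and the function `looks_like_iriref` are hypothetical stand-ins for illustration only. They are not the actual REL_PAT from markdown_parse.py, and the sample strings are made up.

```python
import re

# Hypothetical stand-in, NOT the real REL_PAT: any value that starts
# with '<' is assumed to be an IRI reference.
NAIVE_IRIREF = re.compile(r'^\s*<')

def looks_like_iriref(value):
    """Return True if the naive check would treat value as an IRI reference."""
    return bool(NAIVE_IRIREF.match(value))

samples = [
    '<http://example.org/thing>',          # a genuine IRI reference
    '<i>Drosophila</i> wing development',  # a title with embedded HTML
    'A plain title',                       # an ordinary literal
]

for s in samples:
    print(repr(s), looks_like_iriref(s))
```

Under this assumption, the HTML title is classified the same way as the genuine IRI reference, which is consistent with the symptom reported here.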

I've attached a sample unpaywall record.

unpay-badtitle.txt

uogbuji commented 3 years ago

Hi @distobj, what you attached is JSON rather than Versa Literate/Markdown. Can you attach the latter so I can see the problem clearly? Thanks.

uogbuji commented 3 years ago

A discussion with @distobj clarified this, and I've pushed a fix so the user doesn't have to escape this case.
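The fix itself isn't shown in this thread. As a rough illustration of one way such a case could be handled without escaping, the sketch below only treats a value as an IRI reference when the whole value is a single angle-bracketed token with no whitespace or nested brackets. `STRICT_IRIREF` and `classify_value` are hypothetical names, and this is an assumption about the general approach, not the code that was pushed.

```python
import re

# Hypothetical stricter pattern: the entire value must be <...> with no
# whitespace and no further angle brackets inside.
STRICT_IRIREF = re.compile(r'<([^<>\s]+)>')

def classify_value(raw):
    """Return ('iri', ref) or ('literal', text) for a property value."""
    value = raw.strip()
    m = STRICT_IRIREF.fullmatch(value)
    if m:
        return ('iri', m.group(1))
    # Anything else, including text with embedded HTML, stays a literal.
    return ('literal', value)

print(classify_value('<http://example.org/thing>'))
print(classify_value('<i>Drosophila</i> wing development'))
```

A value consisting solely of one tag, such as `<i>`, would still be read as a relative IRI reference under this sketch, so it narrows the ambiguity rather than removing it.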