inukshuk / edtf.js

Extended Date Time Format (ISO 8601-2 / EDTF) Parser for JavaScript
BSD 2-Clause "Simplified" License

'u' for unspecified rather than 'X'? #5

Closed: retorquere closed this issue 8 years ago

retorquere commented 8 years ago

Section 5.2.2 of the EDTF spec says 'u' marks an unspecified digit. Can support for this be added? Currently, edtf(date.replace(/u/g, 'X')) works as a workaround.

inukshuk commented 8 years ago

edtf.js implements the upcoming EDTF / ISO 8601 profile, currently a working draft, covering both part 1 and part 2 (EDTF is mostly covered in part 2).

I'd be open to creating a separate version of edtf.js which implements the old/current spec, but some of the necessary changes are not trivial to implement.

retorquere commented 8 years ago

A separate version would be most welcome. So my quick workaround is not safe to use, if the necessary changes are non-trivial?

inukshuk commented 8 years ago

Well, the workaround covers only a single aspect: it is safe to use only if your data contains no other features of the old syntax. For example, parentheses have been removed (instead of 2016-(10)? you would now write 2016-?10), there is a new symbol % that marks a value as both uncertain and approximate, and there are probably more changes.
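A minimal sketch of the parenthesis change described above. `ungroupQualifiers` is a hypothetical helper, not part of edtf.js, and it assumes the simplest case only: a single parenthesised run of digits followed by one qualifier. Longer or nested groups are deliberately left untouched.

```javascript
// Hypothetical helper, not part of edtf.js: rewrites an old-syntax group such
// as "2016-(10)?" into the new prefix form "2016-?10".
// Assumption: only a single parenthesised run of digits followed by one
// qualifier is handled; longer groups are left untouched.
function ungroupQualifiers(date) {
  return date.replace(
    /\(([0-9]+)\)(\?~|~\?|[?~])/g,
    (_match, digits, qualifier) => {
      // A combined "?~" / "~?" qualifier becomes "%" in the new syntax.
      const prefix = qualifier.length === 2 ? '%' : qualifier;
      return prefix + digits;
    }
  );
}

console.log(ungroupQualifiers('2016-(10)?'));     // 2016-?10
console.log(ungroupQualifiers('2004-(06)?~-11')); // 2004-%06-11
```

Whether edtf.js accepts every rewritten string is a separate question; this only moves the qualifier from after the group to in front of it.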

Come to think of it, instead of supporting the old spec, we could implement only the parsing, which would basically give us a converter from the old syntax to the new one.

retorquere commented 8 years ago

I see. My main interest is validation, so a parser for the old syntax would suffice. I'm testing whether a date (or date range) is proper EDTF for my BibLaTeX transformer; the BibLaTeX processor will do its own EDTF handling.

inukshuk commented 8 years ago

Awesome, I was not aware that BibLaTeX supports EDTF now! Currently they seem to support only levels 0 and 1 -- those are probably rather easy to implement or work around (i.e., replace X with u and ?~ / ~? with %).

I also wrote edtf-ruby, which implements the old format (and a few other implementations are available as well), so you could use that right away.

retorquere commented 8 years ago

While Ruby is my programming language of choice, the parsing I do now is for the Better BibTeX extension to Zotero, so I need a JavaScript solution. But are you saying that edtf.js will parse L0 and L1 correctly if I just substitute u and ?~?

retorquere commented 8 years ago

edtf.js also doesn't recognize the following as EDTF date ranges:

unknown/2006
2004-06-01/unknown
2004-01-01/open

Any suggestions for replacing those?

inukshuk commented 8 years ago

That was not a thorough analysis, but L0 (which is probably identical anyway) and L1 will not be hard to support. You would need to compare level 1 of the old and new specs to be sure you catch all the differences.

You can replace 'unknown' with * (*/2006) and 'open' with an empty string (2005/) and the parsing should work (IIRC, these may be swapped in the final version, i.e., '*' meaning open and '' unknown).
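A sketch of that substitution applied endpoint by endpoint rather than as a global string replace, so a keyword is only rewritten when it is an entire side of the interval. `convertIntervalKeywords` is a hypothetical helper, not part of edtf.js, and it assumes the mapping suggested above ('unknown' to *, 'open' to an empty string).

```javascript
// Hypothetical helper, not part of edtf.js. Assumes the mapping suggested
// above: 'unknown' -> '*' and 'open' -> '' (the final spec may swap the
// meanings, which matters for interpretation but not for a syntax check).
const ENDPOINT_KEYWORDS = new Map([
  ['unknown', '*'],
  ['open', ''],
]);

function convertIntervalKeywords(interval) {
  const parts = interval.split('/');
  // Only rewrite strings that look like a two-sided interval.
  if (parts.length !== 2) return interval;
  return parts
    .map((end) => (ENDPOINT_KEYWORDS.has(end) ? ENDPOINT_KEYWORDS.get(end) : end))
    .join('/');
}

console.log(convertIntervalKeywords('unknown/2006'));       // */2006
console.log(convertIntervalKeywords('2004-06-01/unknown')); // 2004-06-01/*
console.log(convertIntervalKeywords('2004-01-01/open'));    // 2004-01-01/
```

A Map is used instead of a plain object so that inputs like "constructor" are not mistaken for keywords via the prototype chain.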

retorquere commented 8 years ago

That's a little hard to read -- is that '' for unknown and empty string for open? I see some italics there and that usually means there's an underscore somewhere that's been interpreted as markdown.

retorquere commented 8 years ago

(sorry for totally hijacking this issue -- please do say if you want to continue elsewhere)

I'm also trying to transpile edtf.js to ES5, because there are apparently versions of FF around that don't support the required parts of ES6. If you've successfully browserified (there is no require in FF) or ES5-ified edtf.js, I'd love to know about it.

inukshuk commented 8 years ago

Take a look at the working draft (WD) of part 2 I linked to above; section 4.5 covers the enhanced intervals. Unknown is a star * and, if the start or end is left blank, that indicates an open interval. Like I said, the semantics of this may actually be swapped for the final version, and edtf.js already implements it the swapped way (i.e., * for open and blank for unknown) -- see note 1 here.

I have not compiled the library to ES5 myself, but I don't think there should be any problems. The API depends heavily on generators (which would require something like regenerator), but you only need the parser, so I would suggest including only that part and skipping the rest. That is, you only need the files parser.js, grammar.js, bitmask.js and util.js (plus nearley, which is ES5-compatible if I remember correctly); that should make it much easier to compile.

retorquere commented 8 years ago

But then I'd still have to fudge the input, right? Because that parser won't understand what BibLaTeX will.

Currently I'm putting this through edtf.js: date.replace(/unknown/g, '*').replace(/open/g, '').replace(/u/g, 'X').replace(/\?~/g, '%'); it passes all my tests, but it feels icky.
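The chain above, wrapped in a function as a sketch. It keeps the original order, which matters: "unknown" contains a "u", so rewriting u to X first would leave "Xnknown" behind. As an assumption beyond the chain itself, it also maps "~?" to "%", following the earlier suggestion to treat ?~ and ~? alike. `oldToNewEdtf` and `uFirst` are hypothetical names.

```javascript
// The replacement chain above, wrapped in a function. Order matters:
// "unknown" contains a "u", so it has to be rewritten before the blanket
// u -> X step. Assumption: "~?" is also mapped to "%", following the earlier
// suggestion to treat both orders the same way.
function oldToNewEdtf(date) {
  return date
    .replace(/unknown/g, '*')
    .replace(/open/g, '')
    .replace(/u/g, 'X')
    .replace(/\?~|~\?/g, '%');
}

// Same steps with u -> X applied first, to show the ordering hazard.
function uFirst(date) {
  return date.replace(/u/g, 'X').replace(/unknown/g, '*');
}

console.log(oldToNewEdtf('unknown/2006')); // */2006
console.log(oldToNewEdtf('199u'));         // 199X
console.log(oldToNewEdtf('2004-06-11~?')); // 2004-06-11%
console.log(uFirst('unknown/2006'));       // Xnknown/2006
```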

retorquere commented 8 years ago

FF is OK with generators, BTW; it currently complains "expected expression, got '.'", which I'm guessing (I don't get line numbers for some reason) relates to this.

inukshuk commented 8 years ago

Yes, unless you change the actual grammar, you still have to rewrite the string before parsing. What you are doing there is basically converting old EDTF to new EDTF.

new.target is only supported since FF 41, so it may be causing the problem. That part should not be necessary, though, if all you want to do is validate the input.
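A sketch of a runtime check for new.target support. An engine without it rejects the syntax while parsing, which would normally break the whole script; compiling a tiny function body through the Function constructor turns that into a catchable SyntaxError. `supportsNewTarget` is a hypothetical name, and the check assumes the environment permits the Function constructor (some content security policies block it).

```javascript
// Feature test for new.target. Compiling the snippet with the Function
// constructor turns a parse failure into a catchable SyntaxError instead of
// breaking the whole script in engines that reject new.target.
function supportsNewTarget() {
  try {
    // Function bodies may reference new.target; this only checks parsing.
    new Function('return new.target');
    return true;
  } catch (err) {
    return false;
  }
}

console.log(supportsNewTarget());
```

The `catch (err)` form is used instead of an optional catch binding, since that newer syntax would itself be a parse error in the older engines this check targets.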

retorquere commented 8 years ago

Thank you! It turns out edtf.js runs just fine in FF 45 ESR as-is; Circle just has a very old version. Locally, 45 ESR works fine, so I'm trying to get that onto Circle now.

I'm still a little worried that my chain of replacements might make something palatable to the parser that ought not to be recognized. It's certainly a lot better than what I had, though; EDTF is not as easily captured by a regex as I had initially thought.
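One way to reduce that risk is to reject input before any rewriting if it contains anything other than old-syntax tokens, so free text such as "open access" never reaches the replacement chain. This is a hypothetical pre-check, not something from edtf.js or BibLaTeX, and its character set is only a rough approximation of the old level 0/1 syntax, so it may reject some valid inputs.

```javascript
// Hypothetical pre-check, not from edtf.js or BibLaTeX. Assumption: each
// endpoint is either a bare keyword or built only from digits and the
// punctuation/letters below; this approximates, not implements, the grammar.
const ENDPOINT = /^(?:unknown|open|[0-9u?~()\-+:.TZ]+)$/;

function looksLikeOldEdtf(input) {
  const parts = input.split('/');
  // A date has one part, an interval two; anything else is rejected.
  if (parts.length > 2) return false;
  return parts.every((part) => ENDPOINT.test(part));
}

console.log(looksLikeOldEdtf('2004-06-01/unknown')); // true
console.log(looksLikeOldEdtf('199u?~'));             // true
console.log(looksLikeOldEdtf('open access'));        // false
```

Only strings that pass this check would then go through the replacements and edtf.js, which still makes the final call on validity.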

retorquere commented 8 years ago

Runs, no issue at all. Super thanks.