houseabsolute / DateTime-Format-Builder

Create DateTime parser classes and objects.
http://metacpan.org/release/DateTime-Format-Builder/

NAME

DateTime::Format::Builder - Create DateTime parser classes and objects.

VERSION

version 0.83

SYNOPSIS

package DateTime::Format::Brief;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    }
);
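
As a rough usage sketch, assuming the module is installed and the class above sits in the same file as the calling code, the generated parse_datetime method can be called as a class method. The input strings are invented for illustration.

```perl
use strict;
use warnings;

package DateTime::Format::Brief;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    }
);

package main;

# The first specification matches 14 digits, the second matches 8.
my $full = DateTime::Format::Brief->parse_datetime('20030716163245');
my $date = DateTime::Format::Brief->parse_datetime('20030716');

print $full->iso8601, "\n";
print $date->ymd,     "\n";
```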

DESCRIPTION

DateTime::Format::Builder creates DateTime parsers. Many string formats of dates and times are simple and just require a basic regular expression to extract the relevant information. Builder provides a simple way to do this without writing reams of structural code.

Builder provides a number of methods, most of which you'll rarely, if ever, need. They're provided mainly to expose the module's innards to subclasses, or for when you need to do something slightly beyond what I expected.

TUTORIAL

See DateTime::Format::Builder::Tutorial.

ERROR HANDLING AND BAD PARSES

I will often speak of undef being returned; however, that's not strictly true.

When a simple single specification is given for a method, the method isn't given a single parser directly. It's given a wrapper that will call on_fail if the single parser returns undef. The single parser must return undef so that a multiple parser can work nicely and actual errors can be thrown from any of the callbacks.

Similarly, a multiple parser will only call on_fail right at the end, when it has tried everything it could.

on_fail (see later) is defined, by default, to throw an error.

Multiple parser specifications can also specify on_fail with a coderef as an argument in the options block. This will take precedence over the inheritable and overrideable method.
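
A minimal sketch of an on_fail entry in the options block, assuming the options are given as a leading arrayref (as described under MULTIPLE SPECIFICATIONS). The counter variable and the sample inputs are invented; the callback ignores its arguments because their exact form is not spelled out here.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

my $failures = 0;

# The leading arrayref is the options block; the hashrefs are the specifications.
my $parser = DateTime::Format::Builder->parser(
    [ on_fail => sub { $failures++; return } ],
    {
        regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
        params => [qw( year month day )],
    },
    {
        regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
        params => [qw( year month day )],
    },
);

my $good = $parser->parse_datetime('2003-07-16');

# Neither specification matches, so the on_fail coderef above runs
# in place of the default error-throwing behaviour.
my $bad = $parser->parse_datetime('not a date');
```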

That said, don't throw real errors from callbacks in multiple parser specifications unless you really want parsing to stop right there and not try any other parsers.

In summary: calling a method will result in either a DateTime object being returned or an error being thrown (unless you've overridden on_fail or create_method, or you've specified an on_fail key to a multiple parser specification).

Individual parsers (be they multiple parsers or single parsers) will return either the DateTime object or undef.
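
The default behaviour summarised above might be handled along these lines. This is only a sketch: the regex and input strings are made up, and trapping the error with eval is one option among several.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

my $parser = DateTime::Format::Builder->parser(
    regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
    params => [qw( year month day )],
);

# A successful parse returns a DateTime object.
my $ok = $parser->parse_datetime('2003-07-16');

# A failed parse reaches the default on_fail, which throws,
# so the call is wrapped in eval to trap the error.
my $dt    = eval { $parser->parse_datetime('16 July 2003') };
my $error = $@;

print "parse failed: $error" if !defined $dt;
```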

SINGLE SPECIFICATIONS

A single specification is a hash ref of instructions on how to create a parser.

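The precise set of keys and values varies according to parser type. Some keys are common to all of them, though, including label, on_match, on_fail, preprocess and postprocess, all of which appear under EXECUTION FLOW.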

See the documentation for the individual parsers for their valid keys.

Parsers at the time of writing are:

Subroutines / coderefs as specifications.

A single parser specification can be a coderef. This was added mostly because it could be and because I knew someone, somewhere, would want to use it.

If the specification is a reference to a piece of code, be it a subroutine, anonymous, or whatever, then it's passed more or less straight through. The code should return undef in event of failure (or any false value, but undef is strongly preferred), or a true value in the event of success (ideally a DateTime object or some object that has the same interface).

This all said, I generally wouldn't recommend using this feature unless you have to.

Callbacks

I mention a number of callbacks in this document.

Any time you see a callback being mentioned, you can, if you like, substitute an arrayref of coderefs rather than having the straight coderef.
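
A small sketch of the arrayref form, using on_match as the callback. The log array is invented, and each coderef returns a true value on the assumption that a false value could cut the chain short.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

my @log;

my $parser = DateTime::Format::Builder->parser(
    regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
    params => [qw( year month day )],

    # An arrayref of coderefs in place of a single coderef.
    on_match => [
        sub { push @log, 'first';  return 1 },
        sub { push @log, 'second'; return 1 },
    ],
);

my $dt = $parser->parse_datetime('2003-07-16');

print join( ', ', @log ), "\n";
```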

MULTIPLE SPECIFICATIONS

These are very easily described as an array of single specifications.

Note that if the first element of the array is an arrayref, then you're specifying options.
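
For illustration, a multiple specification with an options block might look like the following. The trailing "Z" convention is an assumption made up for this sketch; the callback reads its arguments by the names input and parsed, as the tutorial's examples do.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

my $parser = DateTime::Format::Builder->parser(

    # Options block: applies to every specification that follows.
    [
        preprocess => sub {
            my %args = @_;
            my ( $date, $p ) = @args{qw( input parsed )};

            # Hypothetical convention: a trailing "Z" marks the time as UTC.
            $p->{time_zone} = 'UTC' if $date =~ s/Z$//;
            return $date;
        },
    ],

    # The single specifications, tried in order.
    {
        regex  => qr/^(\d{4})(\d\d)(\d\d)T(\d\d)(\d\d)(\d\d)$/,
        params => [qw( year month day hour minute second )],
    },
    {
        regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
        params => [qw( year month day )],
    },
);

my $utc      = $parser->parse_datetime('20030716T163245Z');
my $floating = $parser->parse_datetime('20030716');
```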

EXECUTION FLOW

Builder allows you to plug in a fair few callbacks, which can make following how a parse failed (or succeeded unexpectedly) somewhat tricky.

For Single Specifications

A single specification will do the following:

User calls parser:

my $dt = $class->parse_datetime($string);
  1. preprocess is called. It's given $string and a reference to the parsing workspace hash, which we'll call $p. At this point, $p is empty. The return value is used as $date for the rest of this single parser. Anything put in $p is also used for the rest of this single parser.

  2. regex is applied.

  3. If regex did not match, then on_fail is called (and is given $date and also label if it was defined). Any return value is ignored and the next thing is for the single parser to return undef.

    If regex did match, then on_match is called with the same arguments as would be given to on_fail. The return value is similarly ignored, but we then move to step 4 rather than exiting the parser.

  4. postprocess is called with $date and a filled out $p. The return value is taken as an indication of whether the parse was a success or not. If it wasn't a success then the single parser will exit at this point, returning undef.

  5. DateTime->new is called and the user is given the resultant DateTime object.

See the section on error handling regarding the undefs mentioned above.
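
The steps above can be traced with a sketch like this one. The label, the trace array, the two-digit-year pivot and the inputs are all invented for illustration; the callbacks read their arguments by name (input, parsed), following the tutorial's examples.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

my @trace;    # records which callbacks ran, in order

my $parser = DateTime::Format::Builder->parser(
    label      => 'two-digit-year',    # hypothetical label
    preprocess => sub {
        my %args = @_;
        push @trace, 'preprocess';
        ( my $date = $args{input} ) =~ s/^\s+|\s+$//g;    # trim whitespace
        return $date;    # used as $date for the rest of the parser
    },
    regex       => qr/^(\d\d)(\d\d)(\d\d)$/,
    params      => [qw( year month day )],
    on_match    => sub { push @trace, 'on_match' },
    on_fail     => sub { push @trace, 'on_fail' },
    postprocess => sub {
        my %args = @_;
        push @trace, 'postprocess';
        my $p = $args{parsed};

        # Assumed pivot: 70-99 means 19xx, 00-69 means 20xx.
        $p->{year} += $p->{year} > 69 ? 1900 : 2000;
        return 1;    # a true value signals success
    },
);

my $dt       = $parser->parse_datetime(' 030716 ');
my @ok_trace = @trace;

# A string the regex cannot match takes the on_fail branch,
# after which the default error handling throws.
@trace = ();
my $failed     = eval { $parser->parse_datetime('nonsense') };
my @fail_trace = @trace;
```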

For Multiple Specifications

With multiple specifications:

User calls parser:

my $dt = $class->complex_parse($string);
  1. The overall preprocessor is called and is given $string and the hashref $p (identically to the per-parser preprocess mentioned in the previous flow).

    If the callback modifies $p then a copy of $p is given to each of the individual parsers. This is so parsers won't accidentally pollute each other's workspace.

  2. If an appropriate length-specific parser is found, then it is called and the single parser flow (see the previous section) is followed, and the parser is given a copy of $p and the return value of the overall preprocessor as $date.

    If a DateTime object was returned, we go straight back to the user.

    If no appropriate parser was found, or the parser returned undef, then we progress to step 3!

  3. Any non-length based parsers are tried in the order they were specified.

    For each of those the single specification flow above is performed, and is given a copy of the output from the overall preprocessor.

    If a real DateTime object is returned then we exit back to the user.

    If no parser could parse, then an error is thrown.

See the section on error handling regarding the undefs mentioned above.
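
A sketch of the multiple-specification flow, assuming the length key is what marks a parser as length-specific. The labels, the tried array and the inputs are invented; the on_match and on_fail callbacks simply record which parser was attempted.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

my @tried;    # labels of the parsers that were attempted, in order

my $note = sub { my %args = @_; push @tried, $args{label}; return 1 };

my $parser = DateTime::Format::Builder->parser(

    # Step 1: the overall preprocessor strips all whitespace.
    [
        preprocess => sub {
            my %args = @_;
            ( my $date = $args{input} ) =~ s/\s+//g;
            return $date;
        },
    ],

    # Step 2: only considered when the preprocessed string is 8 characters long.
    {
        length   => 8,
        label    => 'len8',
        regex    => qr/^(\d{4})(\d\d)(\d\d)$/,
        params   => [qw( year month day )],
        on_match => $note,
        on_fail  => $note,
    },

    # Step 3: parsers without a length, tried in the order given.
    {
        label    => 'dashed',
        regex    => qr/^(\d{4})-(\d\d)-(\d\d)$/,
        params   => [qw( year month day )],
        on_match => $note,
        on_fail  => $note,
    },
    {
        label    => 'slashed',
        regex    => qr{^(\d\d)/(\d\d)/(\d{4})$},
        params   => [qw( day month year )],
        on_match => $note,
        on_fail  => $note,
    },
);

my $by_length    = $parser->parse_datetime('2003 07 16');
my @length_tried = @tried;

@tried = ();
my $by_order    = $parser->parse_datetime('16/07/2003');
my @order_tried = @tried;
```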

METHODS

In the general course of things you won't need any of the methods. Life often throws unexpected things at us so the methods are all available for use.

import

import is a wrapper for create_class. If you specify the class option (see documentation for create_class) it will be ignored.

create_class

This method can be used as the runtime equivalent of import. That is, it takes the exact same parameters as when one does:

use DateTime::Format::Builder ( ... )

That can be (almost) equivalently written as:

use DateTime::Format::Builder;
DateTime::Format::Builder->create_class( ... );

The difference being that the first is done at compile time while the second is done at run time.
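
As a sketch of the run-time form, assuming the class option names the package that receives the generated methods. My::Format::ISODate and parse_date are hypothetical names chosen for this example.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# Build the parser class at run time rather than at compile time.
DateTime::Format::Builder->create_class(
    class   => 'My::Format::ISODate',    # hypothetical class name
    parsers => {
        parse_date => [
            {
                regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

# The generated method can be called on the class...
my $dt = My::Format::ISODate->parse_date('2003-07-16');

# ...or on an object made with the constructor that create_class adds.
my $formatter = My::Format::ISODate->new;
my $leap_day  = $formatter->parse_date('2004-02-29');
```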

In the tutorial I said there were only two parameters at present. I lied. There are actually three of them.

In addition to creating any of the methods, it also creates a method named new that can instantiate (or clone) objects.

SUBCLASSING

In the rest of the documentation I've often lied in order to get some of the ideas across more easily. The thing is, this module's very flexible. You can get markedly different behaviour from simply subclassing it and overriding some methods.

create_method

Given a parser coderef, returns a coderef that is suitable to be a method.

The default action is to call on_fail in the event of a non-parse, but you can make it do whatever you want.

on_fail

This is called in the event of a non-parse (unless you've overridden create_method to do something else).

The single argument is the input string. The default action is to call croak. Above, where I've said parsers or methods throw errors, this is the method that is doing the error throwing.

You could conceivably override this method to, say, return undef.
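
One way that override might look, as a sketch only. My::Lenient::Builder is a hypothetical subclass name, and the regex and inputs are invented.

```perl
use strict;
use warnings;

package My::Lenient::Builder;    # hypothetical subclass name

use parent 'DateTime::Format::Builder';

# Return undef on a failed parse instead of throwing an error.
sub on_fail { return }

package main;

my $parser = My::Lenient::Builder->parser(
    regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
    params => [qw( year month day )],
);

my $good = $parser->parse_datetime('2003-07-16');
my $bad  = $parser->parse_datetime('not a date');

print defined $bad ? "parsed\n" : "could not parse\n";
```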

USING BUILDER OBJECTS aka USERS USING BUILDER

The methods listed in the METHODS section are all you generally need when creating your own class. Sometimes, though, you may not want a full-blown class just to parse something in one program. Some methods are provided to make that task easier.

new

The basic constructor. It takes no arguments, merely returns a new DateTime::Format::Builder object.

my $parser = DateTime::Format::Builder->new;

If called as a method on an object (rather than as a class method), then it clones the object.

my $clone = $parser->new;

clone

Provided for those who prefer an explicit clone method rather than using new as an object method.

my $clone_of_clone = $clone->clone;

parser

Given either a single or multiple parser specification, sets the object to have a parser based on that specification.

$parser->parser(
    regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x,
    params => [qw( year    month  day    )],
);

The arguments given to parser are handed directly to create_parser. The resultant parser is passed to set_parser.

If called as an object method, it returns the object.

If called as a class method, it creates a new object, sets its parser and returns that object.

set_parser

Sets the parser of the object to the given parser.

$parser->set_parser($coderef);

Note: this method does not take specifications. It also does not take anything except coderefs. Luckily, coderefs are what most of the other methods produce.

The method return value is the object itself.

get_parser

Returns the parser the object is using.

my $code = $parser->get_parser;

parse_datetime

Given a string, it calls the parser and returns the DateTime object that results.

my $dt = $parser->parse_datetime('1979 07 16');

The return value, if not a DateTime object, is whatever the parser wants to return. Generally this means that if the parse failed an error will be thrown.
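
Putting the object methods together, a short sketch might read as follows. The regex uses literal spaces so that it matches the sample string above; the second input is invented.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# Create an empty builder object, then give it a parser.
my $parser = DateTime::Format::Builder->new;
$parser->parser(
    regex  => qr/^(\d{4}) (\d\d) (\d\d)$/,    # literal spaces between the fields
    params => [qw( year month day )],
);

my $dt = $parser->parse_datetime('1979 07 16');

# A clone carries the same parser as the original object.
my $clone = $parser->clone;
my $other = $clone->parse_datetime('2003 07 16');

# The underlying parser is a plain coderef.
my $code = $parser->get_parser;

print $dt->ymd, "\n";
```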

format_datetime

If you call this function, it will throw an error.

LONGER EXAMPLES

Some longer examples are provided in the distribution. These implement some of the common DateTime parsing modules using Builder. Each of them is, or was at the time of writing, a drop-in replacement for the corresponding module.

THANKS

Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing DateTime::Format::ICal and DateTime::Format::MySQL, and some much needed review.

Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for writing the multi-length code (both one length with multiple parsers and single parser with multiple lengths), blame for the Regex custom constructor code, spotting a bug in Dispatch, and more much needed review.

Kellan Elliott-McCrea (KELLAN) for even more review, suggestions, DateTime::Format::W3CDTF and the encouragement to rewrite these docs almost 100%!

Claus Färber (CFAERBER) for having me get around to fixing the auto-constructor writing, providing the 'args'/'self' patch, and suggesting the multi-callbacks.

Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now supports.

Matthew McGillis for pointing out that on_fail overriding should be simpler.

Simon Cozens (SIMON) for saying it was cool.

SEE ALSO

datetime@perl.org mailing list.

http://datetime.perl.org/

perl, DateTime, DateTime::Format::Builder::Tutorial, DateTime::Format::Builder::Parser

SUPPORT

Bugs may be submitted at https://github.com/houseabsolute/DateTime-Format-Builder/issues.

I am also usually active on IRC as 'autarch' on irc://irc.perl.org.

SOURCE

The source code repository for DateTime-Format-Builder can be found at https://github.com/houseabsolute/DateTime-Format-Builder.

DONATIONS

If you'd like to thank me for the work I've done on this module, please consider making a "donation" to me via PayPal. I spend a lot of free time creating free software, and would appreciate any support you'd care to offer.

Please note that I am not suggesting that you must do this in order for me to continue working on this particular software. I will continue to do so, inasmuch as I have in the past, for as long as it interests me.

Similarly, a donation made in this way will probably not make me work on this software much more, unless I get so many donations that I can consider working on free software full time (let's all have a chuckle at that together).

To donate, log into PayPal and send money to autarch@urth.org, or use the button at https://www.urth.org/fs-donation.html.

AUTHORS

CONTRIBUTORS

COPYRIGHT AND LICENSE

This software is Copyright (c) 2020 by Dave Rolsky.

This is free software, licensed under:

The Artistic License 2.0 (GPL Compatible)

The full text of the license can be found in the LICENSE file included with this distribution.