sabre-io / vobject

:date: The VObject library for PHP allows you to easily parse and manipulate iCalendar and vCard objects
http://sabre.io/vobject/
BSD 3-Clause "New" or "Revised" License
570 stars 125 forks source link

parser error when DATE is empty #646

Closed JohnRDOrazio closed 3 months ago

JohnRDOrazio commented 5 months ago

I was attempting to parse a defective ICS file, which happened to have empty CREATED: and LAST-MODIFIED: fields:

BEGIN:VEVENT
CLASS:PUBLIC
DTSTART;VALUE=DATE:20291231
DTSTAMP:20240422T062620Z
UID:c0e7126a841a70761e4057e58b5e2d52
CREATED:
DESCRIPTION:\nA fun event\nRSVP
LAST-MODIFIED:

I wasn't getting parsing errors from the validate() call, instead I was getting PHP warnings about an uninitialized string offset 0 in the DateTimeParser::parse() method, specifically from the line:

if ('P' === $date[0] || ('-' === $date[0] && 'P' === $date[1])) {

The PHP warning made me realize that $date was empty, and in fact upon inspecting the ICS file I found the empty CREATED: and LAST-MODIFIED: fields.

Perhaps the parser should catch those situations in which the expected field has an empty value, and throw a catchable error?

https://github.com/sabre-io/vobject/blob/caa7af8e3304c85c63cdb6464ecafe09bb9918f6/lib/DateTimeParser.php#L202-L211

phil-davis commented 5 months ago

@JohnRDOrazio can you propose some code, and add a unit test, in a PR. Then I can review it.

It will be good to handle defective input data, rather than just have PHP "crash".

JohnRDOrazio commented 5 months ago

I see better what's going on now. The validate() method is actually picking up on the errors and returning them. But when I try to JSON serialize the results to send to a frontend UI, it's the serialization process that actually triggers the error, because the results of the validation contain a reference to the problematic node itself. So the internal serializer is attempting to stringify the node but failing because of the errors in the node itself.

I figure, for checking against validation errors, the most useful thing is a reference to the line number and perhaps a string representation of the line in question. It's not really useful to have a reference to the node object, unless it's possible to get the line number and a string representation of the line from the node?

JohnRDOrazio commented 5 months ago

I'm seeing that MimeDir does actually keep track of the line index when reading and parsing a file:

https://github.com/sabre-io/vobject/blob/caa7af8e3304c85c63cdb6464ecafe09bb9918f6/lib/Parser/MimeDir.php#L289

https://github.com/sabre-io/vobject/blob/caa7af8e3304c85c63cdb6464ecafe09bb9918f6/lib/Parser/MimeDir.php#L296-L298

Perhaps the Property class could have an extra property public int $lineIndex; which could be set when creating a new Property? That way each node could keep track of it's own line number within the original file?

That way, even if the validate() results do return a Node, you could still access the line number from the node: $node->lineIndex. And I can use that information to send back to my frontend, instead of serializing the node itself.

JohnRDOrazio commented 5 months ago

started a fix for this in PR #649

JohnRDOrazio commented 5 months ago

with this fix in place I'm able to gather the line number and string from the node passed in the errors, after calling validate():

<?php
class ICSErrorLevel {
    const int REPAIRED = 1;
    const int WARNING = 2;
    const int FATAL = 3;
    const ERROR_STRING = [
        null,
        'Repaired value',
        'Warning',
        'Fatal Error'
    ];
    private string $errorString;

    public function __construct( int $errorLevel ) {
        $this->errorString = self::ERROR_STRING[ $errorLevel ];
    }

    public function __toString() {
        return $this->errorString;
    }
}
$response = new stdClass;
$response->messages = [];
$response->errors = [];

        //after reading from a calendar file...
        $errors = $vcalendar->validate();
        if( count( $errors ) > 0 ) {

            foreach( $errors as $error ) {
                $errorLevel = new ICSErrorLevel( $error['level'] );
                //$nodeType = get_class($error["node"]);
                //$nodeValue = $error["node"]->getValue();
                $response->errors[] = $errorLevel . ": " . $error['message'] . " at line {$error["node"]->lineIndex} ({$error["node"]->lineString})";// . "($nodeType)";
            }
        } else {
            $response->messages[] = "No validation errors were found while validating data from the calendar resource";
        }

header('Content-Type: application/json; charset=utf-8');
echo json_encode( $response, JSON_PRETTY_PRINT );
exit(0);

I can now see in the output:

        "Fatal Error: The supplied value () is not a correct DATE-TIME at line 9277 (LAST-MODIFIED:)",
        "Fatal Error: The supplied value () is not a correct DATE-TIME at line 9292 (CREATED:)",
        "Fatal Error: The supplied value () is not a correct DATE-TIME at line 9294 (LAST-MODIFIED:)",
        "Fatal Error: The supplied value () is not a correct DATE-TIME at line 9309 (CREATED:)",

which is quite helpful for identifying the issues quickly and to the point. So instead of doing a json_encode($errors) which results in trying to stringify a defective node (and generating a PHP warning), I can better examine the node in question and see exactly where it is located in the original file.

JohnRDOrazio commented 5 months ago

I added a unit test to the PR which verifies that lineIndex and lineString parameters are accessible on the node reference in the errors returned from a validate() on defective calendar data.

phil-davis commented 3 months ago

Released in https://github.com/sabre-io/vobject/releases/tag/4.5.5