seboettg / citeproc-php

Full-featured CSL 1.0.1 processor for PHP
MIT License

Attempt to fix issue #141. #192

Open hktang opened 1 month ago

hktang commented 1 month ago

Hi, it seems the issue with the additional slashes comes from the following code:

$pattern = "/(\s|\/)/";

if (!preg_match($pattern, $titleString, $matches)) {
    return StringHelper::mb_ucfirst($titleString);
}

// Here, the delimiter could be either whitespace or a slash (only the first match is kept).
$delimiter = $matches[1];

// Here, the title is split on every whitespace AND every slash.
// If $delimiter happens to be a slash, the string will be pieced back
// together with slashes, even where the original had whitespace.
$wordArray = preg_split($pattern, $titleString); //explode(" ", $titleString);
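A minimal standalone reproduction of the problem follows. It assumes that the rest of the function capitalizes each word and then rejoins the array with `implode($delimiter, $wordArray)` (that part is not quoted above). `splitAndRejoin` is a hypothetical name, and plain `ucfirst` stands in for `StringHelper::mb_ucfirst`:

```php
<?php
// Hypothetical reduction of the split-and-rejoin logic quoted above.
// Assumption: the real code capitalizes each word and rejoins the words
// with the single $delimiter captured by preg_match.
function splitAndRejoin(string $titleString): string
{
    $pattern = "/(\s|\/)/";
    if (!preg_match($pattern, $titleString, $matches)) {
        return ucfirst($titleString); // stand-in for StringHelper::mb_ucfirst
    }
    // Only the FIRST delimiter found is remembered...
    $delimiter = $matches[1];
    // ...but the string is split on EVERY whitespace and slash.
    $wordArray = preg_split($pattern, $titleString);
    $wordArray = array_map('ucfirst', $wordArray);
    // So every separator is replaced by the first one on the way back.
    return implode($delimiter, $wordArray);
}

echo splitAndRejoin("input/output and more"), PHP_EOL; // Input/Output/And/More
echo splitAndRejoin("read and write/output"), PHP_EOL; // Read And Write Output
```

The same split goes wrong in both directions: when a slash comes first, spaces turn into slashes, and when a space comes first, slashes are lost.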

Without changing the original code, I propose that we walk through the converted string character by character: keep each delimiter if the original string has the same character at that position, and otherwise replace it with the original character (e.g. a whitespace).
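A sketch of that post-processing step, assuming the converted string has the same number of characters as the original (i.e. capitalization never changes a word's length). The helper name `restoreDelimiters` is hypothetical and not part of citeproc-php:

```php
<?php
// Hypothetical helper sketching the proposed post-processing step.
// Assumption: $converted has the same number of characters as $original.
function restoreDelimiters(string $original, string $converted): string
{
    // Split both strings into UTF-8 characters.
    $origChars = preg_split('//u', $original, -1, PREG_SPLIT_NO_EMPTY);
    $convChars = preg_split('//u', $converted, -1, PREG_SPLIT_NO_EMPTY);
    if (count($origChars) !== count($convChars)) {
        // Positions cannot be compared; leave the converted string as-is.
        return $converted;
    }
    foreach ($convChars as $i => $char) {
        $isDelimiter = $char === '/' || preg_match('/\s/u', $char) === 1;
        // Keep a delimiter that the original also has at this position;
        // otherwise it was introduced by the rejoin, so restore the
        // original character.
        if ($isDelimiter && $origChars[$i] !== $char) {
            $convChars[$i] = $origChars[$i];
        }
    }
    return implode('', $convChars);
}

echo restoreDelimiters("input/output and more", "Input/Output/And/More"), PHP_EOL;
// Input/Output And More
echo restoreDelimiters("read and write/output", "Read And Write Output"), PHP_EOL;
// Read And Write/Output
```

Because it only compares positions, this repairs the output in both directions (spurious slashes and lost slashes) without touching the existing split logic.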

This should be safe, but I don't insist on this particular implementation. Please let me know your thoughts!
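For comparison, one alternative (a different technique from the one proposed here, not something in this PR) is to have `preg_split` capture the delimiters with the `PREG_SPLIT_DELIM_CAPTURE` flag, so each separator is re-emitted exactly where it was and no post-processing is needed. The function name is hypothetical, and `ucfirst` again stands in for `StringHelper::mb_ucfirst`:

```php
<?php
// Alternative sketch: keep each delimiter in place by capturing it.
function titleCaseKeepingDelimiters(string $titleString): string
{
    // With PREG_SPLIT_DELIM_CAPTURE, the captured group (\s|\/) is returned
    // as its own element between the words. Empty pieces are kept, so words
    // and delimiters always alternate.
    $parts = preg_split("/(\s|\/)/", $titleString, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        // Even indices are words, odd indices are delimiters.
        if ($i % 2 === 0) {
            $parts[$i] = ucfirst($part); // stand-in for StringHelper::mb_ucfirst
        }
    }
    return implode('', $parts);
}

echo titleCaseKeepingDelimiters("input/output and more"), PHP_EOL; // Input/Output And More
```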

Also, I added some test cases for this function to its unit tests. I hope they are useful.