Closed the-turk closed 4 years ago
If you want to treat old
and new
text as HTML, I think you will need to implement a new HtmlDiffer
to diff DOM nodes. And then implement a new renderer to properly display the diff.
Related: https://github.com/jfcherng/php-diff/issues/9#issuecomment-526808774
If you just want to treat old
and new
text as plain text which Differ
is currently doing, then you just need a new renderer I guess. Or maybe the output of the Json
renderer can be used to render in the website frontend via Javascript.
We might summarize the JSON output.
Check the following example out:
Original:
Donec rutrum, odio id tempus consequat, nunc nisi pulvinar dolor, ac faucibus massa erat non est. Integer blandit, eros quis sodales vulputate, erat ex euismod nisl, gravida vestibulum nibh elit pretium quam. Mauris turpis tortor, lacinia vitae purus eget, vulputate mollis felis.
New:
Donec rutrum, odio id tempus consequat, nunc nisi pulvinar dolor, ac faucibus massa erat non est. Integer blandit, eros quis sodales vulputate, erat ex euismod nisl, gravida vestibulum nibh elit pretium quam. Mauris turpis tortor, lacinia vitae purus eget, vulputate testing mollis felis.
JSON output from our repository (duplicates the text to show both versions):
[
[
{
"tag": 8,
"old": {
"offset": 0,
"lines": [
"Donec rutrum, odio id tempus consequat, nunc nisi pulvinar dolor, ac faucibus massa erat non est. Integer blandit, eros quis sodales vulputate, erat ex euismod nisl, gravida vestibulum nibh elit pretium quam. Mauris turpis tortor, lacinia vitae purus eget, vulputate mollis felis."
]
},
"new": {
"offset": 0,
"lines": [
"Donec rutrum, odio id tempus consequat, nunc nisi pulvinar dolor, ac faucibus massa erat non est. Integer blandit, eros quis sodales vulputate, erat ex euismod nisl, gravida vestibulum nibh elit pretium quam. Mauris turpis tortor, lacinia vitae purus eget, vulputate <ins>testing </ins>mollis felis."
]
}
}
]
]
HTML output from our repository: Click to view the image.
Output on GitHub: Click to view the image.
Thoughts:
Note that the information from AbstractHtml::getChanges()
(from upstream SequenceMatcher
) does not directly tell things like line X is modified from line Y. It tells that "block (lines)" X is modified from "block" Y.
Old:
Donec rutrum, test.
New:
Donec rutrum.
There is a new inserted line.
JSON Result:
[
[
{
"tag": "rep",
"old": {
"offset": 0,
"lines": [
"Donec rutrum, test."
]
},
"new": {
"offset": 0,
"lines": [
"Donec rutrum.",
"There is a new inserted line."
]
}
}
]
]
Somehow I feel those codes in Jfcherng\Diff\Renderer\Html\LineRenderer\Word
may be inspiring. Treating lines
as "a string with some \n
s in it".
@jfcherng i wrote a simple (and little tricky) renderer without playing with JSON format for GitHub similarization, check that out.
\Renderer\Html\Combined.php:
<?php
declare(strict_types=1);
namespace Jfcherng\Diff\Renderer\Html;
use Jfcherng\Diff\SequenceMatcher;
use Jfcherng\Diff\Renderer\RendererConstant;
/**
* Combined HTML diff generator
*/
final class Combined extends AbstractHtml
{
/**
* {@inheritdoc}
*/
const INFO = [
'desc' => 'Combined',
'type' => 'Html',
];
/**
* {@inheritdoc}
*/
protected function redererChanges(array $changes): string
{
if (empty($changes)) {
return $this->getResultForIdenticals();
}
$wrapperClasses = \array_merge(
$this->options['wrapperClasses'],
['diff', 'diff-html', 'diff-combined']
);
$html = '<table class="' . \implode(' ', $wrapperClasses) . '">';
$html .= $this->renderTableHeader();
foreach ($changes as $i => $blocks) {
if ($i > 0 && $this->options['separateBlock']) {
$html .= $this->renderTableSeparateBlock();
}
foreach ($blocks as $change) {
$html .= $this->renderTableBlock($change);
}
}
return $html . '</table>';
}
/**
* Renderer the table header.
*/
protected function renderTableHeader(): string
{
return
'<thead>' .
'<tr>' .
(
$this->options['lineNumbers']
?
'<th>' . $this->_('old_version') . '</th>' .
'<th>' . $this->_('new_version') . '</th>'
:
''
) .
'<th>' . $this->_('differences') . '</th>' .
'</tr>' .
'</thead>';
}
/**
* Renderer the table separate block.
*/
protected function renderTableSeparateBlock(): string
{
$colspan = $this->options['lineNumbers'] ? '3' : '';
return
'<tbody class="skipped">' .
'<tr>' .
'<td' . $colspan . '></td>' .
'</tr>' .
'</tbody>';
}
/**
* Renderer the table block.
*
* @param array $change the change
*/
protected function renderTableBlock(array $change): string
{
static $callbacks = [
SequenceMatcher::OP_EQ => 'renderTableEqual',
SequenceMatcher::OP_INS => 'renderTableInsert',
SequenceMatcher::OP_DEL => 'renderTableDelete',
SequenceMatcher::OP_REP => 'renderTableReplace',
];
return
'<tbody class="change change-' . self::TAG_CLASS_MAP[$change['tag']] . '">' .
$this->{$callbacks[$change['tag']]}($change) .
'</tbody>';
}
/**
* Renderer the table block: equal.
*
* @param array $change the change
*/
protected function renderTableEqual(array $change): string
{
$html = '';
// note that although we are in a OP_EQ situation,
// the old and the new may not be exactly the same
// because of ignoreCase, ignoreWhitespace, etc
foreach ($change['old']['lines'] as $no => $oldLine) {
// hmm... but this is a inline renderer
// we could only pick a line from the old or the new to show
$oldLineNum = $change['old']['offset'] + $no + 1;
$newLineNum = $change['new']['offset'] + $no + 1;
$html .=
'<tr data-type="=">' .
$this->renderLineNumberColumns($oldLineNum, $newLineNum) .
'<td class="old">' . $oldLine . '</td>' .
'</tr>';
}
return $html;
}
/**
* Renderer the table block: insert.
*
* @param array $change the change
*/
protected function renderTableInsert(array $change): string
{
$html = '';
foreach ($change['new']['lines'] as $no => $newLine) {
$newLineNum = $change['new']['offset'] + $no + 1;
$html .=
'<tr data-type="+">' .
$this->renderLineNumberColumns(null, $newLineNum) .
'<td class="new">' . $newLine . '</td>' .
'</tr>';
}
return $html;
}
/**
* Renderer the table block: delete.
*
* @param array $change the change
*/
protected function renderTableDelete(array $change): string
{
$html = '';
foreach ($change['old']['lines'] as $no => $oldLine) {
$oldLineNum = $change['old']['offset'] + $no + 1;
$html .=
'<tr data-type="-">' .
$this->renderLineNumberColumns($oldLineNum, null) .
'<td class="old">' . $oldLine . '</td>' .
'</tr>';
}
return $html;
}
/**
* Renderer the table block: replace.
*
* @param array $change the change
*/
protected function renderTableReplace(array $change): string
{
$html = '';
if (\count($change['old']['lines']) >= \count($change['new']['lines'])) {
foreach ($change['old']['lines'] as $no => $oldLine) {
$oldLineNum = $change['old']['offset'] + $no + 1;
if (isset($change['new']['lines'][$no])) {
$newLineNum = $change['old']['offset'] + $no + 1;
$newLine = '<span>' . $change['new']['lines'][$no] . '</span>';
} else {
$newLineNum = null;
$newLine = '';
}
$html .=
'<tr>' .
$this->renderLineNumberColumns($oldLineNum, $newLineNum) .
'<td class="rep"><span>' .
$this->mergeDiffs($newLine, $oldLine) .
'</span></td>' .
'</tr>';
}
} else {
foreach ($change['new']['lines'] as $no => $newLine) {
$newLineNum = $change['new']['offset'] + $no + 1;
if (isset($change['old']['lines'][$no])) {
$oldLineNum = $change['old']['offset'] + $no + 1;
$oldLine = '<span>' . $change['old']['lines'][$no] . '</span>';
} else {
$oldLineNum = null;
$oldLine = '';
}
$html .=
'<tr>' .
$this->renderLineNumberColumns($oldLineNum, $newLineNum) .
'<td class="rep"><span>' .
$this->mergeDiffs($newLine, $oldLine) .
'</span></td>' .
'</tr>';
}
}
return $html;
}
/**
* Renderer the line number columns.
*
* @param null|int $oldLineNum The old line number
* @param null|int $newLineNum The new line number
*/
protected function renderLineNumberColumns(?int $oldLineNum, ?int $newLineNum): string
{
if (!$this->options['lineNumbers']) {
return '';
}
return
(
isset($oldLineNum)
? '<th class="n-old">' . $oldLineNum . '</th>'
: '<th></th>'
) .
(
isset($newLineNum)
? '<th class="n-new">' . $newLineNum . '</th>'
: '<th></th>'
);
}
/**
* Merge diffs between lines.
*
* Gets newPart in <ins>newPart</ins>
* Replaces <del>oldPart</del> with
* <del>oldPart</del><ins>newPart</ins>
*
* @param string $newLine New line
* @param string $oldLine Old line
*/
protected function mergeDiffs(string $newLine, string $oldLine): string
{
$newParts = $this->getNewPartsWithClosures(
RendererConstant::HTML_CLOSURES_INS[0],
RendererConstant::HTML_CLOSURES_INS[1],
$newLine
);
$offset = 0;
return preg_replace_callback(
'/' . preg_quote(RendererConstant::HTML_CLOSURES_DEL[1], '/') . '/',
function($match) use($newParts, &$offset)
{
return $newParts[$offset++];
}, $oldLine
);
}
/**
* Get new parts of line with </del><ins></ins> closures.
* This one is adapted from:
* https://stackoverflow.com/a/27078384/12866913
*
* @param string $line Line
* @param string $leftDelim Left delimiter
* @param string $rightDelim Right delimiter
*/
protected function getNewPartsWithClosures(
string $leftDelim,
string $rightDelim,
string $line
): array
{
$contents = [];
$leftDelimLength = \strlen($leftDelim);
$rightDelimLength = \strlen($rightDelim);
$startFrom = $contentStart = $contentEnd = 0;
while (($contentStart = \strpos($line, $leftDelim, $startFrom)) !== false) {
$contentStart += $leftDelimLength;
$contentEnd = \strpos($line, $rightDelim, $contentStart);
if ($contentEnd === false) {
break;
}
$contents[] =
RendererConstant::HTML_CLOSURES_DEL[1] .
RendererConstant::HTML_CLOSURES_INS[0] .
\substr(
$line,
$contentStart,
$contentEnd - $contentStart
) .
RendererConstant::HTML_CLOSURES_INS[1];
$startFrom = $contentEnd + $rightDelimLength;
}
return $contents;
}
}
NEEDS FURTHER TESTING FOR DIFFERENT LEVELS. IT STILL HAVE BUGS
this one is introducing new class name for td
tags: rep
(representing combined lines). It looks okay with these styles:
.diff-wrapper.diff .rep {
background: #fce7a0;
}
.diff-wrapper.diff .rep del {
text-decoration: line-through !important;
background: #f7c9c9;
}
.diff-wrapper.diff .rep ins {
background: #bef6be;
}
here are the screenshots from my side:
That's a good start. :+1:
Here are some things that I notice currently:
detailLevel
is "none"
. Hmm... What should be the expected behavior then? This situation seems to make no sense... Maybe disallow this renderer to work with detailLevel => none
?rep
block has different line counts for old
and new
(i.e., https://github.com/jfcherng/php-diff/issues/20#issuecomment-583833428). What should be the expected behavior then (especially when lineNumber => true
)?Also, maybe create a PR for easier code review and test.
Hey, it's me again. Could you try to render the diff between these two using Combined
(word-level)?
Old:
https://paste.ubuntu.com/p/8yZQpH9h3W/
New:
TESTING IT AS MUCH AS I CAN
Output look like a mess.
Also, is the output okay with the other renderers? It shows old lines as new lines to me. I think i broke it đ«đ«
yeah it produces a bad-looking result. When there are too many differences, force merging them together is no longer meaningful.
Here's a naive patch https://github.com/jfcherng/php-diff/commit/273d1b43ca67d239f4da36af78253afcaf9f17c4. I am not sure about the proper ratio. 0.8
is just a random pick at this moment.
you're as fast as lightning! It fixed. 0.8
looks reasonable for now.
Damn, you're a genius.
Off topic... This page is useful to generate diffs from a random text: https://csgenerator.com/
Off topic... This page is useful to generate diffs from a random text: https://csgenerator.com/
Interesting! But the sentence is too short I guess :smile:
Hey @jfcherng, could you test this out with word-level and 0.8 threshold? What's wrong with that dot?
Hey @jfcherng, could you test this out with word-level and 0.8 threshold? What's wrong with that dot?
Old New
I can't see the problem. Maybe a screenshot?
Or maybe the text you provided cannot be copied accurately since they are not in code block.
Since you have release the-turk/flarum-diff
1.0.0
, I assume the issue no longer exists?
Since you have release
the-turk/flarum-diff
1.0.0
, I assume the issue no longer exists?
Ahh, sorry I forgot this. Seems like I can't reproduce it either, I dropped my custom renderers maybe this fixed the issue. Thank you!
@jfcherng sorry for the ping, are you using Discord?
yes, ---------
What does it take to write a renderer to show diffs as in GitHub (see my edit history)? Currently it duplicates the text and doubles the storage volume when storing a JSON output into the database. We might need two renderers for this kind of functionality (one is JSON and the other one is HTML).