petdance / tidyp

tidyp, a fork of the original tidy

tidyp removes whitespace between end tag and text #21

Open shlomif opened 7 years ago

shlomif commented 7 years ago

With this i.html:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head>
<title>Test page</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p>
Style and look based on the
<a href="http://wordpress.org/extend/themes/smoked">Smoked WordPress Theme</a>
by <a href="http://wordpress.org/extend/themes/profile/iconstantin">iconstantin</a>.
</p>
</body>
</html>

And this Perl program:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Tidy ();

my $tidy = HTML::Tidy->new(
    {
        'input_xml'     => 1,
        'output_xml'    => 1,
        'char_encoding' => 'utf8',
    }
);

local $/;
print $tidy->clean(scalar <>);

I am getting this output:

shlomif[homepage]:$trunk$ perl p.pl < i.html
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head>
<title>Test page</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p>Style and look based on the
<a href="http://wordpress.org/extend/themes/smoked">Smoked WordPress Theme</a>by 
<a href="http://wordpress.org/extend/themes/profile/iconstantin">iconstantin</a>.</p>
</body>
</html>
shlomif[homepage]:$trunk$ ls

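As can be seen, the whitespace between the closing </a> tag and the word "by" has been removed, so the rendered text reads "Themeby" instead of "Theme by". Please look into fixing it.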

Update: I am using lib64tidyp1.04_0-1.04-9.mga6 and perl-HTML-Tidy-1.560.0-9.mga6 on Mageia 6.
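
A minimal sketch of how the regression could be detected automatically, assuming it is enough to compare the visible text of the markup before and after cleaning. The helpers `visible_text` and `text_preserved` are hypothetical names, not part of HTML::Tidy, and the tag stripping is a plain regex that ignores entities, CDATA sections, and `>` characters inside attribute values:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: reduce markup to its visible text.
# Tags are deleted without leaving a space behind, so "</a>by" and
# "</a> by" produce different results.
sub visible_text {
    my ($markup) = @_;

    ( my $text = $markup ) =~ s/<[^>]*>//g;    # drop tags, DOCTYPE, <?xml ...?>
    $text =~ s/\s+/ /g;                        # collapse whitespace runs
    $text =~ s/^\s+|\s+$//g;                   # trim both ends

    return $text;
}

# Hypothetical helper: true if cleaning kept the visible text intact.
sub text_preserved {
    my ( $before, $after ) = @_;

    return visible_text($before) eq visible_text($after);
}
```

With the i.html and the output shown above, `text_preserved($html, $tidy->clean($html))` should be false, because the text changes from "Theme by" to "Themeby".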

petdance commented 7 years ago

tidyp is just a clone of libtidy. I don't plan on making any changes to it. libtidy has pretty much been ignored for years.

shlomif commented 7 years ago

Hi Andy!

Thanks for the quick reply.

On Mon, 27 Mar 2017 10:44:03 -0700 Andy Lester <notifications@github.com> wrote:

> tidyp is just a clone of libtidy. I don't plan on making any changes to it. libtidy has pretty much been ignored for years.

Did you mean "clone" or "fork"? "Clone" implies a reimplementation, while "fork" implies that the latest source code is maintained by a third party.

Anyway:

  1. I might be able to try to fix this bug and others that bother other people. Will you accept me as a co-maintainer?

  2. Are you aware of any comparable alternatives to tidyp that are in better shape and that I can use instead?

Best regards,

Shlomi


petdance commented 7 years ago

I meant a clone. I just copied the repo and put a version number on it so that HTML::Tidy would have something to build against.

There are no other alternatives that I am aware of.

If you can fix the bug you're referring to, let's talk. You say that bugs "bother other people". Do you know of anyone besides you using it?

shlomif commented 7 years ago

Hi,

On Mon, 27 Mar 2017 12:14:25 -0700 Andy Lester <notifications@github.com> wrote:

> I meant a clone. I just copied the repo and put a version number on it so that HTML::Tidy would have something to build against.

I see.

> There are no other alternatives that I am aware of.

I see. Guess I'm stuck with tidyp/HTML::Tidy for now.

> If you can fix the bug you're referring to, let's talk.

OK, no promises but I'll try.

> You say that bugs "bother other people". Do you know of anyone besides you using it?

I was referring to the open issues in the issue tracker. I don't know of anyone else using it.

-- @shlomif


shlomif commented 7 years ago

@petdance (and all): just a note that, through this DDG search (https://duckduckgo.com/?q=html+minifier&ia=web), I learned about the html-minifier project (https://github.com/kangax/html-minifier), which I now use for minification instead of tidyp. I'm still using tidyp for validation. Hope it helps.