htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org
2.71k stars 418 forks source link

Tidy 5.6.0 on Mac says Not a file when file is not writeable #681

Closed thvv closed 3 years ago

thvv commented 6 years ago

I check files before install with tidy --markup no --quiet yes --show-warnings no index.html I get the error Document: "index.html" is not a file! well, it certainly is. But it is mode 444. Which is fine because I am not updating it, just checking it. Changing the file mode to 666 makes it work.

I installed this version from macports tidy-5.6.0_0.darwin_16.x86_64.tbz2

geoffmcl commented 6 years ago

@thvv wow what an interesting discovery...

It seems it is not only on the MAC, but is the same in unix/linux, and Windows, using current tidy 5.7.3...

I just set a file to 444 - $ chmod 0444 temp.html, and then $ tidy temp.html and I get Document: "temp.html" is not a file! - Zute, or as Snagglepuss would say Heavens to Murgatroyd!!!

It certainly seems at least one write bit must be set, like $ chmod 0644, and tidy will be ok...

On the face of it this does not seem right. Provided you are not using the -modify option, which will try to overwite the file, it seems only read permission should be required...

AH HA! The answer is simple. The tidyDocParseFile service has the following code -

    int status = -ENOENT;
    FILE* fin = fopen( filnam, "r+" );

    if ( !fin )
    {
        TY_(ReportFileError)( doc, filnam, FILE_NOT_FILE );
        return status;
    }

    fclose( fin );

Now reading some fopen references that access string "r+" means **read/update**: Open a file for update (both for input and output). The file must exist.

And note that if I set the file read only in Windows, > attrib +R temp.html, tidy will fail at the same place...

Now after that code block, WIN32 and UNIX separate...

UNIX continues with fin = fopen( filnam, "rb" );... you will note the + has been removed...

Windows goes another code path, using in essence -

    HANDLE fp = CreateFileA( filnam, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, 0, NULL );
    ...
    fin->map = CreateFileMapping( fp, NULL, PAGE_READONLY, 0, 0, NULL );
    ...
    data->view = MapViewOfFile( data->map, FILE_MAP_READ,
                                (DWORD)( data->pos >> 32 ),
                                (DWORD)data->pos, numb );

So we note also in Windows, that the concept of + is not required, with GENERIC_READ, PAGE_READONLY, and FILE_MAP_READ... this of course needs to be verified...

Reading this far, maybe, just maybe! there is a good case for removing that initial + also... but as I review the history, maybe NOT...

It should be noted that the above fopen( filnam, "r+" ); block was added relatively recently.. I trace it back to commit ce105dcf09, May 7, 2017... It was to address #391 - the fact that a directory would be treated as a file in unix/MAC...

And IIRC a fopen( dirname, "r" ); will succeed in unix - need to check this...

Maybe the whole block could be replaced with a crossported stat check, which would also yield the size and time, which would make some of the subsequent code both in windows and unix redundant...

I have been using the following, well tested, crossported service, in many projects, for a long time -

#ifdef WIN32
#define M_IS_DIR _S_IFDIR
#else // !WIN32
#define M_IS_DIR S_IFDIR
#endif
static struct stat buf;
enum DiskType {
    MDT_NONE,
    MDT_FILE,
    MDT_DIR
};

static DiskType is_file_or_directory(const char * path)
{
    if (!path)
        return MDT_NONE;
    if (stat(path, &buf) == 0)
    {
        if (buf.st_mode & M_IS_DIR)
            return MDT_DIR;
        else
            return MDT_FILE;
    }
    return MDT_NONE;
}
static size_t get_last_file_size() { return buf.st_size; }
static time_t get_last_file_time() { return buf.st_mtime; }

It seems this is a bug. Tidy should NOT have a problem with a read only files.

If the -modify option is used, then this problem should be handled at the output stage...

In any case tidy should not report is not a file! when it is a file...

What do others think... thanks...

geoffmcl commented 6 years ago

@thvv, just to confirm one small test done of fopen( filnam, "r" );, that is without the +, succeeds in unix, and probably in OSX, even if the input is a directory, so can not just remove it...

Am tending toward using a 'stat' test and also fail if a directory, but would need another message like say FILE_IS_DIRECTORY, "%s is a directory!" so tidy can correctly report the problem...

As stated look forward to further feedback... thanks...

i-give-up commented 6 years ago

Yes, encountered the same "no file" error in Windows 8.1 when the html file was opened in gVim (which I suppose locks the file from being written by another program).

I have no idea about the code but yes it shouldn't fail if the file is read-only.

geoffmcl commented 5 years ago

@hosiet repeated this bug in #789 ... any help appreciated... thanks...

jidanni commented 5 years ago

[Seen on 5.6. I wrote this offline before connecting to search for already filed bugs.]

Tidy can't deal with readonly files even though we are not requesting overwrite

$ tidy -eq  /usr/share/doc/aptitude/html/en/ch02s04s05.html
Document: "/usr/share/doc/aptitude/html/en/ch02s04s05.html" is not a file!

First bug: It is. So don't say that. Proof:

$ tidy -eq < /usr/share/doc/aptitude/html/en/ch02s04s05.html
line 676 column 39 - Warning: trimming empty <p>

So what's the problem?

$ strace tidy -q /usr/share/doc/aptitude/html/en/ch02s04s05.html 2>&1
openat(AT_FDCWD, "/usr/share/doc/aptitude/html/en/ch02s04s05.html", O_RDWR) = -1 EACCES (Permission denied)

This is crazy.

$ cp /usr/share/doc/aptitude/html/en/ch02s04s05.html /tmp/g.html
$ strace tidy -q /tmp/g.html 2>&1
openat(AT_FDCWD, "/tmp/g.html", O_RDWR) = 3
close(3)                                = 0
openat(AT_FDCWD, "/tmp/g.html", O_RDONLY) = 3

Second bug: I am not asking tidy to overwrite the file. So don't use that test in such cases!

Else one cannot pretty print any readonly file, no matter to STDOUT, a pipe, or another file.

geoffmcl commented 5 years ago

@i-give-up, do not know about gVim in Windows, but have encountered other editors, that use some form of file locking, sometimes by just keeping the file open, but have not tested how tidy would react in that circumstance, especially that libTidy uses file mapping... that would need to be explored...

And @jidanni thank you for repeating this known bug...

In discussion, the code problem has been identified, and code solutions offered, at least for the read-only files...

Just waiting for someone to step up with a patch, or PR, to fix this crazy bug... thanks...

geoffmcl commented 4 years ago

20201007:

@thvv, @hosiet, all, any, still waiting for someone to step up with a patch, PR, to fix this crazy bug...

The discussion above offers solutions... need help... hopefully more that a +1... thanks...

geoffmcl commented 3 years ago

@thvv, @hosiet - #789, @i-give-up, @jidanni, @jmadiot, @d0ugm - #918, etc, have got around to testing the following patch...

diff --git a/src/tidylib.c b/src/tidylib.c
index 75cb1d9..4d57510 100644
--- a/src/tidylib.c
+++ b/src/tidylib.c
@@ -1125,19 +1125,26 @@ int TIDY_CALL  tidyParseSource( TidyDoc tdoc, TidyInputSource* source )
     return tidyDocParseSource( doc, source );
 }

-
+#ifdef WIN32
+#define M_IS_DIR _S_IFDIR
+#else // !WIN32
+#define M_IS_DIR S_IFDIR
+#endif
 int   tidyDocParseFile( TidyDocImpl* doc, ctmbstr filnam )
 {
     int status = -ENOENT;
-    FILE* fin = fopen( filnam, "r+" );
-
-    if ( !fin )
+    FILE* fin = 0;
+    struct stat sbuf = { 0 }; /* Is. #681 - read-only files */
+    if ( stat(filnam,&sbuf) != 0 )
     {
         TY_(ReportFileError)( doc, filnam, FILE_NOT_FILE );
         return status;
     }
-
-    fclose( fin );
+    if (sbuf.st_mode & M_IS_DIR) /* and /NOT/ if a DIRECTORY */
+    {
+        TY_(ReportFileError)(doc, filnam, FILE_NOT_FILE);
+        return status;
+    }

 #ifdef _WIN32
     return TY_(DocParseFileWithMappedFile)( doc, filnam );
@@ -1147,7 +1154,6 @@ int   tidyDocParseFile( TidyDocImpl* doc, ctmbstr filnam )

 #if PRESERVE_FILE_TIMES
     {
-        struct stat sbuf = { 0 };
         /* get last modified time */
         TidyClearMemory(&doc->filetimes, sizeof(doc->filetimes));
         if (fin && cfgBool(doc, TidyKeepFileTimes) &&

At present only tested it in Windows 10, and it seems to work ok...

As time permits, will get around the creating an issue-681 branch, or the like, so it can be easily pulled, checked out, and test in all OSes, distros, etc... especially the MAC...

Meantime, would really appreciate others applying the patch, and testing it in your local system, and reporting, so we can get rid of this silly BUG... thanks...

geoffmcl commented 3 years ago

@thvv, @hosiet - #789, @i-give-up, @jidanni, @jmadiot, @d0ugm - #918, etc, have now pushed branch issue-681 for testing...

$ cd tidy-html5 # get to source
$ git pull # update
$ git checkout issue-681 # switch to branch
$ cd build/cmake # get to build folder
$ build and test

Look forward to feedback... thanks...

relphj commented 3 years ago

issue-681 seems to fix the problem for me on Ubuntu 20.04.

geoffmcl commented 3 years ago

@relphj, thank you for the positive feedback on Ubuntu 20.04.

In addition to Windows 10 testing, I have also now tested the issue-681 branch, on 1. Ubuntu 18.04, (must get around to upgrading this!), and 2. my RPI ARM 4.1.19 (Rasbian 8.0), and it works fine... for both read-only, chmod 0444 temp1.html, files, and any directory...

And, of course it fails in all, if I add the -m - modify current file - option... with output Document: Can't open "temp1.html"...

If the input html is otherwise ok, related to issue #921, am very disturbed that the exit value can be 0! This does not seem right... but that is a separate issue...

Anyway, @thvv, hope you, or others, can confirm it is OK for the MAC, where the this issue started...

And then maybe this can be merged to next... thanks...

geoffmcl commented 3 years ago

Added PR #926 for testing... look forward to reports/comment/feedback... thanks...

geoffmcl commented 3 years ago

@jmadiot, thank for the linux testing, and reporting...

I guess you missed that I prefer the reporting be back in this open issue... but no probs... I copied your script here...

mkdir /tmp/testtidy
cd /tmp/testtidy
git clone https://github.com/htacg/tidy-html5.git
cd tidy-html5/
git switch issue-681
cd build/cmake/
cmake ../.. -DCMAKE_BUILD_TYPE=Release
make
echo "foo" > foo.html
./tidy foo.html
# correct output
chmod -w foo.html
./tidy foo.html
# also correct output!
# I also check that the branch next had the problem:
git switch next
make
./tidy foo.html
# expectedly got the error of issue 681: Document: "foo.html" is not a file!
chmod +w foo.html
./tidy foo.html
# normal output again

Hope we get a report from a MAC user soonest, using the above script @jmadiot has provided - @thvv?, or others?...

And the PR #926 can be merged... thanks...

thvv commented 3 years ago

Hello Geoff etc.

I tried your script on my Mac running macOS Mojave 10.14.6. It failed because 'git switch' gave an error. the mac version of git is "git version 2.20.1 (Apple Git-117)" installed with xcode for Mojave. 'tidy' works and produces correct output, but fails if the file is not writeable.

I tried the script again, replacing 'git switch issue-681' with 'git checkout issue-681' and it worked.

[thvv@Uly2 tmp]$ mkdir testtidy [thvv@Uly2 tmp]$ git clone https://github.com/htacg/tidy-html5.git Cloning into 'tidy-html5'... remote: Enumerating objects: 21, done. remote: Counting objects: 100% (21/21), done. remote: Compressing objects: 100% (19/19), done. remote: Total 9306 (delta 6), reused 9 (delta 2), pack-reused 9285 Receiving objects: 100% (9306/9306), 5.50 MiB | 3.38 MiB/s, done. Resolving deltas: 100% (5831/5831), done. [thvv@Uly2 tmp]$ ls MozillaUpdateLock-2656FF1E876E9973 com.google.Keystone/ testtidy/ adobegc.log mysql.sock= tidy-html5/ com.apple.launchd.Bo9Q0853Ws/ mysqlx.sock= com.apple.launchd.DVcLsjRnir/ powerlog/ [thvv@Uly2 tmp]$ ls .git ls: .git: No such file or directory [thvv@Uly2 tmp]$ rm tidy-html5 rm: tidy-html5: is a directory [thvv@Uly2 tmp]$ rm -rf tidy-html5 [thvv@Uly2 tmp]$ cd testtidy [thvv@Uly2 testtidy]$ git clone https://github.com/htacg/tidy-html5.git Cloning into 'tidy-html5'... remote: Enumerating objects: 21, done. remote: Counting objects: 100% (21/21), done. remote: Compressing objects: 100% (19/19), done. remote: Total 9306 (delta 6), reused 9 (delta 2), pack-reused 9285 Receiving objects: 100% (9306/9306), 5.50 MiB | 3.40 MiB/s, done. Resolving deltas: 100% (5831/5831), done. [thvv@Uly2 testtidy]$ cd tidy-html5/ [thvv@Uly2 tidy-html5]$ git checkout issue-681 Branch 'issue-681' set up to track remote branch 'issue-681' from 'origin'. Switched to a new branch 'issue-681' [thvv@Uly2 tidy-html5]$ cd build/cmake [thvv@Uly2 cmake]$ cmake ../.. -DCMAKE_BUILD_TYPE=Release -- The C compiler identification is AppleClang 10.0.1.10010046 -- The CXX compiler identification is AppleClang 10.0.1.10010046 -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- Debug Logging is NOT enabled. -- Building support for runtime configuration files. -- Also building DLL library SHARED, version 5.7.45, date 2020.11.30 -- Generating man tidy.1 custom commands... -- Configuring done -- Generating done -- Build files have been written to: /tmp/testtidy/tidy-html5/build/cmake [thvv@Uly2 cmake]$ make Scanning dependencies of target tidy-static [ 1%] Building C object CMakeFiles/tidy-static.dir/src/access.c.o [ 3%] Building C object CMakeFiles/tidy-static.dir/src/attrs.c.o [ 5%] Building C object CMakeFiles/tidy-static.dir/src/istack.c.o [ 7%] Building C object CMakeFiles/tidy-static.dir/src/parser.c.o [ 8%] Building C object CMakeFiles/tidy-static.dir/src/tags.c.o [ 10%] Building C object CMakeFiles/tidy-static.dir/src/entities.c.o [ 12%] Building C object CMakeFiles/tidy-static.dir/src/lexer.c.o [ 14%] Building C object CMakeFiles/tidy-static.dir/src/pprint.c.o [ 16%] Building C object CMakeFiles/tidy-static.dir/src/charsets.c.o [ 17%] Building C object CMakeFiles/tidy-static.dir/src/clean.c.o [ 19%] Building C object CMakeFiles/tidy-static.dir/src/message.c.o [ 21%] Building C object CMakeFiles/tidy-static.dir/src/config.c.o [ 23%] Building C object CMakeFiles/tidy-static.dir/src/alloc.c.o [ 25%] Building C object CMakeFiles/tidy-static.dir/src/attrdict.c.o [ 26%] Building C object CMakeFiles/tidy-static.dir/src/buffio.c.o [ 28%] Building C object CMakeFiles/tidy-static.dir/src/fileio.c.o [ 30%] Building C object CMakeFiles/tidy-static.dir/src/streamio.c.o [ 32%] Building C object CMakeFiles/tidy-static.dir/src/tagask.c.o [ 33%] Building C object CMakeFiles/tidy-static.dir/src/tmbstr.c.o [ 35%] Building C object CMakeFiles/tidy-static.dir/src/utf8.c.o [ 37%] Building C object CMakeFiles/tidy-static.dir/src/tidylib.c.o [ 39%] Building C object CMakeFiles/tidy-static.dir/src/mappedio.c.o [ 41%] Building C object CMakeFiles/tidy-static.dir/src/gdoc.c.o [ 42%] Building C object CMakeFiles/tidy-static.dir/src/language.c.o [ 44%] Building C object CMakeFiles/tidy-static.dir/src/messageobj.c.o [ 46%] Building C object CMakeFiles/tidy-static.dir/src/sprtf.c.o [ 48%] Linking C static library libtidys.a /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: libtidys.a(sprtf.c.o) has no symbols /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: libtidys.a(sprtf.c.o) has no symbols [ 48%] Built target tidy-static Scanning dependencies of target tidy [ 50%] Building C object CMakeFiles/tidy.dir/console/tidy.c.o [ 51%] Linking C executable tidy [ 51%] Built target tidy Scanning dependencies of target man Generate /tmp/testtidy/tidy-html5/build/cmake/tidy-help.xml Generate /tmp/testtidy/tidy-html5/build/cmake/tidy-config.xml Generate tidy.1 [ 51%] Built target man Scanning dependencies of target tidy-share [ 53%] Building C object CMakeFiles/tidy-share.dir/src/access.c.o [ 55%] Building C object CMakeFiles/tidy-share.dir/src/attrs.c.o [ 57%] Building C object CMakeFiles/tidy-share.dir/src/istack.c.o [ 58%] Building C object CMakeFiles/tidy-share.dir/src/parser.c.o [ 60%] Building C object CMakeFiles/tidy-share.dir/src/tags.c.o [ 62%] Building C object CMakeFiles/tidy-share.dir/src/entities.c.o [ 64%] Building C object CMakeFiles/tidy-share.dir/src/lexer.c.o [ 66%] Building C object CMakeFiles/tidy-share.dir/src/pprint.c.o [ 67%] Building C object CMakeFiles/tidy-share.dir/src/charsets.c.o [ 69%] Building C object CMakeFiles/tidy-share.dir/src/clean.c.o [ 71%] Building C object CMakeFiles/tidy-share.dir/src/message.c.o [ 73%] Building C object CMakeFiles/tidy-share.dir/src/config.c.o [ 75%] Building C object CMakeFiles/tidy-share.dir/src/alloc.c.o [ 76%] Building C object CMakeFiles/tidy-share.dir/src/attrdict.c.o [ 78%] Building C object CMakeFiles/tidy-share.dir/src/buffio.c.o [ 80%] Building C object CMakeFiles/tidy-share.dir/src/fileio.c.o [ 82%] Building C object CMakeFiles/tidy-share.dir/src/streamio.c.o [ 83%] Building C object CMakeFiles/tidy-share.dir/src/tagask.c.o [ 85%] Building C object CMakeFiles/tidy-share.dir/src/tmbstr.c.o [ 87%] Building C object CMakeFiles/tidy-share.dir/src/utf8.c.o [ 89%] Building C object CMakeFiles/tidy-share.dir/src/tidylib.c.o [ 91%] Building C object CMakeFiles/tidy-share.dir/src/mappedio.c.o [ 92%] Building C object CMakeFiles/tidy-share.dir/src/gdoc.c.o [ 94%] Building C object CMakeFiles/tidy-share.dir/src/language.c.o [ 96%] Building C object CMakeFiles/tidy-share.dir/src/messageobj.c.o [ 98%] Building C object CMakeFiles/tidy-share.dir/src/sprtf.c.o [100%] Linking C shared library libtidy.dylib [100%] Built target tidy-share [thvv@Uly2 cmake]$ echo "foo" > foo.html [thvv@Uly2 cmake]$ ./tidy foo.html line 1 column 1 - Warning: missing <!DOCTYPE> declaration line 1 column 1 - Warning: plain text isn't allowed in elements line 1 column 1 - Info: previously mentioned line 1 column 1 - Warning: inserting implicit line 1 column 1 - Warning: inserting missing 'title' element Info: Document content looks like HTML5 Tidy found 4 warnings and 0 errors!

<!DOCTYPE html>

foo

About HTML Tidy: https://github.com/htacg/tidy-html5 Bug reports and comments: https://github.com/htacg/tidy-html5/issues Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/ Latest HTML specification: http://dev.w3.org/html5/spec-author-view/ Validate your HTML documents: http://validator.w3.org/nu/ Lobby your company to join the W3C: http://www.w3.org/Consortium

Do you speak a language other than English, or a different variant of English? Consider helping us to localize HTML Tidy. For details please see https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md [thvv@Uly2 cmake]$ chmod -w foo.html [thvv@Uly2 cmake]$ ./tidy foo.html line 1 column 1 - Warning: missing <!DOCTYPE> declaration line 1 column 1 - Warning: plain text isn't allowed in elements line 1 column 1 - Info: previously mentioned line 1 column 1 - Warning: inserting implicit line 1 column 1 - Warning: inserting missing 'title' element Info: Document content looks like HTML5 Tidy found 4 warnings and 0 errors!

<!DOCTYPE html>

foo

About HTML Tidy: https://github.com/htacg/tidy-html5 Bug reports and comments: https://github.com/htacg/tidy-html5/issues Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/ Latest HTML specification: http://dev.w3.org/html5/spec-author-view/ Validate your HTML documents: http://validator.w3.org/nu/ Lobby your company to join the W3C: http://www.w3.org/Consortium

Do you speak a language other than English, or a different variant of English? Consider helping us to localize HTML Tidy. For details please see https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md [thvv@Uly2 cmake]$

geoffmcl commented 3 years ago

@thvv, thank you for the lengthy report...

Yes, I too noted the git switch vs checkout, that I use mostly... depends on the git version... switch is also not in my git 2.17.0.window.1... maybe you can upgrade your git installation...

You did seem to miss the final steps - checking out next, rebuild, and ./tidy foo.html should fail - but since you started the issue using tidy-5.6.0, I guess that is ok...

Of course you now have the latest tidy 5.7.45.I681 available... lucky you ;=))

I do not know about the mac, but you may only need $ sudo make install, to make it the active tidy in your system, but unsure about the mac, since do not have one... maybe others could help with that...

And maybe you could now offer further help with other PR/patch testing, and reporting... thanks...

Hopefully macports can offer a later tidy version, also... maybe file a report...

But we now have positive reports from the 3 major OS'es, and no negatives, so will get to merging #926 soonest... thanks...