martinrotter / textosaurus

Cross-platform text editor based on Qt and Scintilla.
GNU General Public License v3.0

Is automatic recognition of the format (Language) currently designed to be based solely on the filename extension? #69

Closed paoloschi closed 5 years ago

paoloschi commented 5 years ago

Brief description of the issue.

I'm a Linux user. It is well known that Unix-like systems look at the filename extension only as a last resort. The primary method of recognition relies on 'magic numbers' (metadata that is not visible) or on a conventional header in the first line of a text document (the human-readable way). A text file containing the string <?xml[ ...]> in its first line will be identified as an XML document regardless of its filename. Likewise, for scripts it is common to put a shebang on the first line, and that is enough to have the script interpreted correctly whatever its filename may be...

As far as I can see, Textosaurus 0.9.11 relies solely on the filename extension to automatically identify the format of the document I am opening. Is my guess correct? If so, I think it's a very serious limitation...
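The first-line convention described above can be sketched as follows. This is a minimal illustration, not how Textosaurus or any particular editor implements it; the function name `guess_language` and the interpreter-to-label table are assumptions made up for the example.

```python
from typing import Optional

# Hypothetical mapping from interpreter name to a language label.
# The labels are illustrative, not the actual Textosaurus menu entries.
INTERPRETERS = {
    "sh": "Bash",
    "bash": "Bash",
    "python": "Python",
    "python3": "Python",
    "lua": "Lua",
}


def guess_language(first_line: str) -> Optional[str]:
    """Guess a language from the first line of a text file, ignoring its name."""
    line = first_line.lstrip("\ufeff").strip()  # drop a UTF-8 BOM, if any

    # An XML declaration identifies the document regardless of its filename.
    if line.startswith("<?xml"):
        return "XML"

    # A shebang names the interpreter, e.g. "#!/usr/bin/bash".
    if line.startswith("#!"):
        parts = line[2:].split()
        if not parts:
            return None
        name = parts[0].rsplit("/", 1)[-1]
        # "#!/usr/bin/env python3": the interpreter is the next word.
        if name == "env" and len(parts) > 1:
            name = parts[1].rsplit("/", 1)[-1]
        return INTERPRETERS.get(name)

    return None
```

With this sketch, a file named simply test whose first line is #!/usr/bin/bash would be labelled 'Bash', while a file with no recognizable first line yields no guess and can fall back to the extension.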

How to reproduce the bug?

  1. With Textosaurus 0.9.11 I create a new document. Through the 'Language' menu I select 'B/Bash' to set the type of document I want to create. I type the code:
    #!/usr/bin/bash
    echo "Hello World!"
  2. Now I save the document: the name I choose for my script file is simply test, without any extension (to be even clearer: without the conventional .sh suffix). I can do that! This is completely normal behavior under my Linux operating system (...).

What is the expected result?

Each time I subsequently open my test file, I expect Textosaurus to automatically recognize the format (language) from the first line of code I inserted in the contents (the #!/usr/bin/bash shebang), and thus also to activate the correct syntax highlighting for shell scripting. If I then open the 'Language' menu, I expect to see the 'B/Bash' item automatically selected.

This is in fact the normal behavior of any multi-format editor under Linux. Moreover, if I had neglected to set the format manually before typing the same contents in a new document (thus editing my Bash code in 'Plain text' mode), the normal behavior of a multi-format editor in Linux is to switch the active format automatically right after the first save ('Save file as...'), applying the syntax highlighting immediately, even before I close the document.

What actually happened?

Every time I open my test file in Textosaurus 0.9.11, the automatically recognized format I see is 'Plain text'. To have the file content interpreted correctly, I have to select the 'B/Bash' item from the 'Language' menu manually every time.

Other information (logs, see Wiki)

I also searched for this topic in the TODO list and found no evidence of planned future development in this regard... or is it just me who was not able to find it?

martinrotter commented 5 years ago

At this point in time, file type "recognition" is based solely on the file extension, so you are correct. The feature you describe was not planned, because I had not even thought about adding it.

That said, I will try to add this. Why not? I am a power Arch Linux user myself and would probably make use of it. The only slight problem is that the file utility can actually recognize only a few "source code" file types, but that is better than nothing, and it will do the job for all major file formats.

I will probably add the check via the file utility for Linux soon. It could also work on Windows if the user has file available via the $PATH environment variable.
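A rough sketch of such a check, assuming only that a file executable may or may not be reachable through $PATH. This is not the Textosaurus implementation (which is C++/Qt); the helper name `detect_mime_type` is hypothetical. It relies on the `--brief` and `--mime-type` options of file(1).

```python
import shutil
import subprocess
from typing import Optional


def detect_mime_type(path: str) -> Optional[str]:
    """Ask the `file` utility for a MIME type; return None if unavailable."""
    exe = shutil.which("file")  # looks the executable up via $PATH
    if exe is None:
        return None  # e.g. Windows without `file` installed
    try:
        result = subprocess.run(
            [exe, "--brief", "--mime-type", path],
            capture_output=True,
            text=True,
            timeout=5,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # treat any failure as "no detection available"
    return result.stdout.strip() or None
```

Returning None whenever the utility is missing or fails lets the caller fall back to extension-based recognition, so the feature degrades gracefully on systems without file.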

martinrotter commented 5 years ago

@paoloschi I committed the changes, and automagic resolution of the file type via the file utility should now work on all systems where file is available.

I have tested on Windows so far, and will probably test on Arch Linux this evening.

Please test the latest development binaries once b2d3202 gets compiled and let me know. :)

Thank you.

paoloschi commented 5 years ago

textosaurus-master-885ce93-linux64.AppImage

I tried to open the files I usually work with and all of them were correctly recognized. Well done!

If I create a new script as 'Plain text' (as in the example in my initial post) and save it, I see that the format does not update after 'Save File as...'. I have to call 'Reload From Disk' to see the format correctly reassigned in the 'Language' menu, but the syntax highlighting is still not applied. The same happens when editing and re-saving the file.

As I wrote, my experience with other editors is that the new format and syntax highlighting are applied immediately, as soon as one saves the file for the first time. Would it be appropriate to have this behavior in Textosaurus too?

martinrotter commented 5 years ago

This seems like a regression/bug, let me check...

martinrotter commented 5 years ago

If I create a new script as 'Plain text' (as in the example in my initial post) and save it, I see that the format does not update after 'Save File as...'.

Yes, the selection of "file type" in the "Save as" dialog is there for filtering the file list rather than for assigning the "format" to the file being saved. At this time the whole "file type" combobox in that dialog is rather confusing and probably useless, so I will probably just remove it from the "Save as" dialog and leave it only in the "Open..." dialog, where it actually makes more sense for filtering the file list.

I have to call 'Reload From Disk' to see the format correctly reassigned in the 'Language' menu, but the syntax highlighting is still not applied.

Yes, this one is clearly a bug and I have just reproduced it. I will try to fix it right now.

EDIT - reload from drive bug fixed in latest commit

If you open a "plain text" file, then change its language in the "Languages" menu (at which point highlighting should immediately kick in) and then save the file, everything should work, no?

martinrotter commented 5 years ago

Plus, I have now added the following: if the user opens a file via the "Open" dialog and explicitly selects some filter, for example "Bash files...", then recognition of the file format is skipped entirely and highlighting is based completely on the "file filter" selection.
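The precedence described here can be illustrated with a short sketch. It only assumes what the comment states (an explicit filter bypasses recognition); the function name, the list of detectors and their order are hypothetical and do not reflect the actual Textosaurus code.

```python
from typing import Callable, Optional, Sequence

# A detector takes a file path and returns a language label or None.
Detector = Callable[[str], Optional[str]]


def choose_language(
    explicit_filter: Optional[str],
    detectors: Sequence[Detector],
    path: str,
    fallback: str = "Plain text",
) -> str:
    """Pick the language for a file being opened."""
    # An explicitly selected filter wins and skips recognition entirely.
    if explicit_filter is not None:
        return explicit_filter
    # Otherwise try each detector in turn (order is an assumption here).
    for detect in detectors:
        guess = detect(path)
        if guess is not None:
            return guess
    return fallback
```

For example, `choose_language("Bash", detectors, "test")` returns "Bash" without calling any detector, whereas passing `None` as the filter lets the detectors decide.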

martinrotter commented 5 years ago

OK, now Textosaurus should also automatically reload the lexer/font after the file is saved with a different file type.

Test https://github.com/martinrotter/textosaurus/commit/e46e5aaed981bd28d8b01a176127376a82a3c83f

and let me know, pls.

paoloschi commented 5 years ago

textosaurus-master-e46e5aa-linux64.AppImage

Yes, the selection of "file type" in the "Save as" dialog is there for filtering the file list rather than for assigning the "format" to the file being saved.

I didn't make myself clear. I never referred to the "file type" combobox in the "Save as" dialog. What I meant was: I created the new file without setting the format through the 'Language' menu, only entering the shebang, and then the first time I saved it I gave it a name without an extension.

It is at this point that nothing happens: 'Plain text' is still the format in the 'Language' menu, and so on, while other editors in the same situation switch the selected format and apply highlighting as soon as the file has been saved for the first time. In Textosaurus, I now have open a file that the editor can identify as a Bash script, but the interface state still does not reflect this ability (unless the user intervenes manually)... I get the same behavior with e46e5aa.

If you open a "plain text" file, then change its language in the "Languages" menu (at which point highlighting should immediately kick in) and then save the file, everything should work, no?

Yes, this is the behavior I found in the program for any file without an extension, and it is what led me to ask for automatic format switching. In practice, after b2d3202, if my file is detected as one of the known formats, it is no longer even possible to open it as 'Plain text', and this is fine.

martinrotter commented 5 years ago

Can you please give me the names of some text editors that actually change highlighting automatically after you follow precisely the steps you described?

paoloschi commented 5 years ago

To be more precise, what I was hoping for has already become a reality with e46e5aa, provided that before pressing Save I select a file type and still do not define an extension in the name. I confirm that. In my experience with Geany it is not necessary to indicate a format while saving, but one can very well live with this difference.

martinrotter commented 5 years ago

Yes, that is the desired behavior I've just added.

I have one problem with Geany's behavior in this respect: what if the user actually wants to display the file with a different highlighting "lexer", or wants just the "plain-text" lexer? If I automatically re-detected the file type in all cases and reloaded it, the user would lose their settings.
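One way to picture the trade-off raised here is a document that remembers whether its language was chosen by hand. The sketch below is purely illustrative: the `Document` class, its fields and the `on_saved` hook are hypothetical names, and it describes neither Textosaurus nor Geany.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Document:
    language: str = "Plain text"
    chosen_by_user: bool = False  # True once the user picks a lexer by hand

    def set_language_manually(self, language: str) -> None:
        self.language = language
        self.chosen_by_user = True

    def on_saved(self, detect: Callable[[], Optional[str]]) -> None:
        """Re-detect the language after a save, unless the user chose one."""
        if self.chosen_by_user:
            return  # keep the user's setting, even if it is 'Plain text'
        guess = detect()
        if guess is not None:
            self.language = guess
```

Under this assumption, a document left untouched picks up 'Bash' after its first save, while one the user set to 'Lua' keeps 'Lua' no matter what the detector reports.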

paoloschi commented 5 years ago

I checked just now: with a Bash shebang present in the first line and the file type forced to 'Lua', Geany keeps Lua even after saving; but if I force plain text instead and save, it resets to the Bash lexer.

I myself tend to be against too many automatisms; I like them only where they spare us from having to repeat tedious actions over and over... The way you modified the program today is great for me!

martinrotter commented 5 years ago

Perfect. I also believe that over-automating things is not a good approach. I double-checked behavior parity with Notepad++, and it seems that Textosaurus behaves as well as N++ and even better in some cases (they do not have 'file' utility recognition).

Closing this. Thanks.