olivierkes / manuskript

An open-source tool for writers
http://www.theologeek.ch/manuskript
GNU General Public License v3.0
1.75k stars 232 forks

Inter Platform File Corruption #429

Open siliconserf opened 5 years ago

siliconserf commented 5 years ago

(Using 0.7.0 in both cases.) When I transfer files from my Linux machine (ARM-based, not compiled to native code) to my Win10 machine, the latter complains about duplicate ID numbers. Typically it duplicates Outline sections, and sometimes it also scrambles the order as items are duplicated. The single-file or multi-file option doesn't seem to affect the likelihood of the problem, but file size does: the larger the file, the more likely the problem will occur. Copying the other way doesn't cause the problem. Sometimes the complaint does not come with a corresponding Outline duplication. When that happens I'm able to use Debug to show that the duplicated IDs are not in the original file on the Linux machine.

gedakc commented 5 years ago

Did you copy any texts (scenes) and/or folders (chapters) while in the Outline or Editor pane?

If so then you may have encountered an existing bug with duplicate ids being created when copying and pasting in the Outline pane. See issue #324 (thank you for creating this one), and issue #290.

If this is not the case, then please provide the output from the console when loading the exact same file on both platforms.

siliconserf commented 5 years ago

Hi Curtis,

Attached are a pair of copies of a test project and a capture of the warnings issued when Win10 opens the file. It is a zip file; just remove ".rename". I was very careful to copy a single file from one location in the project to another (Scenes/Ideas is the target directory). The original problem showed up with a block copy of a folder and its sub-files. This time it was one file, and it blew up with a number of duplicate IDs and duplicated Outline entries. I don't know if the error occurred on the Linux side or in the Win10 opening, but re-opening the original file on the Linux machine showed no such duplication of entries.

If there is anything else I can try, please let me know.

Regards,

Charlie H.

gedakc commented 5 years ago

Hi Charlie,

Unfortunately I did not observe any files in this issue. I'm guessing that you replied to the GitHub email with attachments, and GitHub stripped these when appending to the issue.

Would you be able to manually add the attachments to this issue using a web browser?

Thanks, Curtis

siliconserf commented 5 years ago

Here are the example files. The Win10 version is the "after" project, the other is the "before", and the partial screen capture shows the now-duplicated IDs. Thinking about this problem and the related earlier post, the problem in both cases is likely in the code that inserts documents into the document tree.

File Corruption Example.zip

siliconserf commented 5 years ago

File Corruption Example.zip

siliconserf commented 5 years ago

Posted problem example files in a zip file.

siliconserf commented 5 years ago

I think I uploaded the zip file, but when I look at the issue I don't see any of the posts. What am I doing wrong?

gedakc commented 5 years ago

I think I uploaded the zip file, but when I look at the issue I don't see any of the posts. What am I doing wrong?

In my Firefox web browser I see two uploads of filename File Corruption Example.zip. Perhaps you might need to refresh your web browser with the F5 key?

gedakc commented 5 years ago

One problem I noticed is that the case (upper/lower) of the filenames is not being maintained when the project files are being copied between Windows and Linux. This is causing two differently named copies of the files.

For example notice 0-SARAH.TXT and 0-Sarah.txt filenames from the Bank Shot Original and Bank Shot Win10 directory comparison below.

[image: filename-case-duplicates]

Windows is not case sensitive by default I recall. Linux is case sensitive.

If I recall correctly the FAT file system that originated with DOS and early versions of Windows is not case sensitive so if this file system is involved in storing/copying files then that might be the cause of the case discrepancy.

To avoid these problems be sure to keep the filename case the exact same.
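A quick way to check a project directory for this kind of clash is sketched below. It is only an illustration, not part of Manuskript: the function names are made up here, and it simply groups names that become identical once case is ignored.

```python
import os
from collections import defaultdict


def group_case_collisions(names):
    """Return groups of names that differ only by upper/lower case."""
    by_folded = defaultdict(list)
    for name in names:
        # casefold() gives a case-insensitive key for comparison.
        by_folded[name.casefold()].append(name)
    return [sorted(group) for group in by_folded.values() if len(group) > 1]


def find_case_collisions(root):
    """Walk root and report, per directory, entries that clash when case is ignored."""
    report = {}
    for dirpath, dirnames, filenames in os.walk(root):
        groups = group_case_collisions(dirnames + filenames)
        if groups:
            report[dirpath] = groups
    return report


if __name__ == "__main__":
    # Hypothetical path; point it at the extracted project directory.
    for folder, groups in find_case_collisions("Bank Shot Win10").items():
        print(folder, groups)
```

Any group it prints is a pair such as 0-SARAH.TXT and 0-Sarah.txt that a case-insensitive file system would treat as the same file.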

siliconserf commented 5 years ago

Nice catch on the duplicate entries having different cases. I probably looked at the directories several times and blew past the case. I'm sure that Windows doesn't care about case for filenames. In any event, I haven't touched the file names. It must be a code issue. As I understand it, the application saves project entries as files using the entry's outline name. The name in the text portion of the file is faithfully saved, but not the version used for the filename.

The problem shown in the Bank Shot project didn't happen in the Linux code, as the file arrived on my Win machine quite faithful to how it was saved on the Linux machine. It was when the Win10 version opened the file that the duplicate entries showed up. (Note: if Revisions is active, the duplicate(s) can have older versions of information within one of the duplicate entries.) The application on Win10 knows it has changed the project's contents, as it saves the corrupt version on exit even if I haven't edited anything.

gedakc commented 5 years ago

Are you using either the FAT16 or FAT32 file system to store the manuskript project files?

If so then this would explain the upper/lower case filename discrepancies because FAT16 can only store upper case.


To investigate further I tried loading both of the project files with the current develop branch of Manuskript. I encountered problems with both projects.

Following is the console output from loading each project:

Bank Shot

$ bin/manuskript
Debug: Web rendering engine used: QWebView
Running manuskript version 0.8.0.
Found translation in settings:
Note: No translator found or loaded from system locale for locale en_CA.
Last accessed directory "/home/user/tmp" loaded.
Loading: /home/user/tmp/File Corruption Example/Bank Shot Original/Bank Shot.msk
Detected file format version: 1. Zip: False.
* Strange things in file 01-Notes
* Strange things in file 02-31_Day
* Strange things in file 03-Bell
* Strange things in file 04-Cron
* Strange things in file 05-Save_The_Cat
* Strange things in file 06-Snowflake
* Strange things in file 07-Tobias
* Strange things in file 08-Truby
* Strange things in file 09-Book
* Strange things in file 09-Scenes
* Strange things in file 11-Book
Project /home/user/tmp/File Corruption Example/Bank Shot Original/Bank Shot.msk loaded.

Bank Shot Win10

$ bin/manuskript
Debug: Web rendering engine used: QWebView
Running manuskript version 0.8.0.
Found translation in settings:
Note: No translator found or loaded from system locale for locale en_CA.
Last accessed directory "/home/user/tmp/File Corruption Example/Bank Shot Original" loaded.
Loading: /home/user/tmp/File Corruption Example/Bank Shot Win10/Bank Shot.msk
Detected file format version: 1. Zip: False.
WARNING ! There are some items with same IDs: ['171', '172', '171', '172', '172', '171', '172', '175', '2', '182', '43', '55', '61', '67', '49', '121', '1', '73', '125', '127', '182', '43', '55', '61', '67', '49', '121', '1', '73', '125', '127', '171', '173', '2', '174', '176', '175', '171', '173', '2', '174', '176', '182', '43', '55', '61', '67', '49', '121', '1', '73', '125', '127', '175', '182', '43', '55', '61', '67', '49', '121', '1', '73', '125', '127']
Project /home/user/tmp/File Corruption Example/Bank Shot Win10/Bank Shot.msk loaded.

From here I searched for duplicate 171 IDs.

$ cd "Bank Shot Original"
$ find . -exec grep -li 171 {} \; 2>/dev/null
./Bank Shot/revisions.xml
./Bank Shot/outline/09-Scenes/FOLDER.TXT
./Bank Shot/outline/01-Notes/4-Politics.md
./Bank Shot/outline/01-Notes/4-Factions.md

A quick look at the files in an editor revealed the IDs are indeed duplicated.

For example by looking at the text files in the 01-Notes folder I see the following:

$ grep -B 2 -A 2 171 "./Bank Shot/outline/01-Notes/4-Politics.md" \
>                    "./Bank Shot/outline/01-Notes/4-Factions.md"
./Bank Shot/outline/01-Notes/4-Politics.md-title:          Politics
./Bank Shot/outline/01-Notes/4-Politics.md:ID:             171
./Bank Shot/outline/01-Notes/4-Politics.md-type:           md
./Bank Shot/outline/01-Notes/4-Politics.md-compile:        2
--
./Bank Shot/outline/01-Notes/4-Factions.md-title:          Factions
./Bank Shot/outline/01-Notes/4-Factions.md:ID:             171
./Bank Shot/outline/01-Notes/4-Factions.md-type:           md
./Bank Shot/outline/01-Notes/4-Factions.md-compile:        2

Based on this it appears that one of these text files was originally copied from the other. If so then this is a known bug. See issue #324.

To fix the duplicate ID issue you might manually edit each file to specify a unique ID for each ID that is duplicated. Once that is done then also refrain from copying in the Outline and Editor panes until the bug is fixed.
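To find which files need editing, something like the following sketch could help. It assumes, based only on the grep output above, that each outline file carries a header line of the form `ID: <value>`; the function names are invented for illustration and the real file format may have details this ignores.

```python
import os
import re
from collections import defaultdict

# Assumed header format, taken from the grep output: "ID:             171"
ID_LINE = re.compile(r"^ID:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def extract_id(text):
    """Return the value of the first 'ID:' line in text, or None if absent."""
    match = ID_LINE.search(text)
    return match.group(1) if match else None


def find_duplicate_ids(outline_dir):
    """Map each ID used by more than one file under outline_dir to those files."""
    files_by_id = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(outline_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                item_id = extract_id(handle.read())
            if item_id is not None:
                files_by_id[item_id].append(path)
    return {i: sorted(p) for i, p in files_by_id.items() if len(p) > 1}


if __name__ == "__main__":
    # Hypothetical path; adjust to the project's outline directory.
    for item_id, paths in find_duplicate_ids("Bank Shot/outline").items():
        print(item_id, paths)
```

Each ID it reports would then need a new, unused value in all but one of the listed files.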

siliconserf commented 5 years ago

After upgrading to version 0.8.0 on both machines I experimented with the file that first drew my attention to the problem. It is a bit larger than Bank Shot (size seems important). What I found was that when the Linux version saved the project in zip format, the Win10 application did not encounter a problem opening the project. If the Linux app used the "save to single file" option, Win10 repeatedly choked. Just switching back and forth between the two save versions using the Win10 version, and bailing out of the program after the saves, fails to produce the problem. It would seem the interpreted version does something funny when saving the project that upsets the Windows version, but not the Linux one. So, for the present I'm saving to one file (zip) when going between platforms.

siliconserf commented 5 years ago

I haven't been able to find the file system type native to the Pinebook 14 Linux machine. The USB drive is FAT32. More interesting is that the bug you noted affects even single-file copies. That makes things a bit ugly. I can work around not being able to use documents as templates, but it isn't nice.

gedakc commented 5 years ago

Thank you for discovering and reporting back that you are able to work on both Linux and Windows if the manuskript project is kept in a single file. This makes sense because the filenames within the single file are not subject to the same restrictions as a FAT file system.
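The point about names inside a single file can be illustrated with Python's standard `zipfile` module. This is only a sketch of the general behaviour of zip archives, not Manuskript's actual save code, and the member names are made up: names are stored as strings inside the archive, so two names that differ only by case survive a round trip regardless of the file system the archive sits on.

```python
import io
import zipfile


def roundtrip_names(names):
    """Write empty members with the given names to an in-memory zip and read the names back."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "")
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical member names that would clash on a case-insensitive file system.
    print(roundtrip_names(["outline/0-Sarah.txt", "outline/0-SARAH.TXT"]))
```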

To learn what file systems are in which partitions on Linux, use the following command:

blkid

gedakc commented 5 years ago

Note that as a workaround for template copying, you can still select a range of characters, copy the range, select another scene, and then paste into that scene. This is not quite as elegant, but will avoid the duplicate ID problem inherent in issue #324.

siliconserf commented 5 years ago

Yep, I had worked out how to do the new file and copy dance. Also, I noticed you were getting issue readouts ("strange things in file") that were not available to me when using the Windows software. How do I access that?

Regards,

Charlie H.

gedakc commented 5 years ago

How do I access that ("strange things in file")?

I used the current develop branch when I loaded the projects.

siliconserf commented 5 years ago

Curtis,

I've set up Liclipse and the latest source on my Win10 laptop, and I managed to wave enough magic wands to configure it so I can try to debug Manuskript. With no prior experience with those tools this was an adventure. In trapping through the program in operation it occurred to me that outline item copy/paste operations take place in PyQt5 library widgets, quite out of sight. That suggests to me that the copy bug stems from there being no intervening operation to deal with the pesky ID variable, which needs to get a new value just prior to the paste op. At least, I cannot seem to figure out where to set a breakpoint to examine the Outline object that just got copied, so that I can track successive steps to prove that this is the flaw. Any help you can give would be appreciated.

gedakc commented 5 years ago

First let me begin by stating that I am not an expert in Object Oriented Programming.

From your description it sounds like you are looking at some Object Oriented code. With OO code it is common to override a base-class method in a subclass to implement specific functionality. In this case the subclass needs to assign a new ID to each pasted object. If you can follow this train of thought then I think that will help you triage and resolve this problem.
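As a rough illustration of that idea, here is a minimal sketch in plain Python. It deliberately avoids PyQt5, and every class and method name in it is hypothetical rather than taken from the Manuskript source; it only shows the pattern of a subclass overriding a paste method so that the pasted copy receives an unused ID.

```python
import copy
import itertools


class OutlineItem:
    """Hypothetical stand-in for an outline entry; not Manuskript's real class."""

    def __init__(self, item_id, title):
        self.item_id = item_id
        self.title = title


class BaseOutlineModel:
    """Hypothetical base model whose paste keeps the copied item's ID."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return item

    def paste(self, item):
        # Naive paste: the clone still carries the source item's ID.
        return self.add(copy.copy(item))


class UniqueIdOutlineModel(BaseOutlineModel):
    """Subclass that overrides paste to give every pasted item a fresh ID."""

    def paste(self, item):
        clone = super().paste(item)
        clone.item_id = self._next_free_id(exclude=clone)
        return clone

    def _next_free_id(self, exclude):
        # Collect IDs already in use, ignoring the clone being renumbered.
        used = {i.item_id for i in self.items if i is not exclude}
        return next(n for n in itertools.count(1) if n not in used)
```

In the real application the method to override would be whichever one the Qt model uses to insert dropped or pasted data, which this sketch does not attempt to identify.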