pbstudent closed this issue.
For better plain-text preparation of the corpus documents, I used pdftotext from Xpdf 4.03 (http://www.xpdfreader.com/download.html) instead of AntFileConverter 2.0 (https://www.laurenceanthony.net/software/antfileconverter/).
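I was able to import the broken file into LibreOffice Calc with the following options (screenshot: https://user-images.githubusercontent.com/882444/151048942-b6101ec1-bf31-49dc-a375-4afd5f71590e.png). I had to uncheck "Space" as a separator and, crucially, clear the string delimiter text box.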
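A plausible reason that clearing the string delimiter helps: a quote-aware parser treats an unmatched `"` at the start of a field as the opening of a quoted field, which can then run across line breaks and pull every following context into a single row. This is an assumption; the thread does not confirm that a stray quote is the culprit. The minimal sketch below uses Python's standard `csv` module (not LibreOffice itself) on a hypothetical two-line export to show the effect; `EXPORT` and `read_rows` are illustrative names.

```python
import csv
import io

# Hypothetical two-context tab-separated export. The first context starts
# with a stray double quote (assumed; e.g. left over from text conversion).
EXPORT = (
    'doc1\t"resources\tkeyword\tleft\n'
    'doc2\tmore text\tkeyword\tright\n'
)


def read_rows(text, use_string_delimiter):
    """Parse tab-separated text, optionally treating '"' as a string delimiter."""
    # QUOTE_MINIMAL (the reader default) honours the quote character;
    # QUOTE_NONE treats it as an ordinary character, like an empty delimiter box.
    quoting = csv.QUOTE_MINIMAL if use_string_delimiter else csv.QUOTE_NONE
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


with_quotes = read_rows(EXPORT, use_string_delimiter=True)
without_quotes = read_rows(EXPORT, use_string_delimiter=False)
print(len(with_quotes))     # 1: the open quote swallows the rest of the file
print(len(without_quotes))  # 2: one row per context
```

With the quote character disabled, the stray `"` simply stays part of the first context's text instead of changing how the rest of the file is split.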
Thank you. I will re-test importing the export results into LibreOffice Calc using your configuration.
On Jan 25, 2022, at 20:51, Andrew MacDonald @.***> wrote:
I was able to import the broken file into LibreOffice Calc with the following options: https://user-images.githubusercontent.com/882444/151048942-b6101ec1-bf31-49dc-a375-4afd5f71590e.png I had to uncheck "Space" and, crucially, clear the string delimiter textbox.
I can confirm the same results with a plain-text export from Contexts in Voyant Tools Server version 2.5.3.
Please note that the README.md does not include the version number of Voyant Server.
Occasionally, exporting from Contexts produces a plain-text file that collapses into a single row when it is copied and pasted, or imported as plain text, into a spreadsheet. The source text file for the corpus was created with AntFileConverter 2.0. The cause appears to be a hidden character (visible at the end of the last row when importing into LibreOffice Calc) that does not produce a new line.
Attached: Contexts-resources_original broken.txt, Contexts-resources_fixed.txt. I had to fix the file manually in a spreadsheet, re-save it as CSV, and then export it as tab-separated text to get the same plain-text output as other Contexts exports.
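To look for the hidden character described above without a round trip through a spreadsheet, one option is to scan the export for control characters and for Unicode code points that some programs treat as line breaks while others do not. The `SUSPECT` list is an assumption, not the character the thread identified, and `find_hidden_chars` and `normalize_newlines` are hypothetical helpers; mapping every suspect character to a plain LF is only one possible cleanup policy.

```python
import unicodedata

# Characters that some tools treat as line breaks and others do not.
# Assumed list: the thread does not name the actual hidden character.
SUSPECT = {
    "\r": "CARRIAGE RETURN",
    "\x0b": "LINE TABULATION",
    "\x0c": "FORM FEED",
    "\x85": "NEXT LINE",
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}


def find_hidden_chars(text):
    """Return (line_no, column, name) for each suspect or invisible character.

    Lines are counted on LF only, so a stray CR shows up as a hit
    instead of silently starting a new line.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT:
                hits.append((line_no, col, SUSPECT[ch]))
            # Other control (Cc) and format (Cf) characters, except the tab
            # that legitimately separates the columns.
            elif ch != "\t" and unicodedata.category(ch) in ("Cc", "Cf"):
                hits.append((line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


def normalize_newlines(text):
    """Turn every suspect break into a plain LF so each context gets its own row."""
    text = text.replace("\r\n", "\n")  # handle CRLF first so it is not doubled
    for ch in SUSPECT:
        text = text.replace(ch, "\n")
    return text
```

When reading the export, open it with `open(path, encoding="utf-8", newline="")` so Python does not translate `\r` on read and the scan can actually see it; run `find_hidden_chars` first and only normalize once the hits look like genuine line breaks.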