sillsdev / SpeechAnalyzer

SIL Speech Analyzer is a Windows program for acoustic analysis of speech sounds.
https://software.sil.org/speech-analyzer/

feat: Add Rows-CSV format to SFM export #79

Closed darcywong00 closed 2 years ago

darcywong00 commented 2 years ago

Fixes #74

To support users exporting from SA to Cog, this adds an option, "Rows Comma-Separated Values", to the SFM export dialog. This option uses the , character instead of spaces to separate the entries.

Limitations

darcywong00 commented 2 years ago

Some initial comments from @sdysart:

I took a look at the csv feature you added to SA.
The export as CSV option will really help. I did notice that in the export, the phonetic row was off by one towards the end; that was because they had inserted a comma in one of the phonetic fields.

I guess they had not decided how to proceed. I suppose that we can assume that the users will clean up the data prior to export, but it should be noted in the instructions somewhere that commas can't be in the exported fields if exporting as CSV. It would be easy to fix, but it should be noted.

The export still saves the file with an SFM extension, not CSV. I had to rename it in File Explorer, which is easy enough to do. Also, Excel doesn't seem to like the way SA exports IPA. It goes weird. When I went through the process of editing the SFM file in Notepad++, Tim E. pointed out that I had to mark the encoding as UTF-8 BOM in Notepad++ in order for Excel to pick it up. LibreOffice Calc asks me to pick the encoding when opening the CSV file.
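The encoding issue described above can be illustrated with a minimal Python sketch. It is not SA's export code; the function name `encode_for_excel` and the sample row are hypothetical. It assumes the goal is a UTF-8 file that starts with a byte-order mark, which is what the "UTF-8 BOM" setting in Notepad++ produces.

```python
import codecs


def encode_for_excel(text: str) -> bytes:
    """Encode text as UTF-8 with a leading byte-order mark (EF BB BF).

    Python's "utf-8-sig" codec prepends the BOM when encoding.
    """
    return text.encode("utf-8-sig")


# Hypothetical sample row containing IPA characters.
sample = "Phonetic,ˈpʰɪt\r\n"
data = encode_for_excel(sample)

# The BOM is the first three bytes of the output.
has_bom = data.startswith(codecs.BOM_UTF8)

# Decoding with the same codec strips the BOM again.
round_trip = data.decode("utf-8-sig")
```

Writing `data` to a file in binary mode would give a file whose first three bytes are the BOM, so no manual re-encoding step would be needed before opening it.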

megahirt commented 2 years ago

We should be able to escape commas in the standard CSV way. I'm thinking that just means that each field has quotes around it. Best way to test is to take the offending CSV that @sdysart mentions and put quotes around the field in the line where the data has a comma. See if it then imports correctly.


darcywong00 commented 2 years ago

Best way to test is to take the offending CSV that @sdysart mentions and put quotes around the field in the line where the data has a comma. See if it then imports correctly.

Yeah, we can try that next time. Though we'd also have to escape quotes within the string. For Excel, it might be ""?
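As a rough sketch of the quoting approach discussed here, the following Python wraps every field in double quotes and doubles any quote character inside a field, which is the convention described in RFC 4180. The helper names `quote_field` and `make_row` and the sample values are hypothetical, and this is not the code used in SA.

```python
import csv
import io


def quote_field(value: str) -> str:
    """Wrap a field in double quotes, doubling any quotes inside it."""
    return '"' + value.replace('"', '""') + '"'


def make_row(fields) -> str:
    """Join already-quoted fields with commas to form one CSV row."""
    return ",".join(quote_field(f) for f in fields)


# Hypothetical row: one field contains a comma, another contains quotes.
row = make_row(["Phonetic", "pa,ta", 'say "ah"'])

# Parse the row back with the standard csv module to confirm that the
# embedded comma and quotes do not shift the columns.
parsed = next(csv.reader(io.StringIO(row)))
```

Here `parsed` comes back as three fields, with the comma and the quotes preserved inside their own fields rather than splitting the row.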

darcywong00 commented 2 years ago

We should be able to escape commas in the standard CSV way. I'm thinking that just means that each field has quotes around it. Best way to test is to take the offending CSV that @sdysart mentions and put quotes around the field in the line where the data has a comma. See if it then imports correctly.

@sdysart tested the latest build. We found that if he surrounds a field in double quotes ("), a comma inside that field no longer splits the line.