hipstas / AudiAnnotate

Workflows for generating AV editions and exhibits using IIIF manifests by HiPSTAS and Brumfield Labs.
https://hipstas.github.io/AudiAnnotate/
Apache License 2.0

Error message when uploading a TSV #202

Open kayleighv opened 2 years ago

kayleighv commented 2 years ago

Describe the bug

Tried uploading annotations as a TSV file and received an error message that I don't know how to resolve. I'll email the TSV file, since GitHub doesn't let me upload that file type here.

To Reproduce

Steps to reproduce the behavior:

  1. Edit a project (mine is here)
  2. Choose file ("Practice_GoogleSheets.tsv") under Add annotation file
  3. Click "Add"
  4. Configure columns (drop downs should read: A, B, C, Yes, D)
  5. Click "Process"
  6. Get error message (screenshot below)

Expected behavior

The TSV file should process and add annotations to the project.

Screenshots

[Image: Screen Shot 2022-01-27 at 11 13 48 AM]

benwbrum commented 2 years ago

Running this locally with the file, I get a slightly different message:

ActionView::Template::Error (New line must be <"\r"> not <"\n"> in line 3.):
    25:       <table class="table table-bordered table-dark">
    26:         <thead>
    27:           <tr>
    28:             <% @annotation_file.sample_snippet.first.each_with_index do |cell, i|%>
    29:             <th>
    30:               <%= (i+65).chr %>
    31:             </th>

benwbrum commented 2 years ago

The file appears to mix classic Mac OS \r line terminators with Windows \r\n terminators:

/home/benwbrum/Downloads/Practice_GoogleSheets.tsv: ASCII text, with CRLF, CR line terminators

Looking further, it seems like the line separators are \r\r\n, which is an encoding I've never seen before: two carriage returns followed by a newline!

benwbrum@sparckjones:~/dev/clients/clement/aa/audi-annotate/AudiAnnotateWeb$ more ~/Downloads/Practice_GoogleSheets.tsv 
time in hr:min:sec (displays in total sec)  time in hr:min:sec (displays in total sec)  annotation  layer
0   13  Well, I just want to start when I was, when my grandfather come on a ship, and he land in Dulac, in Dulac, in Terrebonne Parish.    Transcript
14  19  And he didn't go back on the ship, he stayed there and he married my grandmother.   Transcript
20  30  Which, then, my, my, that was my grandfather, then my grandmother was a Spanish and my grandfather was Italian. Transcript
0       This personal narrative was collected by Joseph Mele sometime between 1975 and 1980.    Context
benwbrum@sparckjones:~/dev/clients/clement/aa/audi-annotate/AudiAnnotateWeb$ od -c ~/Downloads/Practice_GoogleSheets.tsv 
0000000   t   i   m   e       i   n       h   r   :   m   i   n   :   s
0000020   e   c       (   d   i   s   p   l   a   y   s       i   n    
0000040   t   o   t   a   l       s   e   c   )  \t   t   i   m   e    
0000060   i   n       h   r   :   m   i   n   :   s   e   c       (   d
0000100   i   s   p   l   a   y   s       i   n       t   o   t   a   l
0000120       s   e   c   )  \t   a   n   n   o   t   a   t   i   o   n
0000140  \t   l   a   y   e   r  \r  \r  \n   0  \t   1   3  \t   W   e
0000160   l   l   ,       I       j   u   s   t       w   a   n   t    
0000200   t   o       s   t   a   r   t       w   h   e   n       I    
0000220   w   a   s   ,       w   h   e   n       m   y       g   r   a
0000240   n   d   f   a   t   h   e   r       c   o   m   e       o   n
0000260       a       s   h   i   p   ,       a   n   d       h   e    
0000300   l   a   n   d       i   n       D   u   l   a   c   ,       i
0000320   n       D   u   l   a   c   ,       i   n       T   e   r   r
0000340   e   b   o   n   n   e       P   a   r   i   s   h   .  \t   T
0000360   r   a   n   s   c   r   i   p   t  \r  \r  \n   1   4  \t   1
0000400   9  \t   A   n   d       h   e       d   i   d   n   '   t    
0000420   g   o       b   a   c   k       o   n       t   h   e       s
0000440   h   i   p   ,       h   e       s   t   a   y   e   d       t
0000460   h   e   r   e       a   n   d       h   e       m   a   r   r
0000500   i   e   d       m   y       g   r   a   n   d   m   o   t   h
0000520   e   r   .  \t   T   r   a   n   s   c   r   i   p   t  \r  \r
0000540  \n   2   0  \t   3   0  \t   W   h   i   c   h   ,       t   h
0000560   e   n   ,       m   y   ,       m   y   ,       t   h   a   t
0000600       w   a   s       m   y       g   r   a   n   d   f   a   t
0000620   h   e   r   ,       t   h   e   n       m   y       g   r   a
0000640   n   d   m   o   t   h   e   r       w   a   s       a       S
0000660   p   a   n   i   s   h       a   n   d       m   y       g   r
0000700   a   n   d   f   a   t   h   e   r       w   a   s       I   t
0000720   a   l   i   a   n   .  \t   T   r   a   n   s   c   r   i   p
0000740   t  \r  \r  \n   0  \t  \t   T   h   i   s       p   e   r   s
0000760   o   n   a   l       n   a   r   r   a   t   i   v   e       w
0001000   a   s       c   o   l   l   e   c   t   e   d       b   y    
0001020   J   o   s   e   p   h       M   e   l   e       s   o   m   e
0001040   t   i   m   e       b   e   t   w   e   e   n       1   9   7
0001060   5       a   n   d       1   9   8   0   .  \t   C   o   n   t
0001100   e   x   t
0001103
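
The same check can be scripted rather than read off the od dump. This is a minimal Ruby sketch, assuming the file contents are already in a string; `line_terminator_counts` is a hypothetical helper name, not part of AudiAnnotate. It also suggests why `file` reports both "CRLF" and "CR": a \r\r\n run is presumably read as a lone CR followed by a CRLF.

```ruby
# Hypothetical helper: count each kind of line terminator in a string.
# The alternation is ordered longest-first so that "\r\r\n" is counted as
# one sequence instead of a "\r" followed by a "\r\n".
def line_terminator_counts(text)
  text.scan(/\r\r\n|\r\n|\r|\n/).tally
end

# Two rows shaped like the uploaded TSV, each ending in "\r\r\n".
sample = "start\tend\r\r\n0\t13\r\r\n14\t19"
p line_terminator_counts(sample)
```

For the sample string this reports two \r\r\n sequences and nothing else. To run it against a real file, read the bytes with `File.binread(path)` so that no newline translation happens on the way in.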

benwbrum commented 2 years ago

@kayleighv , can you tell us the steps you used to download/edit/upload the file? I'd like to know how we ended up with \r\r\n sequences.

kayleighv commented 2 years ago

@benwbrum sure! On this page, there's a link to a template, which I opened and saved a copy of in my own Google Drive. Once I finished editing my copy, I downloaded the tab I was working in as a TSV file. I didn't make any changes to the file before trying to upload it via the project editing interface. I'm using a Mac, btw!

benwbrum commented 2 years ago

Hm. I've tried the same thing myself (though on a Linux machine, not a Mac) and get no double carriage returns:

benwbrum@sparckjones:~$ od -c  ~/Downloads/Copy\ of\ AudiAnnotate\ Annotations\ Template\ -\ Annotations\ Layer\ 1.tsv 
0000000   t   i   m   e       i   n       h   r   :   m   i   n   :   s
0000020   e   c       (   d   i   s   p   l   a   y   s       i   n    
0000040   t   o   t   a   l       s   e   c   )  \t   t   i   m   e    
0000060   i   n       h   r   :   m   i   n   :   s   e   c       (   d
0000100   i   s   p   l   a   y   s       i   n       t   o   t   a   l
0000120       s   e   c   )  \t   a   n   n   o   t   a   t   i   o   n
0000140  \r  \n   1   8   3   0   0  \t   1   8   6   6   0  \t   T   h
0000160   i   s       i   s       a       t   e   s   t  \r  \n   2   1
0000200   9   6   0  \t   2   5   6   2   0  \t   A   n   o   t   h   e
0000220   r       t   e   s   t       i   s       h   e   r   e
0000236

Maybe we'll need to search-and-replace the double carriage returns?
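
A minimal sketch of that search-and-replace in Ruby, assuming the uploaded file's contents are available as a string before parsing. `collapse_double_cr` is a hypothetical name for illustration, not an existing method in the AudiAnnotate code base.

```ruby
# Hypothetical helper: turn each "\r\r\n" into a plain Windows "\r\n",
# leaving every other byte of the upload untouched.
def collapse_double_cr(text)
  text.gsub("\r\r\n", "\r\n")
end

broken = "annotation\tlayer\r\r\n0\t13\r\r\n"
fixed  = collapse_double_cr(broken)
```

Replacing only the exact \r\r\n sequence keeps the change narrow: files that already use \r\n or \n pass through unchanged.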

@kayleighv, would you mind sharing the Google Sheet you created with me, so that I can isolate the download process from the contents?

kayleighv commented 2 years ago

No problem—it's the second tab of this Google sheet.

benwbrum commented 2 years ago

It looks like downloading to a Mac is corrupting the file. If I download it locally, I don't see the \r\r\n character sequences causing the problem:

od -c ~/Downloads/AudiAnnotate\ Annotations\ Template\ -\ Annotations\ Layer\ 1.tsv 
0000000   t   i   m   e       i   n       h   r   :   m   i   n   :   s
0000020   e   c       (   d   i   s   p   l   a   y   s       i   n    
0000040   t   o   t   a   l       s   e   c   )  \t   t   i   m   e    
0000060   i   n       h   r   :   m   i   n   :   s   e   c       (   d
0000100   i   s   p   l   a   y   s       i   n       t   o   t   a   l
0000120       s   e   c   )  \t   a   n   n   o   t   a   t   i   o   n
0000140  \t   l   a   y   e   r  \r  \n   0  \t   1   3  \t   W   e   l
0000160   l   ,       I       j   u   s   t       w   a   n   t       t
0000200   o       s   t   a   r   t       w   h   e   n       I       w
0000220   a   s   ,       w   h   e   n       m   y       g   r   a   n
0000240   d   f   a   t   h   e   r       c   o   m   e       o   n    
0000260   a       s   h   i   p   ,       a   n   d       h   e       l
0000300   a   n   d       i   n       D   u   l   a   c   ,       i   n
0000320       D   u   l   a   c   ,       i   n       T   e   r   r   e
0000340   b   o   n   n   e       P   a   r   i   s   h   .  \t   T   r
0000360   a   n   s   c   r   i   p   t  \r  \n   1   4  \t   1   9  \t
0000400   A   n   d       h   e       d   i   d   n   '   t       g   o
0000420       b   a   c   k       o   n       t   h   e       s   h   i
0000440   p   ,       h   e       s   t   a   y   e   d       t   h   e
0000460   r   e       a   n   d       h   e       m   a   r   r   i   e
0000500   d       m   y       g   r   a   n   d   m   o   t   h   e   r
0000520   .  \t   T   r   a   n   s   c   r   i   p   t  \r  \n   2   0

I wonder if we could add code to scrub the doubles, perhaps in an error-handling rescue block?
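
One way the rescue idea might look, as a sketch only. It assumes the underlying exception is `CSV::MalformedCSVError` from Ruby's standard csv library (the thread only shows it wrapped in an ActionView::Template::Error) and that the file is parsed with a tab column separator. `scrub_line_endings` and `parse_tsv` are hypothetical names, not existing AudiAnnotate methods.

```ruby
require "csv"

# Hypothetical helper: normalize "\r\r\n", "\r\n", and bare "\r" to "\n".
# Caveat: this also rewrites line breaks inside quoted fields.
def scrub_line_endings(text)
  text.gsub(/\r+\n|\r/, "\n")
end

# Parse tab-separated text as-is; only if the parser rejects it,
# scrub the line endings and try once more.
def parse_tsv(text)
  CSV.parse(text, col_sep: "\t")
rescue CSV::MalformedCSVError
  CSV.parse(scrub_line_endings(text), col_sep: "\t")
end

rows = parse_tsv("start\tend\tannotation\r\r\n0\t13\tWell, I just want to start\r\r\n")
```

Scrubbing only after a failure leaves well-formed uploads on the normal path. The alternative is to scrub every upload up front, which is simpler but changes files that would have parsed anyway.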