wmorgan / heliotrope

A personal, threaded, search-centric email server.

invalid byte sequence in UTF-8 #7

Closed filterfish closed 13 years ago

filterfish commented 13 years ago

heliotrope-add chokes on invalid UTF-8. The following patch suppresses the error so the import can continue, but I don't know if it's the right thing to do.

diff --git a/lib/heliotrope/maildir-walker.rb b/lib/heliotrope/maildir-walker.rb
index 319b8cd..b03674d 100644
--- a/lib/heliotrope/maildir-walker.rb
+++ b/lib/heliotrope/maildir-walker.rb
@@ -47,11 +47,22 @@ private
   def get_date_in_file fn
     File.open(fn) do |f|
       while(l = f.gets)
+        error_count = 0
+        begin
         if l =~ /^Date:\s+(.+\S)\s*$/
           date = $1
           pdate = Time.parse($1)
           return pdate
         end
+        rescue => e
+          unless error_count > 1
+            l.encode!('utf-8', 'utf-8', :invalid => :replace)
+            error_count += 1
+            retry
+          else
+            puts "; cannot fix: #{e}: #{l}"
+          end
+        end
       end
     end
     puts "; warning: no date in #{fn}"
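
For comparison, here is a minimal sketch of a different way to handle the same problem: clean each line before it reaches the regexp, instead of rescuing and retrying. It is not the fix that went into heliotrope. `sanitize_line` and `date_from_lines` are hypothetical names invented for this example, and it relies on `String#scrub`, which only exists in Ruby 2.1 and later (newer than this thread).

```ruby
require 'time'

# Hypothetical helper, not part of heliotrope: returns a string that is safe
# to match against a regexp. Invalid byte sequences are replaced with "?".
def sanitize_line(line)
  return line if line.valid_encoding?
  line.scrub('?') # String#scrub requires Ruby >= 2.1
end

# Hypothetical helper: returns the first parseable Date: header as a Time,
# or nil if none of the lines carries one.
def date_from_lines(lines)
  lines.each do |raw|
    line = sanitize_line(raw)
    next unless line =~ /^Date:\s+(.+\S)\s*$/
    begin
      return Time.parse($1)
    rescue ArgumentError
      next # unparseable date value, keep scanning
    end
  end
  nil
end
```

Something like `File.open(fn) { |f| date_from_lines(f.each_line) }` could stand in for the loop in the patch. Because each line is checked with `valid_encoding?` first, well-formed mail is passed through unchanged and only damaged lines are rewritten.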

wmorgan commented 13 years ago

This should be fixed. Let me know if not. This encoding stuff is a little tricky...