javanthropus / archive-zip

A simple Ruby-esque interface to creating, extracting, and updating ZIP archives in 100% Ruby.
https://rubygems.org/gems/archive-zip
MIT License
83 stars 10 forks source link

Archiving and password protecting a file with unicode characters in the name causes an issue opening the file #12

Closed desoleary closed 10 years ago

desoleary commented 10 years ago

I have run into an issue where I am archiving a zip file that contains a simply text file that happens to have unicode characters in the name.

Gem successfully creates the archived file but when I attempt to open the file it basically says that the password I entered is incorrect.

Steps to reproduce:

  1. touch ☂file☄.txt

  2. zip touch ☂file☄.txt # I found that I was not able to zip the text file so I just did it directly from the MAC UI

  3. Use the following code snippet

Archive::Zip.archive(archive_file, original_file.path, name: original_filename, password: password, encryption_codec: Archive::Zip::Codec::TraditionalEncryption )

Screenshot example:

enter password keeps showing even though i am entering the correct password

javanthropus commented 10 years ago

I haven't had a chance to look into this yet, but are you certain that the filename is involved? Can you confirm that the same password used with a more basic file name does not exhibit this issue?

Also, is Archive::Zip able to extract the encrypted file? And if you create the archive using your Mac's built-in software, is this library able to extract it successfully?

More than likely there is an encoding issue either in password handling or, less likely, the filename handling. Something you might try is forcing the encoding of the password string to ASCII-8BIT (or binary), prior to passing it into the archive method.

desoleary commented 10 years ago

I haven't had a chance to look into this yet, but are you certain that the filename is involved? [Des] As a sanity check I simply made a copy of the original zip file and archived in exactly the same way and the password worked as expected. i.e. >> cp ☂file☄.txt.zip file.txt.zip

Can you confirm that the same password used with a more basic file name does not exhibit this issue? [Des] Confirmed as above

Also, is Archive::Zip able to extract the encrypted file? [Des] I got the following error when I tried to extract the file with Unicode characters (it worked with the exact file without Unicode characters in the name)

irb(main):006:0> fetched_file.testing_upload Archive::Zip::EntryError: /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:168:in write': "\xE2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError) from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:168:inprint' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:168:in block (2 levels) in eval_input' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:273:insignal_status' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:156:in block in eval_input' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:243:inblock (2 levels) in each_top_level_statement' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in loop' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:229:inblock in each_top_level_statement' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in catch' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:228:ineach_top_level_statement' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:155:in eval_input' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:70:inblock in start' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:69:in catch' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:69:instart' from /opt/fire/amp/portal/vendor/bundle/ruby/1.9.1/gems/railties-3.2.19/lib/rails/commands/console.rb:47:in start' from /opt/fire/amp/portal/vendor/bundle/ruby/1.9.1/gems/railties-3.2.19/lib/rails/commands/console.rb:8:instart' from /opt/fire/amp/portal/vendor/bundle/ruby/1.9.1/gems/railties-3.2.19/lib/rails/commands.rb:41:in <top (required)>' from script/rails:6:inrequire' from script/rails:6:in `

'

And if you create the archive using your Mac's built-in software, is this library able to extract it successfully? [Des] >> zip -e archivename.zip ☂file☄.txt.zip # successfully able to extract the file archivename.zip ☂file☄.txt.zip

More than likely there is an encoding issue either in password handling or, less likely, the filename handling.

Something you might try is forcing the encoding of the password string to ASCII-8BIT (or binary), prior to passing it into the archive method. [Des] Tried encoding the password to binary using password: password.force_encoding('ASCII-8BIT'),

However by encoding the original file path using binary encoding I was able to unzip the file without issue.

desoleary commented 10 years ago

Sample code that worked:

-- encoding original file path in binary to ensure we correctly handle unicode characters Archive::Zip.archive(archive_file, original_file.path.force_encoding('ASCII-8BIT'), name: original_filename, password: password, encryption_codec: Archive::Zip::Codec::TraditionalEncryption )

javanthropus commented 10 years ago

I don't believe your workaround should be necessary, so this is still an issue as far as I'm concerned.

There also seems to be a bit of miscommunication regarding the files and testing. I'm confused about what you did for testing in any case. In the end, it appears that the name of the file in the archive is the problem, not the password or the name of the archive itself.

Since this library was written before Ruby 1.9's string encoding changes became mainstream, it's not surprising that we would hit an issue like this. I'll see if I can reproduce your problem and fix this properly within the library. I don't have a Mac, so reproduction may be a problem, but we'll see.

desoleary commented 10 years ago

Sounds great, I agree with you completely that this should not be causing an issue which is why I forked the project with the intension of adding a pull request but if you can re-create which I am sure will work the same in windows or mac as the original error actually happened on a windows machine then you'll be able to figure out the best place for adding the change.

Thanks for your quick response and if you need any help just let me know and I'd be more than happy to lend a hand.

desoleary commented 10 years ago

I am currently working on creating a spec for the bug in question so will send you a snippet once I am done.

This was a quick hack I added to replicate the issue.

################ START OF HACKY CODE ####################

file_name = "☂file☄.txt.zip"
archive_dir_name = "/tmp"
downloaded_original_file = File.open("/tmp/data/#{file_name}")
archive_file = zip_with_password(downloaded_original_file, File.join(Dir.open(archive_dir_name), file_name))

extract_file = Archive::Zip.extract("/tmp/#{file_name}", '/tmp/archive-dir', :password => 'some.password')
################ END OF HACKY CODE ####################
def zip_with_password(original_file_path, archive_path)
    archive_file = nil

    password = Site::Application.config.archive_password

    begin
      original_file = File.open(original_file_path)

      FileUtils.mkdir_p File.dirname(archive_path)
      archive_file = File.open(archive_path, "wb")

      Archive::Zip.archive(archive_file, encode_file_path_to_handle_unicode_characters(original_file),
                           name: original_filename,
                           password: password,
                           encryption_codec: Archive::Zip::Codec::TraditionalEncryption
      )
      archive_file.close
    rescue Exception => ex
      remove_file(archive_file)
      logger.error "Failed to create the zip archive #{ex.message} #{ex.backtrace.join}"
      raise ex
    end

    archive_file
  end

# encoding original file path in ascii to ensure we correctly handle unicode characters
  def encode_file_path_to_handle_unicode_characters(file)
    file.path.force_encoding('ASCII-8BIT')
  end
javanthropus commented 10 years ago

What's actually happening from what I can see is that using a multibyte name for a zip entry is causing the resulting archive to be corrupted somewhere. Infozip notes that the central directory of the archive is not terminated as expected.

This is a simple test case to show how the archive creation process fails for multibyte zip entries while it succeeds for ascii zip entries:

# encoding: UTF-8

require 'archive/zip'

def create_archive(archive_path, file_path)
  File.open(file_path, 'w') { |f| f.write 'data' }
  File.delete(archive_path) if File.exist?(archive_path)

  Archive::Zip.archive(archive_path, file_path)
end

create_archive('/tmp/multibyte.zip', '/tmp/☂file☄.txt')
create_archive('/tmp/ascii.zip', '/tmp/file.txt')

The /tmp/multibyte.zip archive will be corrupt while /tmp/ascii.zip will not be.

I'm continuing to dig into the issue.

javanthropus commented 10 years ago

I found the problematic lines and have the beginnings of a fix. I'm also going to need to take some time to audit the rest of the code for similar issues. I should hopefully have a new release with this fix in a couple of days.

desoleary commented 10 years ago

really happy to hear that its definitely a weird and unexpected bug also thanks for showing me the sample test code as I was finding it hard to replicate the bug under my fork due to my lack of understanding of some parts of the api

desoleary commented 10 years ago

any update?

javanthropus commented 10 years ago

Yes. Check the tip of the master branch (5f27d405) to see if that works for you.

There was some housekeeping I needed to perform before I could actually make the changes I wanted safely. These changes also break 1.8.6 compatibility, so I may refine them a bit more before I make a final release.

javanthropus commented 10 years ago

@desoleary, did the change work for you?

desoleary commented 10 years ago

I can confirm that it actually fixed the issue, thanks!

desoleary commented 10 years ago

Please let me know when this gets added to the official release so I can update my Gemfile to the new version

javanthropus commented 10 years ago

This fix was released in v0.7.0.

desoleary commented 10 years ago

awesome thank you again!