Closed desoleary closed 10 years ago
I haven't had a chance to look into this yet, but are you certain that the filename is involved? Can you confirm that the same password used with a more basic file name does not exhibit this issue?
Also, is Archive::Zip able to extract the encrypted file? And if you create the archive using your Mac's built-in software, is this library able to extract it successfully?
More than likely there is an encoding issue either in password handling or, less likely, the filename handling. Something you might try is forcing the encoding of the password string to ASCII-8BIT (or binary), prior to passing it into the archive method.
I haven't had a chance to look into this yet, but are you certain that the filename is involved? [Des] As a sanity check I simply made a copy of the original zip file and archived in exactly the same way and the password worked as expected. i.e. >> cp ☂file☄.txt.zip file.txt.zip
Can you confirm that the same password used with a more basic file name does not exhibit this issue? [Des] Confirmed as above
Also, is Archive::Zip able to extract the encrypted file? [Des] I got the following error when I tried to extract the file with Unicode characters (it worked with the exact file without Unicode characters in the name)
irb(main):006:0> fetched_file.testing_upload
Archive::Zip::EntryError: /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:168:in write': "\xE2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError) from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:168:in
print'
from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:168:in block (2 levels) in eval_input' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:273:in
signal_status'
from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:156:in block in eval_input' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:243:in
block (2 levels) in each_top_level_statement'
from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in loop' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in
block in each_top_level_statement'
from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in catch' from /opt/fire/embedded/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in
each_top_level_statement'
from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:155:in eval_input' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:70:in
block in start'
from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:69:in catch' from /opt/fire/embedded/lib/ruby/1.9.1/irb.rb:69:in
start'
from /opt/fire/amp/portal/vendor/bundle/ruby/1.9.1/gems/railties-3.2.19/lib/rails/commands/console.rb:47:in start' from /opt/fire/amp/portal/vendor/bundle/ruby/1.9.1/gems/railties-3.2.19/lib/rails/commands/console.rb:8:in
start'
from /opt/fire/amp/portal/vendor/bundle/ruby/1.9.1/gems/railties-3.2.19/lib/rails/commands.rb:41:in <top (required)>' from script/rails:6:in
require'
from script/rails:6:in `
And if you create the archive using your Mac's built-in software, is this library able to extract it successfully? [Des] >> zip -e archivename.zip ☂file☄.txt.zip # successfully able to extract the file archivename.zip ☂file☄.txt.zip
More than likely there is an encoding issue either in password handling or, less likely, the filename handling.
Something you might try is forcing the encoding of the password string to ASCII-8BIT (or binary), prior to passing it into the archive method. [Des] Tried encoding the password to binary using password: password.force_encoding('ASCII-8BIT'),
However by encoding the original file path using binary encoding I was able to unzip the file without issue.
Sample code that worked:
-- encoding original file path in binary to ensure we correctly handle unicode characters Archive::Zip.archive(archive_file, original_file.path.force_encoding('ASCII-8BIT'), name: original_filename, password: password, encryption_codec: Archive::Zip::Codec::TraditionalEncryption )
I don't believe your workaround should be necessary, so this is still an issue as far as I'm concerned.
There also seems to be a bit of miscommunication regarding the files and testing. I'm confused about what you did for testing in any case. In the end, it appears that the name of the file in the archive is the problem, not the password or the name of the archive itself.
Since this library was written before Ruby 1.9's string encoding changes became mainstream, it's not surprising that we would hit an issue like this. I'll see if I can reproduce your problem and fix this properly within the library. I don't have a Mac, so reproduction may be a problem, but we'll see.
Sounds great, I agree with you completely that this should not be causing an issue which is why I forked the project with the intension of adding a pull request but if you can re-create which I am sure will work the same in windows or mac as the original error actually happened on a windows machine then you'll be able to figure out the best place for adding the change.
Thanks for your quick response and if you need any help just let me know and I'd be more than happy to lend a hand.
I am currently working on creating a spec for the bug in question so will send you a snippet once I am done.
This was a quick hack I added to replicate the issue.
################ START OF HACKY CODE ####################
file_name = "☂file☄.txt.zip"
archive_dir_name = "/tmp"
downloaded_original_file = File.open("/tmp/data/#{file_name}")
archive_file = zip_with_password(downloaded_original_file, File.join(Dir.open(archive_dir_name), file_name))
extract_file = Archive::Zip.extract("/tmp/#{file_name}", '/tmp/archive-dir', :password => 'some.password')
################ END OF HACKY CODE ####################
def zip_with_password(original_file_path, archive_path)
archive_file = nil
password = Site::Application.config.archive_password
begin
original_file = File.open(original_file_path)
FileUtils.mkdir_p File.dirname(archive_path)
archive_file = File.open(archive_path, "wb")
Archive::Zip.archive(archive_file, encode_file_path_to_handle_unicode_characters(original_file),
name: original_filename,
password: password,
encryption_codec: Archive::Zip::Codec::TraditionalEncryption
)
archive_file.close
rescue Exception => ex
remove_file(archive_file)
logger.error "Failed to create the zip archive #{ex.message} #{ex.backtrace.join}"
raise ex
end
archive_file
end
# encoding original file path in ascii to ensure we correctly handle unicode characters
def encode_file_path_to_handle_unicode_characters(file)
file.path.force_encoding('ASCII-8BIT')
end
What's actually happening from what I can see is that using a multibyte name for a zip entry is causing the resulting archive to be corrupted somewhere. Infozip notes that the central directory of the archive is not terminated as expected.
This is a simple test case to show how the archive creation process fails for multibyte zip entries while it succeeds for ascii zip entries:
# encoding: UTF-8
require 'archive/zip'
def create_archive(archive_path, file_path)
File.open(file_path, 'w') { |f| f.write 'data' }
File.delete(archive_path) if File.exist?(archive_path)
Archive::Zip.archive(archive_path, file_path)
end
create_archive('/tmp/multibyte.zip', '/tmp/☂file☄.txt')
create_archive('/tmp/ascii.zip', '/tmp/file.txt')
The /tmp/multibyte.zip
archive will be corrupt while /tmp/ascii.zip
will not be.
I'm continuing to dig into the issue.
I found the problematic lines and have the beginnings of a fix. I'm also going to need to take some time to audit the rest of the code for similar issues. I should hopefully have a new release with this fix in a couple of days.
really happy to hear that its definitely a weird and unexpected bug also thanks for showing me the sample test code as I was finding it hard to replicate the bug under my fork due to my lack of understanding of some parts of the api
any update?
Yes. Check the tip of the master branch (5f27d405) to see if that works for you.
There was some housekeeping I needed to perform before I could actually make the changes I wanted safely. These changes also break 1.8.6 compatibility, so I may refine them a bit more before I make a final release.
@desoleary, did the change work for you?
I can confirm that it actually fixed the issue, thanks!
Please let me know when this gets added to the official release so I can update my Gemfile to the new version
This fix was released in v0.7.0.
awesome thank you again!
I have run into an issue where I am archiving a zip file that contains a simply text file that happens to have unicode characters in the name.
Gem successfully creates the archived file but when I attempt to open the file it basically says that the password I entered is incorrect.
Steps to reproduce:
Archive::Zip.archive(archive_file, original_file.path, name: original_filename, password: password, encryption_codec: Archive::Zip::Codec::TraditionalEncryption )
Screenshot example: