uktrade / stream-zip

Python function to construct a ZIP archive on the fly
https://stream-zip.docs.trade.gov.uk/
MIT License
100 stars 9 forks source link

password protected Zip file #93

Closed mmguero closed 8 months ago

mmguero commented 8 months ago

Would it be possible for this library to support creation of a password-protected Zip file? I know there are some libraries like pyminizip that can do it but the streaming nature of stream-zip is much better in other respects.

michalc commented 8 months ago

@mmguero Probably yes - it's especially a bit odd that the sibling package to this, https://github.com/uktrade/stream-unzip, can decrypt/open password protected ZIP files, but this package can't make them.

Do you have a particular type of password protection that would be useful for your use case? There are a few: e.g. Legacy/ZipCrypto/Zip 2.0, or Winzip-style AES? (Or even, I think there's a more modern PKZip "official" password protected format that I don't know too much about, but I suspect creating such ZIPs might be a no-go without PKZip's permission)

(Also: no particular ETA on this as well if you would like this for a current project)

mmguero commented 8 months ago

Yeah it's something of a tradeoff, isn't it? From what I understand the zipcrypto encryption is pretty easily cracked with brute force. Both WinZIP and 7zip (and a few others) support AES-256 which is much better security-wise but can't be opened by Windows natively. So it's a tradeoff of compatibility vs. security.

Part of me almost says to go with the less secure, more compatible option because if I'm going to go with something they'd need to download 7zip for anyway I might as well just look at py7zr and create a .7z archive. But as a cybersecurity guy I also strongly see the argument for the real encryption.

It looks like your stream-unzip package can handle both, so maybe making both an option? If not, I'd be happy with whatever you think would be best for the project's sake, I could live with either.

michalc commented 8 months ago

But as a cybersecurity guy I also strongly see the argument for the real encryption. ... ... I'd be happy with whatever you think would be best for the project's sake

I think I'm tempted for just Winzip's AES really. Keeps the API surface small, less scope for accidentally using the less secure one, less to code up, and since ZipCrypto is apparently so easy to crack, it would essentially just be adding code and faff without much of a benefit in terms of security - and maybe even a negative benefit if it's chosen rather than making sure both sides support AES.

(I'm wondering now if stream-unzip should even remove support for ZipCrypto for a similar reason... hmmm....)

mmguero commented 8 months ago

I think that makes a lot of sense. AES is probably the way to go.

guysl10 commented 8 months ago

Hey, I just now got into this issue. I am looking for a stream zipper that can zip with a password. In my case, I don't need the best password encryption. My need is to pass a suspicious file. If I deliver the file to download as it is and not zipped with a password, the antivirus will prevent me from downloading it.

So, for example, I don't need the best encryption but the ability to zip with a password as stream.

michalc commented 8 months ago

@guysl10 That’s an interesting use case that I never considered… Essentially it’s to “work around antivirus”?

What’s the antivirus? Is there no way to mark the specific file as allowed because you really want it? Also: what would then uncompress the compressed suspicious file?

guysl10 commented 8 months ago

@michalc Yep, I want to help analysts download suspicious files from my website for investigations. The problem is if I let them download the file as it is, any antivirus will prevent them from downloading it. The problem is not with a specific antivirus but any antivirus / EDR agent. When the file is zipped with a password, the antivirus will not detect the file and will allow the user to download the file. Another important note is that if I zip the file without a password, the antivirus can extract and prevent the user from downloading it. I want to give the user a solution rather than asking them for exceptions whenever they want to download a suspicious file. This is a widespread way to work with suspicious / malware files when you need to investigate them. The user (analyst) will uncompress the file, but they can move the file to a quarantined environment.

I hope this explains my needs better :)

michalc commented 8 months ago

Ah understood. Although one thing:

The user (analyst) will uncompress the file, but they can move the file to a quarantined environment.

What would they use to uncompress it? And more specifically, would it support Winzip's AES ZIP format?

guysl10 commented 8 months ago

@michalc The common use is to uncompress it to investigate the suspicious file (reverse engineering \ running in a sandbox). They should use standard zippers, Winrar / Winzip / 7zip, etc.

michalc commented 8 months ago

@guysl10 Ah so it sounds like while AES is maybe overkill for your particular use case, it's still acceptable(?)

guysl10 commented 8 months ago

@michalc Yes, you are right. AES could also work for me. But what is more important for me is the zip with the password than the level of encryption the zip will have. :)

michalc commented 8 months ago

Just to communicate made a very small start on this in https://github.com/uktrade/stream-zip/pull/94

michalc commented 8 months ago

https://github.com/uktrade/stream-zip/pull/94 is getting there I think. Need to tidy it up a bit

michalc commented 8 months ago

https://github.com/uktrade/stream-zip/pull/94 now merged and release in v0.0.69

I'm not really sure if it should only use AE-2 for small files, and use AE-1 for larger, like https://www.winzip.com/en/support/aes-encryption/ describes. Leaving this open for a little bit while I ponder/investigate (and if anyone has thoughts on that, they are welcome)

mmguero commented 8 months ago

Thanks! I'll give the release a shot and will let you know if I run across anything

michalc commented 8 months ago

For now, opting to not implement AE-1 at all: based on https://crypto.stackexchange.com/a/109269/113464

And also pondering on other making other changes...

mmguero commented 8 months ago

Based on that reading, yeah I agree I think just doing AE-2 seems like the better idea.

michalc commented 8 months ago

I'm going to now treat this as done. To summarise the changes...

The related PRs:

https://github.com/uktrade/stream-zip/pull/94 https://github.com/uktrade/stream-zip/pull/95 https://github.com/uktrade/stream-zip/pull/96 https://github.com/uktrade/stream-zip/pull/97

The main documentation change, which is now at: https://stream-zip.docs.trade.gov.uk/advanced-usage/#password-protection-%2F-encryption

And release: https://github.com/uktrade/stream-zip/releases/tag/v0.0.69 https://pypi.org/project/stream-zip/0.0.69/

If there are any problems, of course feel free to open an issue

mat-gas commented 2 months ago

Hi @michalc ,

would it be feasible to also add legacy zipcrypto to this lib? (not used by default when requesting encryption but feasible with a boolean parameter for example)

We propose downloading ZIP compressed files (executables that could be malwares) from our platform, and it's frequently consumed by python scripts (for instance to send to other platforms or sandboxes) which only support legacy crypto (stock python library only supports legacy ZIP crypto).

Those scripts frequently go through HTTP proxies that perform AV scans so we need the files to be encrypted (even if slightly with bad/legacy crypto).

Thanks in advance

michalc commented 2 months ago

@mat-gas,

Understood the requirement, but I'm torn about adding it: we have no use for it ourselves, and it's pretty much the opposite direction of travel that I had in mind in terms of security (not that this any sort of policy or anything, just a rough idea in my head). What would your course of action be if we don't add it?

Michal

mat-gas commented 2 months ago

@michalc

we are currently using a mix of pyzipper and zipencrypt to support both legacy ZIP crypto and AES one (with legacy for API usage and AES when accessing through a browser), and we're also hand crafting streaming tar.gz archives when not encrypting stuff

this is kind of a historic mess and we'd very much like to rationalize everything and use a sole library (and switching from tar.gz to .zip file too) that meets all our features without removing already existing stuff (this would really annoy users that depends on the legacy zip crypto ecryption :( )

as we've recently stumbled on this particular issue (https://github.com/devthat/zipencrypt/issues/2), this triggered a "we can do better and replace all this mess" process

What would our course of action be? no idea right now,

I completly understand that, in 2024, we should not implement weak crypto, but one of the main backend language out here (python) doesn't support "recent" stuff and we unfortunately need to accomodate this at work