catap opened this issue 7 years ago
Well, it crashed inside zlib compression.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x14423e000)
* frame #0: 0x00007fffb61c1fbf libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 287
frame #1: 0x000000010009b21e librnp.0.dylib`zlib_compressed_data_reader(stream=0x0000000100702050, dest=0x0000000143800000, length=<unavailable>, errors=0x0000000100702158, readinfo=0x0000000100702090, cbinfo=0x00000001007020d8) at compress.c:192 [opt]
frame #2: 0x00000001000a3bd5 librnp.0.dylib`sub_base_read(stream=0x0000000100702050, dest=0x0000000143800000, length=1073741824, errors=0x0000000100702158, readinfo=<unavailable>, cbinfo=0x00000001007020d8) at packet-parse.c:241 [opt]
frame #3: 0x00000001000a3daf librnp.0.dylib`pgp_limited_read [inlined] full_read(stream=0x0000000100702050, dest="[...]", length=1073741824, errors=0x0000000100702158, readinfo=0x0000000100702090, cbinfo=<unavailable>) at packet-parse.c:326 [opt]
frame #4: 0x00000001000a3d7d librnp.0.dylib`pgp_limited_read(stream=0x0000000100702050, dest="[...]", length=1073741824, region=0x00007fff5fbe9050, errors=0x0000000100702158, readinfo=0x0000000100702090, cbinfo=<unavailable>) at packet-parse.c:417 [opt]
frame #5: 0x00000001000a621b librnp.0.dylib`parse_packet [inlined] limread(dest=<unavailable>, length=1073741824, region=<unavailable>, info=0x0000000100702050) at packet-parse.c:456 [opt]
frame #6: 0x00000001000a61f4 librnp.0.dylib`parse_packet at packet-parse.c:2422 [opt]
frame #7: 0x00000001000a6071 librnp.0.dylib`parse_packet(stream=0x0000000100702050, pktlen=<unavailable>) at packet-parse.c:3328 [opt]
frame #8: 0x00000001000a533b librnp.0.dylib`pgp_parse(stream=0x0000000100702050, perrors=0) at packet-parse.c:3433 [opt]
frame #9: 0x000000010009af16 librnp.0.dylib`pgp_decompress(region=0x00007fff5fbf0710, stream=0x0000000100702050, type=PGP_C_ZLIB) at compress.c:394 [opt]
frame #10: 0x00000001000a6003 librnp.0.dylib`parse_packet [inlined] parse_compressed at packet-parse.c:2279 [opt]
frame #11: 0x00000001000a5f66 librnp.0.dylib`parse_packet(stream=0x0000000100702050, pktlen=<unavailable>) at packet-parse.c:3320 [opt]
frame #12: 0x00000001000aac8b librnp.0.dylib`decrypt_se_ip_data [inlined] pgp_parse(stream=0x0000000100702050, perrors=0) at packet-parse.c:3433 [opt]
frame #13: 0x00000001000aac80 librnp.0.dylib`decrypt_se_ip_data(tag=PGP_PTAG_CT_SE_IP_DATA_BODY, region=<unavailable>, stream=<unavailable>) at packet-parse.c:3107 [opt]
frame #14: 0x00000001000a64ba librnp.0.dylib`parse_packet [inlined] parse_se_ip_data at packet-parse.c:3188 [opt]
frame #15: 0x00000001000a63d4 librnp.0.dylib`parse_packet(stream=0x0000000100702050, pktlen=<unavailable>) at packet-parse.c:3349 [opt]
frame #16: 0x00000001000a533b librnp.0.dylib`pgp_parse(stream=0x0000000100702050, perrors=1) at packet-parse.c:3433 [opt]
frame #17: 0x000000010009c77b librnp.0.dylib`pgp_decrypt_file(io=<unavailable>, infile=<unavailable>, outfile=<unavailable>, secring=0x0000000101200090, pubring=<unavailable>, use_armour=<unavailable>, allow_overwrite=<unavailable>, sshkeys=<unavailable>, passfp=<unavailable>, numtries=<unavailable>, getpassfunc=<unavailable>) at crypto.c:660 [opt]
frame #18: 0x00000001000b931c librnp.0.dylib`rnp_decrypt_file(ctx=0x00007fff5fbff288, f="1g.bin.gpg", out=0x0000000000000000) at rnp.c:1184 [opt]
frame #19: 0x0000000100002153 rnp`rnp_cmd(cfg=0x00007fff5fbff380, rnp=0x00007fff5fbff3a0, cmd=261, f="1g.bin.gpg") at rnp.c:337 [opt]
frame #20: 0x0000000100001999 rnp`main(argc=<unavailable>, argv=<unavailable>) at rnp.c:684 [opt]
frame #21: 0x00007fffb5faf235 libdyld.dylib`start + 1
frame #22: 0x00007fffb5faf235 libdyld.dylib`start + 1
(lldb) up
librnp.0.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #1: 0x000000010009b21e librnp.0.dylib`zlib_compressed_data_reader(stream=0x0000000100702050, dest=0x0000000143800000, length=<unavailable>, errors=0x0000000100702158, readinfo=0x0000000100702090, cbinfo=0x00000001007020d8) at compress.c:192 [opt]
189 if (len + cc > length) {
190 len = length - cc;
191 }
-> 192 (void) memcpy(&cdest[cc], &z->out[z->offset], len);
193 z->offset += len;
194 }
195
(lldb)
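For reference, the clamp just above line 192 bounds `len` only against the caller's remaining request (`length - cc`); whether the real bug is an undersized `dest` or a `len` larger than what inflate actually produced is exactly what needs investigating. A hedged sketch of the kind of defensive clamping this copy path would need (the function and parameter names here are hypothetical, not rnp's actual ones):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch (not rnp's actual code): copy at most `len` bytes
 * from a decompressor output window into `dest`, bounding the copy by
 * the caller's remaining request (length - cc), the destination buffer
 * capacity, AND the bytes actually present in the window. The clamp at
 * compress.c:189 only applies the first of these three bounds.
 * Precondition: cc <= length, cc <= dest_cap, offset <= out_filled. */
size_t bounded_copy(unsigned char *dest, size_t dest_cap, size_t cc,
                    size_t length, const unsigned char *out,
                    size_t offset, size_t out_filled, size_t len)
{
    if (len > length - cc)          /* don't exceed caller's request */
        len = length - cc;
    if (len > dest_cap - cc)        /* don't write past dest */
        len = dest_cap - cc;
    if (len > out_filled - offset)  /* don't read past decompressed data */
        len = out_filled - offset;
    memcpy(&dest[cc], &out[offset], len);
    return len;                     /* bytes actually copied */
}
```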
And this bug isn't related to the encryption process, because it also happened with a file that was encrypted by GnuPG.
As confirmation that the crash is our bug, I did a very simple test: encrypt a 1 GB file with rnp and try to decrypt it with gpg.
GnuPG decrypts rnp's file without problems.
I doubt that memory usage can be decreased easily: the current codebase doesn't use streamed processing. Changing this will require rewriting a lot of things in the readers/writers.
@ni4 yeah, but I think we should do it anyway, because gpg uses a constant amount of memory (about 1 MB).
Agree with @catap that we must use iterative, block-sized processing. Compression, encryption/decryption and signing/verification all work with such processing. @ni4 any particular part we should be concerned about?
@ronaldtse I also completely agree that we should process data in a streamed way using constant memory. I'm just noting that this would require rewriting a lot of stuff inside the library, which will take a lot of time. So in my opinion it would be better to do that not right now, but after we have the first release.
That's true; I suppose we will do what we can incrementally, without requiring a huge one-time rewrite.
There should be some low-hanging fruit here that could be improved immediately, though.
This would be a good candidate: https://github.com/riboseinc/rnp/blob/5c4376c298656debf8bc03cb0c8ca0c2aa8bd611/src/lib/signature.c#L1311
So for detached signing, we're reading the entire file into memory just so we can hash it. That's obviously dumb and easily improved. A simple fread loop that calls pgp_sig_add_data is probably an improvement.
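A minimal sketch of that fread loop, assuming pgp_sig_add_data takes a buffer and a length (its exact signature is not verified here); the `sig_add_data` below is a self-contained stand-in using a trivial FNV-1a update, not rnp's real hashing:

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for rnp's pgp_sig_add_data: incrementally absorb a chunk.
 * Here it is a trivial FNV-1a update so the sketch is runnable. */
static void sig_add_data(uint64_t *h, const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *h = (*h ^ buf[i]) * 1099511628211ULL; /* FNV-1a prime */
}

/* Hash a file in fixed-size chunks: constant memory regardless of
 * file size, instead of reading the whole file into one buffer. */
int hash_file_chunked(const char *path, uint64_t *out)
{
    unsigned char buf[65536];                  /* 64 KB chunks */
    uint64_t      h  = 14695981039346656037ULL; /* FNV offset basis */
    FILE         *fp = fopen(path, "rb");
    size_t        n;

    if (!fp)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        sig_add_data(&h, buf, n);
    fclose(fp);
    *out = h;
    return 0;
}
```

The same loop shape works for any incremental hash API: the only requirement is that the hash state can be updated chunk by chunk.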
@dewyatt well, pgp_mem_readfile is a wrapper around mmap, and I don't think there is an issue here.
But I agree that we can feed pgp_sig_add_data in fixed-size chunks, and that may help.
Anyway, I think a good approach is to spend some time finding out what uses so much memory and how we can fix it as an easy workaround before the release.
And after the release, rewrite everything to streaming as @ni4 suggests.
@catap echoing @dewyatt: it's not mmap that uses this memory immediately, it is the subsequent read calls that load the full file contents.
https://github.com/riboseinc/rnp/blob/b9c993a46d4d1da12d5a24ac6bbb11620107f029/src/lib/misc.c#L765
I guess we need to either rewrite pgp_mem_readfile or ensure the chunk-size concept is consistent from top to bottom. This is still rather 'low-hanging'? 😉
@ronaldtse it runs read only if mmap fails:
Geez, my reading skills.
Right, @catap is right here, and I completely forgot we were using mmap here. :)
We could mmap/munmap portions of the file chunk by chunk instead, to prevent this issue.
This line mmaps the entire file at once:
https://github.com/riboseinc/rnp/blob/b9c993a46d4d1da12d5a24ac6bbb11620107f029/src/lib/misc.c#L755
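A sketch of the chunk-by-chunk mmap/munmap idea, under the assumption that the consumer can process the file through a callback; the function name and callback shape are illustrative, not rnp's actual API. The main subtlety is that mmap offsets must be page-aligned:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map one window of the file at a time, so resident memory stays
 * bounded by `window` instead of the full file size. `visit` is
 * invoked once per mapped window. Returns 0 on success, -1 on error. */
int for_each_mmap_window(const char *path, size_t window,
                         void (*visit)(const unsigned char *, size_t, void *),
                         void *arg)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }

    long page = sysconf(_SC_PAGESIZE);
    if (window < (size_t) page)
        window = (size_t) page;          /* offsets must be page-aligned */
    else
        window -= window % (size_t) page;

    for (off_t off = 0; off < st.st_size; off += (off_t) window) {
        size_t len = (size_t) (st.st_size - off);
        if (len > window)
            len = window;
        void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) { close(fd); return -1; }
        visit((const unsigned char *) p, len, arg);
        munmap(p, len);                  /* release before the next window */
    }
    close(fd);
    return 0;
}
```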
I wonder if we could get a performance boost for cases like detached signing (where we just need to read+hash the file) with posix_madvise + POSIX_MADV_SEQUENTIAL. It seems to fit the use case perfectly, so maybe worth a shot.
Having read this, I wonder if a simple fgets is actually faster and uses less memory than what we do now?
I doubt that the file-reading operation is the main bottleneck. We should just read it in large enough chunks (say, 64k or 128k) instead of 1-byte reads.
Plenty of stuff to try, someone just needs to jump on this 👍
@ronaldtse I'm on it, but right now I'm in 4 GB+ files hell :)
@dewyatt, @flowher, @ni4, @ronaldtse I think I need your help.
While working on #421 I found some interesting behaviour in GnuPG.
It uses partial/streaming writing for any packet bigger than 8 KB.
For example:
➜ rnp-gpg-test pgpdump 100k.bin.gpg
Old: Public-Key Encrypted Session Key Packet(tag 1)(268 bytes)
New version(3)
Key ID - 0x9EBD8590F664AD37
Pub alg - RSA Encrypt or Sign(pub 1)
RSA m^e mod n(2048 bits) - ...
-> m = sym alg(1 byte) + checksum(2 bytes) + PKCS-1 block type 02
New: Symmetrically Encrypted and MDC Packet(tag 18)(8192 bytes) partial start
Ver 1
Encrypted data [sym alg is specified in pub-key encrypted session key]
(plain text + MDC SHA1(20 bytes))
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (8192 bytes) partial continue
New: (4096 bytes) partial continue
New: (103 bytes) partial end
➜ rnp-gpg-test
I did a little research into how this works and realised that GnuPG creates partial chunks no bigger than 8 KB; if the remaining data is smaller, it tries to create a power-of-two-sized chunk, and if it is less than 512 bytes it writes the last chunk with that exact size.
RFC 4880 says a partial chunk may be up to 1 GB, and the existing rnp code (and netpgp, I think) supports such 1 GB chunks.
I think the main reason for the 8 KB limit is some sort of legacy, but IMO switching to the GnuPG way may help us decrease memory usage, and I haven't seen any issue with it.
Do you see anything?
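For reference, RFC 4880 (section 4.2.2.4) encodes a partial body length as a single octet in the range 224..254, giving a power-of-two chunk size of 1 << (octet & 0x1F), from 1 byte up to 1 GB; GnuPG's 8 KB chunks are 1 << 13, octet 237. A small sketch of that encoding (helper names are my own):

```c
#include <stdint.h>

/* Encode a partial-body chunk size as the RFC 4880 length octet.
 * Legal sizes are powers of two from 1 (2^0) to 1 GB (2^30).
 * Returns 0 if `len` is not a legal partial chunk size. */
uint8_t partial_len_octet(uint32_t len)
{
    if (len == 0 || (len & (len - 1)) != 0)  /* must be a power of two */
        return 0;
    unsigned exp = 0;
    while ((1u << exp) != len)
        exp++;
    if (exp > 30)                            /* max is 2^30 = 1 GB */
        return 0;
    return (uint8_t) (224 + exp);
}

/* Inverse: decode the chunk size from a partial length octet. */
uint32_t partial_len_decode(uint8_t octet)
{
    return 1u << (octet & 0x1F);
}
```

Note that the final chunk of a stream uses a regular (non-partial) length header, which is why GnuPG's last chunk can be an arbitrary size like the 103 bytes in the pgpdump output above.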
@catap This may help a bit. However, I've started writing streamed processing code. It will take some time, but after that we'll have a constant memory footprint for all of the operations.
@ni4 do you mean the decoding code? Because rnp already has encoding code :)
@catap Both, reader and writer.
@ni4 but #421 uses a writer...
@catap I probably used the wrong description here; I meant a #424-like approach, i.e. we process data in limited-size blocks from the input to the output. Right now, for instance, the writer writes the whole literal data packet to memory, then compresses it, then writes it to the output.
@ni4 @catap As far as I understand, partial chunks will solve the problem mentioned in #424, won't they?
"For PGP data processing it would be a bit different since we don't know the packet contents from the beginning, so should insert some 'decision makers' which parse data and create processors on the go.",
@flowher yes, and you just approved a PR where I used the existing partial-packet code. Anyway, I hadn't seen #424 before and started working on the reader part, because the writer part already existed...
Have I missed anything?
Yep, #421. Nothing missed. I just think that using partial chunks for reading and writing is a good idea (I thought that streaming would be kind of solved by using partial chunks).
@catap @flowher Now I finally get the idea; sorry, I was a bit tired. But what happens if some implementation writes a 1 GB partial chunk? Or feeds us a z-bomb with, say, 1 KB of zlib data which decompresses to 10 GB?
The implementation needs to take care of the case where one partial block of input doesn't map to one partial block of output. But I would expect the implementation to process one partial block (of input) at a time. So: the input file is read into memory in chunks (say, 64k blocks), and I would say each such chunk becomes a partial chunk. How about that?
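The z-bomb defence sketched in this exchange is about bounding the output, not the input: the consumer pulls at most a fixed number of bytes per call, however much the input would expand to. A zlib-free illustration, using a toy run-length "decompressor" standing in for inflate (real code would carry z_stream state the same way this carries `pos`/`remaining`):

```c
#include <stddef.h>

/* Toy expander: input is (count, byte) pairs. 1 KB of such input can
 * describe gigabytes of output, but rle_pull never materialises more
 * than `cap` bytes per call, which is the property that defeats a
 * decompression bomb. */
struct rle_stream {
    const unsigned char *in;
    size_t               in_len;
    size_t               pos;        /* next (count, byte) pair */
    size_t               remaining;  /* bytes left in the current run */
};

size_t rle_pull(struct rle_stream *s, unsigned char *out, size_t cap)
{
    size_t produced = 0;
    while (produced < cap) {
        if (s->remaining == 0) {            /* start the next run, if any */
            if (s->pos + 1 >= s->in_len)
                break;                      /* input exhausted */
            s->remaining = s->in[s->pos];   /* run length */
            s->pos += 2;
        }
        unsigned char byte = s->in[s->pos - 1];
        size_t        n    = s->remaining;
        if (n > cap - produced)
            n = cap - produced;             /* hard output bound */
        for (size_t i = 0; i < n; i++)
            out[produced + i] = byte;
        produced += n;
        s->remaining -= n;
    }
    return produced;                        /* 0 means end of stream */
}
```

Note how one input pair can span several output pulls, which is exactly the "one partial block of input doesn't map to one partial block of output" case described above.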
@ni4 GnuPG can encrypt and decrypt (not every time, to be honest) very big files. I've never tried to encode/decode files bigger than 1 TB, but it works for files up to 1 TB.
And yes, it creates a lot of 8 KB chunks :)
Do you mind if I finish the partial decoder as part of this task? :)
GnuPG probably splits into 8 KB packets for streaming purposes? I'm fine with using the GnuPG approach, but maybe the packet size could be tunable.
I do agree that #424 should be implemented in case someone gives us a 1 GB packet; I believe @ni4 is working on that.
@ronaldtse Yep, that's what I'm working on right now. Starting from the reader; the writer will be next. I'll first implement this stuff for symmetric encryption/decryption.
@catap just posted this gist: https://gist.github.com/catap/f909b1d221f93c446edff03832fba075. rnp can decrypt a 100k file from GnuPG, but not yet the 4 GB file.
Hopefully we can solve this soon.
I've run some quick and dirty performance tests to compare rnp (latest master) and GnuPG 2.1.23.
All tests ran on my laptop with a lot of background applications running, and I didn't re-run them several times to get clean results. Anyway, they show the overall picture and point the direction for optimisation.
Short summary: rnp uses unsigned int as the file size, so a 4 GB size wraps to 0 and bigger files will just be corrupted (@ronaldtse, an example of the issue I mentioned before ;) ).
Test results (all times in seconds, same key used):
rnp:
gpg:
CPU usage when encrypting a 1 GB file:
As you can see, 78% of the CPU time is spent in memcpy, which adds some optimism that it can be fixed easily :)
cc: @ronaldtse @dewyatt
Anyway, right now I see a few tasks for me:
- change unsigned int to size_t, or use chunks (I'm not sure whether the PGP format supports chunked files; I think it does, but I might be wrong, so I need to check);
- look into the memcpy usage (I feel it is related to the memory consumption, but I haven't confirmed where yet).