Open kivisade opened 6 years ago
That does sound unfortunate. I had not accounted for such large databases, I will admit, and the nature of the program makes it difficult to ask for or mock data for testing.
I absolutely agree that it should be implemented as you describe for large databases. However, it would require at least some manual implementation of writing XML rather than handing everything to the marshalling library, so it would take some fiddling. I don't have time to work on this currently, but should have some time over the weekend.
Thanks for your quick reply, it's really great to know that you're actively working on this, and are taking suggestions into consideration!
If I understand correctly, each struct representing a single SMS can be marshalled to XML independently. The result will be a byte slice containing an individual XML element. The file could then be produced by writing the header (`<?xml` etc.) manually (the only problem being to write the `count` attribute of the `<smses>` tag beforehand), then writing to the file line by line in the for loop, and finally writing the footer (`</smses>`) and closing the file after the loop. It might not be as elegant as "marshalling the whole thing in one go", but it's certainly the right thing to do on the assumption that the program may be run on databases of arbitrary size. In fact, the 800 MB encrypted backup that I mentioned is a real example from my smartphone; most of that size, however, is presumably due to the large number of image attachments that I send through Signal, and the total count of messages in my backup is only ≈55k.
Speaking of the `<smses count="...">` problem, I would suggest either dropping the `count` attribute altogether (not very informative anyway), or first making a "dry run" (a for loop) on the database to simply count the messages that will be written (no memory allocations in this loop, no operations on messages, only incrementing a counter), then writing the `<smses count="...">` tag, and finally re-running the loop while doing the actual work (marshalling and writing).
Or you can make it optional with a command-line switch (either make a "dry run" and write the message count, or skip it and drop the `count` attribute).
The `count` is required by the specification for SMS Backup & Restore (the entire reason for the XML output), so it's probably not droppable.
55k messages is still a lot, particularly since attachments are (or should be) black holed and not loaded into memory after being decrypted.
Thanks for the suggestions. I have a couple of ideas myself that might work, so I'll throw stuff up on a branch when I'm able.
I've pushed a change to `testing` that should help to resolve this. Unfortunately my personal backup (only 1200 messages) isn't significant enough to notice a difference, so if you could test this, it'd be appreciated.
@kivisade have you had a chance to test?
Hi. I have a 1.5 GB backup and I'm running into this issue. I've never worked with Go before - is there a good quick primer or do I need to watch out for certain pitfalls, or can I just download those two packages in the readme and compile away?
If you're interested in building from source, yes, all you need is the Go toolchain and `dep`, the dependency manager used for this project (both linked in the README).
The current testing branch misses quite a number of optimisations I've made since, but I can bring that up to speed today or tomorrow.
I'd also advise trying the current alpha to see if the issue has already been resolved.
The issue still exists in the current testing branch. And between the time I asked and now, I upgraded from 8 GB DDR3 RAM to 16GB DDR4. So solving it through hardware doesn't seem to be an option, either.
https://github.com/xeals/signal-back/blob/cde5feb8207a0e27a55965973a352467c380afad/cmd/format.go#L137
The approach of first reading the entire SMS database into a slice of structs in memory and then marshalling the entire slice into XML leads to uncontrollable memory consumption, making it almost impossible to use the program on a relatively large message database (parsing an encrypted export file of ≈800 MB leads to a peak memory consumption of ≈6 GB on Windows).
XML export should be implemented on a per-message basis (decode -> marshal -> write to file).