signalapp / Signal-Android

A private messenger for Android.
https://signal.org
GNU Affero General Public License v3.0

Export Plaintext Backup sometimes creates invalid XML #342

Closed: hoozey closed this issue 10 years ago

hoozey commented 11 years ago

For some messages, the 'body' attribute in the XML file doesn't get closed. This causes "Error importing backup!" when trying to import.

Example of bad XML:

<sms protocol="0" address="5558675309" date="1378522000531" type="1" subject="null" body="Hello world  toa="null" sc_toa="null" service_center="null" read="1" status="0" locked="0" />
scw commented 11 years ago

@hoozey can you provide an example body which causes this? I see a catch for IllegalArgumentException which seems to get triggered in the body statement, but an example message that causes the bad serialization would still be helpful.
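If an IllegalArgumentException is indeed caught around the body attribute, the corrupted lines are easy to explain. The toy writer below is a hypothetical illustration, not the TextSecure or kxml2 source. It writes the opening quote, then throws when it reaches a character rejected by the KXmlSerializer range check quoted later in this thread, and a caller that swallows the exception moves on to `toa`. Both the `UnclosedAttributeDemo` class and its methods are made up for this sketch.

```java
import java.io.StringWriter;

// Hypothetical toy writer, NOT the TextSecure or kxml2 source: it only shows
// how an exception thrown after the opening quote can leave an attribute open.
public class UnclosedAttributeDemo {

    // Simplified copy of the KXmlSerializer validity test quoted in this thread.
    static boolean isValid(char c) {
        return (c >= 0x20 && c <= 0xd7ff) || (c >= 0xe000 && c <= 0xfffd);
    }

    // Writes name="value", but throws midway if the value holds a rejected char.
    static void attribute(StringWriter out, String name, String value) {
        out.write(" " + name + "=\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!isValid(c)) {
                throw new IllegalArgumentException("Illegal character (" + (int) c + ")");
            }
            out.write(c);
        }
        out.write('"');
    }

    // Caller that swallows the exception and keeps writing the next attribute.
    static String writeSms(String body) {
        StringWriter out = new StringWriter();
        out.write("<sms");
        try {
            attribute(out, "body", body);
        } catch (IllegalArgumentException e) {
            // Nothing closes the body="... quote here.
        }
        attribute(out, "toa", "null");
        out.write(" />");
        return out.toString();
    }

    public static void main(String[] args) {
        // "No " followed by an emoji, which Java stores as a surrogate pair.
        System.out.println(writeSms("No \uD83D\uDE0B"));
        System.out.println(writeSms("\uD83D\uDE0B"));
    }
}
```

The two lines printed are `<sms body="No  toa="null" />` and `<sms body=" toa="null" />`, the same shapes reported in this thread (the double space is the space typed before the emoji).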

hoozey commented 11 years ago

One malformed message I ran into was

body="No  toa="null"

Although I have some perfectly fine ones like this:

body="No" toa="null"

Not sure if there was a trailing space in the original message.

I also have a lot of blank ones:

body=" toa="null"
kalaxdg commented 11 years ago

Hi there,

I'm not able to export plaintext at all. Anyone know why that might be? I hit the export button, phone (Galaxy Nexus) works for about 20 seconds and then says "success!" I would expect my phone to take at least several minutes creating a backup. Is there a way that I can fix the issue without losing my archive?

DorianScholz commented 10 years ago

Just got bitten by this:

body="Ja  toa="null"

Only one entry was actually corrupted amongst 2300 good ones. And this was one of about 10 messages that were sent using TextSecure. All others were imported from the SMS app.

I'm not sure yet what exactly the problem is, so pull request #929 only makes sure the user is notified about the problem when exporting, not only when importing, because by then it might be too late.
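One way to surface the problem at export time is to re-parse the freshly written file with a standard XML parser. The sketch below uses the JDK's `DocumentBuilder`; it is not the code from #929, and `BackupCheck` / `firstError` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical export-time check; not the code from pull request #929.
public class BackupCheck {

    // Returns null if the XML is well-formed, otherwise "line N: <parser message>".
    static String firstError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and is silent otherwise,
            // so the parser does not also print "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String bad = "<smses>\n<sms body=\"No  toa=\"null\" />\n</smses>";
        System.out.println(firstError(bad));
    }
}
```

For a real backup one would parse the exported file rather than a string; the point is simply to fail loudly while the original messages are still on the phone.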

DorianScholz commented 10 years ago

So, I think I have found the problem. Icons (emoji) are represented in the message body by Unicode characters in a range that is invalid according to https://android.googlesource.com/platform/libcore/+/master/xml/src/main/java/org/kxml2/io/KXmlSerializer.java, line 128:

boolean valid = (c >= 0x20 && c <= 0xd7ff) || (c >= 0xe000 && c <= 0xfffd);

I've added a commit to #929 that replaces these characters with spaces before export. This removes the icons but at least preserves the text of the messages in the plaintext backups.
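A minimal sketch of that workaround, assuming the range check from line 128 is the only filter. `XmlSanitizer` and `sanitize` are hypothetical names, not the actual commit in #929. Tab, LF and CR are legal XML characters, so the sketch keeps them even though they fall below 0x20.

```java
// Hypothetical helper illustrating the workaround; not the actual commit in #929.
public class XmlSanitizer {

    // The KXmlSerializer range check quoted above, plus tab, LF and CR,
    // which are legal XML characters and are kept as-is here.
    static boolean isAccepted(char c) {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c <= 0xd7ff)
            || (c >= 0xe000 && c <= 0xfffd);
    }

    // Replaces every rejected UTF-16 unit with a space. An emoji is a surrogate
    // pair, so it turns into two spaces; the rest of the text survives.
    static String sanitize(String body) {
        StringBuilder sb = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            sb.append(isAccepted(c) ? c : ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + sanitize("Ja \uD83D\uDE0B") + "]");
    }
}
```

Because the check works per UTF-16 unit, every emoji becomes two spaces rather than one.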

MacaGovani commented 9 years ago

I had the problem with TextSecure 2.1, and the new 2.3.3 still has it. It's not fun: my job sometimes requires me to back up he-said/she-said communications.
body="pt toa="

I found this by searching in Notepad, following the hints from the users and developers above in this thread. The search string is two spaces followed by toa= (shown another way, with a quote mark on each end: "  toa=").
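The same search can be automated. The sketch below is a heuristic, assuming that `toa` is the attribute written directly after `body`, as in every example in this thread. It flags lines where `body="` is followed by ` toa="` with no closing quote in between, which catches both the `body="No  toa="` and the blank `body=" toa="` shapes. It does not detect the body=' (single-quote) variant. `FindUnclosedBodies` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical heuristic scanner; assumes toa="..." directly follows body="...".
public class FindUnclosedBodies {

    // body=" followed by no closing quote before the next ' toa="' attribute.
    private static final Pattern UNCLOSED_BODY = Pattern.compile("body=\"[^\"]* toa=\"");

    // Returns 1-based line numbers, matching what a text editor shows.
    static List<Integer> suspectLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (UNCLOSED_BODY.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FindUnclosedBodies backup.xml
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(suspectLines(lines));
    }
}
```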

MacaGovani commented 9 years ago

Yowser, lots of manual fixing for me to do. This XML file holds 4966 messages, and about 200 of them have body=' (a single quote mark). In each case the message is also very long, and at the end of the body the error is repeated with a ' single quote.

One of the bad messages also looked like this: body='hello all good citizens' ?' toa= (note the extra space, extra question mark, and extra single quote mark).

McLoo commented 9 years ago

@MacaGovani can you post a sample message that leads to a wrong XML file?

MacaGovani commented 9 years ago

Hi,

sorry for the delay,

The original file has 4966 messages. I scrubbed about 95% of them (by count) from this file, as a lot of it is private.

The phone is a Samsung Galaxy II.

This website has a sample SMS XML file (also attached): http://android.riteshsahu.com/apps/sms-backup-restore

Go to the ends of the lines (one line per SMS message) and note that this file has several more attributes per message than the file I get from TextSecure.

A zip file from the same website, in case it is of use, has some sample style sheets that help display SMS XML files in Excel. Excel, however, says it can't show my file from TextSecure, giving a "parsing" data error.

Excel does work with the sample XML file.

When I trim my TextSecure .xml file down to just a hundred SMS messages, none of which has body=' (single quote), Excel can open and view the file.

I have not gotten the style sheets to work yet. I have never used style sheets before, and some text needs to be added to the XML file (near the top) to reference the .xsl, .xsd and .css style sheets. I think the style sheets even include a Unix epoch date converter.

There is also a free (very crude) JAR program that can view SMS XML files. It also opens the sample XML file, but not my TextSecure XML (it errors out on load), even when I cut it down to just two messages.

So I looked at the last three attributes on each SMS line (at the end of each message) in the sample file, added those same attributes with dummy data to my two SMS lines, and it worked.

In theory, I suppose, there is one more thing for me to try once I get all the body=' messages fixed or removed (many, many of them; it's over a year of texting, 4966 messages): use Notepad's find-and-replace command to add the dummy attributes and data to the end of each line. It might then work in a viewer.

But clearly a long-term fix is needed for the body=' problem.

It seems to be triggered whenever a message contains double-quote marks.

Scott

On 12/12/2014 02:08 PM, McLoo wrote:

@MacaGovani can you post a sample message that leads to a wrong XML file?


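For reference on the double-quote suspicion: under standard XML 1.0 rules (not TextSecure's code), a value written inside a double-quoted attribute must have `&`, `<` and `"` escaped. Raw tab, LF and CR are legal but get turned into spaces by attribute-value normalization on import, so writing them as character references preserves them. `AttrEscape` / `escapeAttribute` are hypothetical names for this sketch.

```java
// Hypothetical helper showing standard XML 1.0 attribute escaping.
public class AttrEscape {

    // Escapes a value for use inside a double-quoted attribute.
    static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                // Attribute-value normalization would turn raw tab, LF and CR
                // into spaces on import, so write them as character references.
                case '\t': sb.append("&#9;");   break;
                case '\n': sb.append("&#10;");  break;
                case '\r': sb.append("&#13;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "She said \"see you at 5\" & left";
        System.out.println("<sms body=\"" + escapeAttribute(body) + "\" toa=\"null\" />");
    }
}
```

With this escaping, a message full of double quotes cannot terminate the attribute early.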

novoid commented 9 years ago

I can provide examples as well.

My Emacs nXML mode flags malformed XML:

I get "Invalid Code" on the first "5" in the following message, which might indicate some kind of escaping issue with the emoticons I used in it:

body="&#55357;&#56843;"

In TextSecure, this message was basically just a single emoji. I don't know what the second number stands for.

With my current TextSecure, the issue with missing closing quotation marks seems to be gone.

Sorry, I could not find my version number in the app. But it's not 2.3.3, since I just got the update suggestion :-)
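The two numbers are the two halves of one character. 55357 is 0xD83D (a UTF-16 high surrogate) and 56843 is 0xDE0B (a low surrogate). Together they encode a single emoji outside the Basic Multilingual Plane, U+1F60B. A minimal sketch using only `java.lang.Character` (the `SurrogateDecode` class is just for illustration):

```java
// Illustrative sketch using only java.lang.Character.
public class SurrogateDecode {

    // Combines two numeric references (one UTF-16 unit each) into one code point.
    static int decode(int high, int low) {
        char hi = (char) high;
        char lo = (char) low;
        if (!Character.isHighSurrogate(hi) || !Character.isLowSurrogate(lo)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return Character.toCodePoint(hi, lo);
    }

    public static void main(String[] args) {
        int cp = decode(55357, 56843); // from body="&#55357;&#56843;"
        System.out.printf("U+%X (decimal %d)%n", cp, cp);
        System.out.println(new String(Character.toChars(cp)));
    }
}
```

The first line printed is `U+1F60B (decimal 128523)`.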

McLoo commented 9 years ago

@MacaGovani thanks for your explanation. The fields readable_date and contact_name are optional in the xml file.

I'm pretty sure Excel is complaining about an invalid Unicode character, with error number -1072896737 (Details of XML-Error). This is caused by the escaping of Unicode characters, something like &#12345;.

What concerns me more is the body=' issue. Can you post a sample message here? It doesn't have to be a real message, just one that triggers the issue.

McLoo commented 9 years ago

@novoid We have to do this kind of character escaping to keep XML Backup & Restore compatibility (see https://github.com/WhisperSystems/TextSecure/pull/1379#issuecomment-41662033)
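This escaping is also why strict parsers such as nXML or Excel object. The XML 1.0 `Char` production excludes the surrogate range 0xD800 to 0xDFFF, and a numeric character reference must refer to a legal `Char`. So `&#55357;` is rejected, while the single code-point reference `&#128523;` would be accepted. The sketch below contrasts the two forms; it does not show which form the importing app expects, and it leaves markup escaping (`&`, `<`, `"`) out for brevity. `CharRefs` and its methods are hypothetical names.

```java
// Illustrative sketch contrasting two ways to write numeric character references.
// Markup escaping (&, <, ") is deliberately omitted to keep the focus on non-ASCII.
public class CharRefs {

    // The XML 1.0 "Char" production: surrogate code points 0xD800-0xDFFF are excluded.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // One reference per UTF-16 unit: the &#55357;&#56843; form seen in this thread.
    static String escapePerUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) sb.append(c); else sb.append("&#").append((int) c).append(';');
        }
        return sb.toString();
    }

    // One reference per code point: the form a strict XML parser accepts.
    static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) sb.appendCodePoint(cp); else sb.append("&#").append(cp).append(';');
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE0B";
        System.out.println(escapePerUnit(emoji) + " valid=" + isXmlChar(0xD83D));
        System.out.println(escapePerCodePoint(emoji) + " valid=" + isXmlChar(0x1F60B));
    }
}
```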

novoid commented 9 years ago

OK, I see. My own script that reads the XML backup and converts it to Emacs Org mode works with the current format. Thanks for fixing the missing quotation marks!

MacaGovani commented 9 years ago

Here is a sample of actual TextSecure output in which some message lines have body=' rather than the expected body=".

The root cause appears to be messages that contain quotes. Here is a 6-message sample XML. (If a fix has already been published, where can I download it? On the Whisper website?)

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>

MacaGovani commented 9 years ago

Sample of 3 messages (the bad XML); all personal info has been replaced with fictitious people and phone numbers.

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>

MacaGovani commented 9 years ago
McLoo commented 9 years ago

@MacaGovani thanks for your reply. I'm afraid you need to repost one of those XMLs with three backticks before it and three after it:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> ... YOUR XML INSIDE HERE

Please also post the text of such a message, not only the bad XML file, or perhaps a screenshot of the text.

And by the way: what version of TextSecure are you using? All attributes in the XML file should be in double quotes, not single quotes.