tmo1 / sms-ie

SMS Import / Export is a simple Android app that imports and exports SMS and MMS messages, call logs, and contacts from and to JSON / NDJSON files.
GNU General Public License v3.0

Importing vmg files #93

Open buongiorgio opened 1 year ago

buongiorgio commented 1 year ago

Hello, I have found a bunch of old SMS messages in VMG format on an old PC; they are from an old Nokia phone I had ages ago. Any chance to import them? Regards.

tmo1 commented 1 year ago

VMG is apparently an ancient, proprietary format used by Nokia to store SMS messages (see here, here). I'm not going to directly incorporate support for it into SMS I/E, but it may be possible to write a converter to convert VMG messages into SMS I/E compatible JSON. If you're willing to post a few of the files (that do not contain sensitive or private data or metadata), I can take a look at them, and if the format looks simple and easily parseable, I may be able to write such a converter.

buongiorgio commented 1 year ago

It is ancient indeed; some of the SMS messages I have are 16 years old. I have attached a few examples with a couple of notes. Note that the original file extension was .VMG, but I had to rename the files to upload them. Thanks, and ask me if you need further information. 2.txt 3.txt 1.txt

tmo1 commented 1 year ago

Attached a few examples with a couple of notes

I assume the original files end with the END:VMSG lines, and everything past that was added by you before uploading?

DT is the timestamp: I suppose is UTC

Yes, those are clearly ISO 8601 combined date and time representations, with the trailing Z denoting the zero UTC offset.

buongiorgio commented 1 year ago

I assume the original files end with the END:VMSG lines, and everything past that was added by you before uploading?

Yes, that's right.

tmo1 commented 1 year ago

Okay, I have written (in Python 3) an initial attempt at a converter. It seems to work correctly on the three messages you provided, but I don't know how flexible / complex the VMG format is, so I have no idea whether it will work on other VMG messages.

To use it, put some VMG files in a directory containing nothing else (e.g., vmgs), then run (-d sets debugging output):

vmg-convert.py -d vmgs > converted-vmg-messages.json

  1. If any errors are reported, please post them here, along with the files that triggered them. You can redact the files, but please don't add any notes or change anything more than absolutely necessary - notes should be posted here.
  2. Examine the output and see if it looks right, as far as you can tell.
  3. Try to import the file with SMS I/E (preferably into an emulated device or one you don't mind wiping, since if things go wrong, it can be a pain to track down all the messages and delete them).
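For step 2, a quick sanity check of the converter output might look like the sketch below. It assumes the output file holds a JSON array of message objects, and the field names `address`, `body`, and `type` are assumptions (only `date` appears in this thread); `summarize` and `summarize_file` are hypothetical helper names, not part of SMS I/E or the converter.

```python
import json
from collections import Counter

# Assumed field names; only 'date' is confirmed by the traceback in this thread.
EXPECTED_KEYS = {"address", "body", "date", "type"}


def summarize(messages):
    """Return (count per 'type' value, indexes of records missing expected keys)."""
    by_type = Counter(str(m.get("type")) for m in messages)
    incomplete = [i for i, m in enumerate(messages) if not EXPECTED_KEYS <= set(m)]
    return by_type, incomplete


def summarize_file(path):
    """Load a JSON array of messages from `path` and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))
```

Calling `summarize_file("converted-vmg-messages.json")` would then give a rough idea of how many messages were converted and whether any record looks incomplete before attempting an import.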

One thing I'm not sure about is how to correctly distinguish between incoming and outgoing messages. Currently, we simply assume that if X-IRMC-BOX is SENT then it's outgoing, and otherwise it's incoming, but I'm not sure how correct that is.
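The rule described above can be sketched as follows. This is only an illustration, not the converter's actual code: `sms_type_from_box` is a hypothetical name, the values 1 and 2 are Android's inbox and sent message types, emitting them as strings is an assumption, and the whitespace / case normalization is an addition that the rule as stated does not mention.

```python
# Android's Telephony SMS message types (as strings, which is an assumption
# about how the converter writes them).
MESSAGE_TYPE_INBOX = "1"
MESSAGE_TYPE_SENT = "2"


def sms_type_from_box(box):
    """Map an X-IRMC-BOX value to a message type: SENT is outgoing, anything else incoming."""
    if box.strip().upper() == "SENT":
        return MESSAGE_TYPE_SENT
    return MESSAGE_TYPE_INBOX
```

Treating every non-SENT value as incoming means that drafts or outbox entries, if VMG files can contain them, would be imported as received messages.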

buongiorgio commented 1 year ago

Hello, the first attempt was unsuccessful due to file encoding. On Linux (Debian 9.9 with Python 3.5.3) the original files are seen as data:

$ file SMS/0001.vmg
SMS/0001.vmg: data

On a Win10 box, they are seen as UTF-16 little-endian (according to Notepad).

After converting them to UTF-16 big-endian (on Windows), Linux likes them better:

$ file vmgs/0040.vmg
vmgs/0040.vmg: Big-endian UTF-16 Unicode text

There is an issue with the date (see 0040.txt):

$ ./vmg-convert.py -d vmgs > converted-vmg-messages.json
Processing 0040.vmg
Traceback (most recent call last):
  File "./vmg-convert.py", line 58, in
    sms['date'] = str(int(datetime.fromisoformat(value).timestamp() * 1000))
AttributeError: type object 'datetime.datetime' has no attribute 'fromisoformat'

tmo1 commented 1 year ago

Hello, first attempt was unsuccessful due to file encoding. On linux (debian 9.9 with python 3.5.3) the original files are seen as data:

$ file SMS/0001.vmg
SMS/0001.vmg: data

On a win10 box, they are seen as UTF-16 Little-endian (according to Notepad).

Converted to UTF-16 Big-endian (on win) now linux likes them better:

$ file vmgs/0040.vmg
vmgs/0040.vmg: Big-endian UTF-16 Unicode text

This is why I really need access to files that are as close to the originals as possible. I hard-coded an assumption of UTF-16 since on my Debian Sid system, the versions you posted present as UTF-16 little-endian:

$ file 1.txt 
1.txt: Unicode text, UTF-16, little-endian text, with CRLF, LF line terminators

There is an issue 0040.txt with the date:

Processing 0040.vmg
Traceback (most recent call last):
  File "./vmg-convert.py", line 58, in
    sms['date'] = str(int(datetime.fromisoformat(value).timestamp() * 1000))
AttributeError: type object 'datetime.datetime' has no attribute 'fromisoformat'

It works fine here, without error. FTR, the file presents as:

$ file 0040.txt 
0040.txt: Unicode text, UTF-16, big-endian text

All four of the files I have begin with proper BOM marks: the first three with FF FE (little-endian), and the fourth with FE FF (big-endian). I wonder if the error you're hitting is somehow being caused by your system improperly understanding the file encoding.
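Since the files seen so far carry either an FF FE or an FE FF byte order mark, one way to avoid hard-coding the endianness is to inspect the BOM before decoding. The sketch below is an illustration only: `decode_vmg_bytes` is a hypothetical helper, and falling back to little-endian UTF-16 when no BOM is present is an assumption, not something known about the VMG format.

```python
import codecs


def decode_vmg_bytes(raw, fallback="utf-16-le"):
    """Decode raw file bytes as UTF-16, choosing the byte order from the BOM if present."""
    if raw.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: assumed encoding, may be wrong for some files.
    return raw.decode(fallback)
```

A caller would read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in, so that no platform default encoding is applied first.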

tmo1 commented 1 year ago

On second thought, the problem is clearly that your version of Python is too old - datetime.fromisoformat was introduced in Python 3.7. Try running the tool using a more recent version of Python (you mentioned 3.5.3 - that's a rather old version).
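If upgrading Python is not an option, the same conversion to epoch milliseconds can be done with `datetime.strptime`, which is available in Python 3.5. This is a sketch rather than the converter's code: `to_epoch_millis` is a hypothetical name, and the default format string is an assumption about how the timestamps are written (the VMG files may use a different layout, in which case `fmt` needs adjusting).

```python
from datetime import datetime, timezone


def to_epoch_millis(value, fmt="%Y-%m-%dT%H:%M:%SZ"):
    """Parse a UTC timestamp with a trailing Z and return epoch milliseconds as a string."""
    # The literal Z in the format only matches the character; the UTC offset
    # has to be attached explicitly, otherwise local time would be assumed.
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return str(int(parsed.timestamp() * 1000))
```

For timestamps in the compact ISO 8601 form, something like `to_epoch_millis(value, fmt="%Y%m%dT%H%M%SZ")` should work the same way.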

buongiorgio commented 1 year ago

Hello, I think the issue was with the Python version. I had 3.5 on Linux; I tried with the latest version (on Windows) and succeeded. I converted a batch of 49 messages with no errors during processing. I loaded the resulting file on a Samsung Galaxy S5 with LineageOS 17 (it's a testing device, it was empty) and the SMSes have been imported. They seem correctly handled (UTF chars correctly recognised, timestamp correct, sender correct). I'll try with all the messages and keep you posted. Thanks!

buongiorgio commented 1 year ago

I imported the bulk of the messages. So far, the only issue is with the sent messages: the receiver's number is treated as the sender's. Take for example the 2.txt file; this is the relevant part:

BEGIN:VCARD
VERSION:2.1
N:Nome
TEL:+39338000000
END:VCARD

N:Nome contains the name of the receiver (in this case "Nome", this has been redacted)
TEL:+39338000000 is the phone number of the receiver (redacted)

In the imported file +39338000000 is treated as the number of the sender.
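For reference, the vCard block quoted above is simple enough to pick apart line by line. The sketch below shows one way to extract the name and number from it; `parse_vcard_fields` is a hypothetical helper, not the converter's actual code, and it reads only the first vCard it finds, which may not be enough if a VMG file contains more than one.

```python
def parse_vcard_fields(text):
    """Return the KEY:value pairs of the first BEGIN:VCARD ... END:VCARD block as a dict."""
    fields = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line.upper() == "BEGIN:VCARD":
            inside = True
        elif line.upper() == "END:VCARD":
            break
        elif inside and ":" in line:
            key, value = line.split(":", 1)
            # Drop any ;PARAM suffix on the key (e.g. TEL;CELL) and normalize case.
            fields[key.split(";", 1)[0].upper()] = value
    return fields
```

On the redacted example, `parse_vcard_fields(...)["TEL"]` gives the other party's number; whether that party is the sender or the recipient then depends on the message direction, not on the vCard itself.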

tmo1 commented 1 year ago

In the imported file +39338000000 is treated as the number of the sender.

I'm not sure what you're seeing, but on my device, it looks correct: +39338000000 is shown as the recipient's number ("To:" in Message Details), not the sender's.