suurjaak / Skyperious

Skype chat history tool

Database Merge Bug: Duplicate Groups #40

Closed claws61821 closed 4 years ago

claws61821 commented 9 years ago

Skyperious version: 3.5
Build: 06.07.2015 (Windows 64-bit Installer)
OS: Windows 7 Professional x64
Skype builds involved:

JRE build: 1.8.0_51-b16

Description: Merging Skype main databases for a single user account resulted in at least one discussion group, owned by that account and with its history available to new members, being duplicated in the list of recent discussions. Subsequent messages load seemingly at random into one entry or the other, but never into both. The original entry is listed in Saved Groups, while the duplicate is not. Signing out, exiting Skype completely, re-launching Skype and signing back in does not fix the duplication.

UPDATE: Closer examination reveals that this issue affects all group conversations started by the account in question since November of 2015. It does not affect group conversations started before then, or conversations started by other accounts in which the account in question is a participant.

suurjaak commented 9 years ago

I'm sorry, could you explain this in more detail? At the moment I do not understand the situation.

claws61821 commented 9 years ago

Two or three days ago, I merged databases for one account from three devices. Loading the final result back into Skype revealed a duplicate entry in Recent Conversations for one of my Group Conversations. When connected to Skype across multiple devices, new messages both incoming and outgoing appear in either version of the group in a seemingly random manner. When connected only on the device running Skype 7.x, new messages appear only in one version of the group.

Examining the database in Skyperious revealed several more duplicated conversations, with the following in common.

Examining the messages for one of these duplicated conversations by convo_id revealed a non-matching quantity of entries, although I have yet to examine the other groups in this manner or check whether the existing entries all match.

On a hunch, I just switched back to the Conversations Table in Skyperious and examined the id and identity columns for the original separate databases. The id column entries are unique to each database. The identity column entries seem to be the culprit at next glance. Entries for group conversations in question appear as what seems to be a hexadecimal string in the older and mobile versions of Skype (format: 19:[string]@thread.skype), identical between those databases, while entries in the Skype 7.x database use a different format (specifically #[username]/$*T;[id]).

Negative on culprit. At least one non-duplicated group conversation started by this account prior to the release of 6.22.x and which has been active since that time shows the same mismatch in identity valuation formats.

Do you need further information?

suurjaak commented 9 years ago

Thank you, now I get it. It seems that Skype is making some internal changes and can have multiple entries for a single conversation, which Skyperious does not know how to match.

The difficult part is whether these old and new entries can be matched automatically.

Looking into it, it looks like the new identity can contain the old identity in the hexadecimal part, in base64 encoding.

Can you try one thing: open a database in Skyperious, open the Python console (menu Help -> Show Python console) and run the following lines, one after another:

cc = db.execute("SELECT * FROM Conversations WHERE identity LIKE '%thread.skype'").fetchall()

for c in cc: re.sub("(19:)([^@]+)(.*)", lambda x: x.group(2).decode("base64"), str(c["identity"]))

This will print out the decoded hexadecimal parts of conversation identities.

Most of the lines are probably garbage like "\xf3m\x1as\xad..", but some of them should be in the form #username1/$username2;SOMEID. Are all your duplicated conversation identities present among the latter?
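
For anyone trying this on Python 3, where str.decode("base64") is not available, a rough equivalent of the two lines above might look like the sketch below. The helper names decode_thread_identity and looks_like_old_identity are made up for illustration and are not part of Skyperious, and the old-identity pattern is only a guess based on the format described in this thread.

```python
import base64
import re

# Assumed shape of an old-style identity: "#username1/$username2;SOMEID".
OLD_IDENTITY = re.compile(r"^#.+/\$.*;.+$")


def decode_thread_identity(identity):
    """Base64-decode the part between "19:" and "@" of a thread identity.

    Returns the decoded text, or None if the identity has a different
    shape or the middle part is not decodable as base64.
    """
    match = re.match(r"19:([^@]+)@", identity)
    if not match:
        return None
    encoded = match.group(1)
    encoded += "=" * (-len(encoded) % 4)  # restore padding if it was stripped
    try:
        raw = base64.b64decode(encoded)
    except ValueError:
        return None
    # latin-1 maps every byte to a character, so "garbage" never raises here.
    return raw.decode("latin-1")


def looks_like_old_identity(text):
    """True if the decoded text resembles an old-style conversation identity."""
    return bool(text) and OLD_IDENTITY.match(text) is not None
```

In the Skyperious console this could be applied to each row of cc, for example by printing decode_thread_identity(str(c["identity"])) and checking the result with looks_like_old_identity.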

claws61821 commented 9 years ago

The output of the two command lines above is all in the form of your exemplar 'garbage data'. None of the fifty-four lines returned contained the secondary format you provided.

Tried it a second time using copy and paste instead of manual entry, with the same result.

claws61821 commented 9 years ago

My original checks were in the combined database file. I just checked again in each of the disparate database files. Each time the result did not include any entries in the format #username1/$username2;SOMEID or anything similar.

suurjaak commented 9 years ago

In that case, I am at a loss at the moment on how to solve this.

Can you share the appropriate old and new rows from Conversations-table? For example, run a query in the SQL window like "SELECT * FROM Conversations WHERE id IN (id1, id2, ..)", export the result as HTML and send it to my e-mail? Maybe I can discover some connection between the old and new rows.

suurjaak commented 9 years ago

Looking at the data you sent, I can see there is a connection - old rows have a field alt_identity pointing to the new Conversations identity.

So an automatic solution is possible. I will do some work on a version that knows how to handle such split chats.
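
As a minimal sketch of the idea, assuming only the Conversations columns mentioned in this thread (id, identity, alt_identity), such pairs could be found with a self-join. This is not the Skyperious implementation; find_split_chats is a hypothetical helper and the sample rows are invented.

```python
import sqlite3


def find_split_chats(conn):
    """Return (old_id, new_id) pairs where an old row's alt_identity
    equals the identity of another Conversations row."""
    query = """
        SELECT o.id, n.id
        FROM Conversations AS o
        JOIN Conversations AS n ON o.alt_identity = n.identity
        WHERE o.id != n.id
        ORDER BY o.id
    """
    return [(old_id, new_id) for old_id, new_id in conn.execute(query)]


if __name__ == "__main__":
    # Invented sample rows, only to show the shape of the result.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Conversations "
        "(id INTEGER PRIMARY KEY, identity TEXT, alt_identity TEXT)"
    )
    conn.executemany(
        "INSERT INTO Conversations VALUES (?, ?, ?)",
        [
            (1, "#alice/$bob;0123abcd", "19:example@thread.skype"),
            (2, "19:example@thread.skype", None),
            (3, "carol", None),
        ],
    )
    print(find_split_chats(conn))
```

Rows whose alt_identity is empty simply drop out of the join, so ordinary unsplit conversations are not reported.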

claws61821 commented 9 years ago

Thank you, Suurjaak. I appreciate it and look forward to a successful outcome.

claws61821 commented 9 years ago

Has there been any progress on resolving this issue in the past fortnight, Suurjaak?

suurjaak commented 9 years ago

Yes, thank you for checking in. At the moment I am kept rather busy with other responsibilities, but I'm making slow progress with this on the side.

claws61821 commented 9 years ago

Has there been any significant progress on resolving this issue in the past month, or have you been too preoccupied with your other responsibilities?

suurjaak commented 9 years ago

Unfortunately, I have been very preoccupied with work the last couple of months. Fortunately, this mad sprint will be coming to an end next Wednesday, and I'll have time for more things.

The solution itself is actually nearly done, but I haven't had the chance to test it and finalize it.

claws61821 commented 8 years ago

It's been more than two weeks since you said you would be able to devote more time to this, and I've seen no release or commentary. Did something else come up?

suurjaak commented 8 years ago

Yes, it has been quite a lively time :) Thank you for your patience.

I was finally able to finish the solution, and it looks to be working. However, I have not been able to test it thoroughly enough, as my own message history has only one such chat.

Can you try this pre-release http://erki.lap.ee/skyperious/skyperious_3.5.1a_x64.exe, and see if the merge works now? (Please, don't forget to make backups before merging.)

Also, on the chats page, Skyperious should now automatically collate old and new chat rows into one, combining their message history.

claws61821 commented 8 years ago

Am I supposed to copy this into the existing installation, or is it stand-alone? I tried running it stand-alone, and received some loading errors.

[...] Statistics collection starting [...] Error loading additional data from E:[...]\main.db
Traceback (most recent call last):
  File "C:\stuff\projektid\skyperious\dist\build\pyinstaller\out00-PYZ.pyz\skyperious", line 4823, in load_later_data
  File "C:\stuff\projektid\skyperious\dist\build\pyinstaller\out00-PYZ.pyz\skyperious", line 738, in get_conversations_stats
TypeError: can't compare datetime.datetime to NoneType


The above is recorded in the log when loading any database individually, though I only noticed this by seeing the error scroll at the bottom of the window and manually checking the log.

Attempting to compare two databases fails at loading and adds the following to the log, as well as popping up the traceback line in a separate window.

[...] Could not load chat lists from [...] and [...]
Traceback (most recent call last):
  File "C:\stuff\projektid\skyperious\dist\build\pyinstaller\out00-PYZ.pyz\skyperious", line 5945, in load_data
KeyError: '__link'

suurjaak commented 8 years ago

Oh, I'm sorry, these were bugs in the code. It's a stand-alone executable. I fixed these two errors now: http://erki.lap.ee/skyperious/skyperious_3.5.1b_x64.exe.

claws61821 commented 8 years ago

Not sure whether the original issue has been functionally fixed, as I examined the output databases without loading them back into Skype itself.

Merging two dissimilar databases which have already had the conversations in question duplicated does not merge the duplicate entries in the Conversations table of the output database. It does appear at first glance to merge them in the results of the Chats tab of the main program*. The SQL page confirms that messages from duplicated groups are still registered to separate convo_id values.

Merging backups of the original database files from July, prior to the first duplication, seems to give the same results as above and as before. Conversations are still listed as duplicates in the appropriate table; the tables generate with the same id strings; messages are still split between convo_id strings. Searching for a particular meta_topic on the Chats tab (not table) in 3.5.1 returns a combined result. Performing the same search operation in 3.5 on a database merged in 3.5.1 still returns separate results.

You say that you've tested this locally. Did you load the output database into Skype to test whether the relevant conversation was duplicated there; or is this not yet meant to fix that part of the issue?

*Viewing older merged databases known to have the problem in both versions suggests that this is a change in how Skyperious views the database, but not necessarily in how it is written. Not sure how this will affect Skype.

suurjaak commented 8 years ago

This is all as it should be. Skype has multiple entries in Conversations-table for certain chats, and messages in the database can refer to either one. In the Skype program, however, they are shown as a single conversation. This is part of their internal logic, probably to do with Skype/Microsoft migrating to a more server-based system.
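
To illustrate (a sketch only, not the actual Skyperious code): once the ids of the old and new rows of one chat are known, their messages could be read back as a single history by selecting over both convo_id values. The Messages table and convo_id come from this thread; the timestamp and body_xml column names are assumed from the usual main.db layout, and collated_messages is a hypothetical helper.

```python
import sqlite3


def collated_messages(conn, convo_ids):
    """Return messages belonging to any of the given convo_ids,
    ordered by time, so a split chat reads as one conversation."""
    convo_ids = list(convo_ids)
    if not convo_ids:
        return []
    # One "?" placeholder per id keeps the query parameterised.
    placeholders = ",".join("?" for _ in convo_ids)
    query = (
        "SELECT convo_id, timestamp, body_xml FROM Messages "
        "WHERE convo_id IN (%s) ORDER BY timestamp, id" % placeholders
    )
    return conn.execute(query, convo_ids).fetchall()
```

The rows in the database stay untouched; only the view over them is combined, which matches the tables still showing two separate convo_id values.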

This issue was caused by Skyperious not knowing about combined chats and therefore creating duplicates on merge. Evidently one database was from an earlier Skype version with only old-style chats, and the other database had the new migrated structure.

This update in Skyperious does the following:

I tested the merging results in Skype - chose a pretty old backup (from 2009), merged a recent main.db into it and loaded it in Skype. Skype was able to correctly show the entire conversation of a migrated chat from 2009 to 2015, and everything else seemed to be working as well.

Can you try loading your new merge result from 3.5.1b in Skype and see how it fares?

claws61821 commented 8 years ago

Tried loading into Skype the newly merged result from the old backups. Did not open any conversations, but did search for one of the groups I knew to be duplicated. It still shows up in the results list as two separate groups. Likewise, the same conversation still shows up as two groups when loading merged up-to-date databases.

It's possible this is due in part to me loading these databases, which include data from Skype 7, into Skype 6.22, assuming the latter is not prepared to read the data properly.

Switched to my other computer to check Skype 7. Both merged databases still show duplicate groups. HOWEVER, Skype 6.22 seems to have ceased splitting messages between duplicates while Skype 7 continues to do so.

EDIT: Okay, the quaint little feature whereby GitHub emphasises an entire paragraph when I place a line rule below it was cute the first time, but now it's annoying... and I know you can't do anything about it, so sorry for complaining.

suurjaak commented 4 years ago

Closing the issue, as Skyperious has become obsolete: Skype chat history is no longer available to third-party programs.