New parser to decode Teams conversations

sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

New parser to decode Teams conversations #1680

Open lfcnassif opened 1 year ago

lfcnassif commented 1 year ago

A colleague needed this recently. These two MIT-licensed libraries/projects could be used: https://github.com/cclgroupltd/ccl_chrome_indexeddb/ https://github.com/lxndrblz/forensicsim/

patrickdalla commented 1 year ago

Hello,

I found this, already in Java: https://github.com/dain/leveldb.

patrickdalla commented 1 year ago

This also seems a good tool to evaluate and compare: https://github.com/amyboyd/indexeddb-to-json. It was the only one of the above from which I could easily get textual output (JSON format), running it on the Docker image of the Puppeteer module it uses: https://pptr.dev/guides/docker. It drives an embedded Chrome installation to read the IndexedDB and export it as JSON. The second one, in Java, seems to be the cleanest to install, but it does not implement V8 or Blink object deserialization. The first one uses Python, and I could not get it running quickly because I am not used to Python. It seems to be the most complete, as it promises to report even the offset of each piece of extracted information.

gfd2020 commented 1 year ago

@patrickdalla, thank you for the info. I will check it out.

lfcnassif commented 1 year ago

Some news about the new version of Teams. I'm not sure whether this has already changed or will change at some point: https://office365itpros.com/2021/06/25/teams-2-webview2-replaces-electron/ https://techcommunity.microsoft.com/t5/microsoft-teams-blog/microsoft-teams-advantages-of-the-new-architecture/ba-p/3775704

The database may change as a result...

gfd2020 commented 1 year ago

:-(

patrickdalla commented 1 year ago

AFAIK, the main cached data from Teams is stored in IndexedDB as JavaScript objects serialized in the V8 engine format. It seems to me that it would be very useful if the parsing could be done in two stages: the first extracting each IndexedDB record as a subitem, with its content converted to JSON, and the second parsing the Teams-specific objects to assemble the conversation in a more readable format. The first stage would extract more items than just the Teams-related ones. The second stage could run against the Teams folder item, in the second parsing pass, so a Lucene query could be run to fetch the corresponding Teams JSON records needed to assemble the chat. Maybe this second stage could even be implemented as a Viewer, assembling the chat at analysis time.
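To make the two-stage idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the records are assumed to be already deserialized from the V8 format by some library, the field names (`conversationId`, `time`, `sender`, `content`) and function names are hypothetical rather than the real Teams schema or any IPED API, and a plain filter stands in for the Lucene query.

```python
import json
from collections import defaultdict


def records_to_json_subitems(records):
    """Stage 1: emit every IndexedDB record as a generic JSON subitem.

    `records` is assumed to hold objects already deserialized from the
    V8 format; no Teams-specific logic is applied here.
    """
    return [json.dumps(rec, sort_keys=True, default=str) for rec in records]


def assemble_conversations(json_subitems):
    """Stage 2: pick the Teams message records and group them by chat.

    The field names below are hypothetical placeholders, not the real
    Teams schema. A simple filter replaces the Lucene query.
    """
    conversations = defaultdict(list)
    for raw in json_subitems:
        rec = json.loads(raw)
        if not isinstance(rec, dict) or "conversationId" not in rec:
            continue  # not a (hypothetical) Teams message record
        conversations[rec["conversationId"]].append(rec)
    for messages in conversations.values():
        # ISO 8601 timestamps sort chronologically as plain strings
        messages.sort(key=lambda m: m.get("time", ""))
    return dict(conversations)


def render_chat(messages):
    """Format one conversation as readable text, one line per message."""
    return "\n".join(
        f'[{m.get("time", "?")}] {m.get("sender", "?")}: {m.get("content", "")}'
        for m in messages
    )


# Made-up sample records: two chat messages and one unrelated record.
records = [
    {"conversationId": "c1", "time": "2024-01-01T10:01:00",
     "sender": "bob", "content": "hi"},
    {"store": "settings", "theme": "dark"},
    {"conversationId": "c1", "time": "2024-01-01T10:00:00",
     "sender": "alice", "content": "hello"},
]

subitems = records_to_json_subitems(records)
conversations = assemble_conversations(subitems)
chat_text = render_chat(conversations["c1"])
```

Keeping stage 1 generic means the same JSON subitems could serve other IndexedDB-based applications, while stage 2 (or a Viewer) holds everything that is specific to Teams.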

gfd2020 commented 5 months ago

I had an initial implementation in Python to decode Teams chats. However, when I tested it on the new version of Teams, it didn't work very well. I believe the Python implementation would need a lot of work to get it working. I'm thinking it is better to move to a pure Java implementation of LevelDB instead. I'm still searching.

lfcnassif commented 5 months ago

Thank you @gfd2020 for taking a look at this! Maybe the new ChromeCacheParser implemented by @patrickdalla (derived from the original DiscordParser by @felipecampanini) could be useful, since it deals with Chrome IndexedDB databases.

patrickdalla commented 5 months ago

In fact, it was Felipe's work. My part was to separate the Chrome cache parser from the Discord one, making the former more generic and reusable, while the latter processes the results of the former. Take a look.

gfd2020 commented 2 months ago

A new version of forensicsim has been released. I tested it and it appears to work on both the old and new versions of Teams. I will try to implement a viewer in IPED for the data made available by the parser.

lfcnassif commented 2 months ago

Thank you very much @gfd2020 for helping with this!