monicahq / monica

Personal CRM. Remember everything about your friends, family and business relationships.
https://beta.monicahq.com
GNU Affero General Public License v3.0

Working on a conversation importer for Monica. What are your thoughts/needs? #1901

Open pocketarc opened 5 years ago

pocketarc commented 5 years ago

I'm working on a set of PHP packages (accessible as a library and as a CLI tool) that allow importing conversations into Monica. Most of us carry out a ton of conversations over multiple IM services, and it'd be great to bring them all into Monica. The goal, ultimately, is to have a system that you can run on a regular basis (I'd like to do it nightly) in order to always have everything stored perfectly. I have 14 years of chat logs carefully collected, in different formats (hello MSN!), and I will be creating importers for -everything-.

I'm curious as to what everyone's thoughts are? I'm looking to build importers for as many services and stored chat formats as possible. If you have any suggestions, let me know!

Planned Importers (feel free to suggest your own)

How it works

You'll enter your API details, the importer you want to use (e.g. Skype), the details necessary for that importer (in Skype's case, it tries to auto-detect the path to your Skype DB, or you can enter your own), and it'll go through all your messages, importing them all.

  1. Conversations, by default, are split into 2-hour chunks. If more than 2 hours have passed since the previous message, a new conversation starts. This is user-configurable, but I think 2 hours is appropriate.

  2. The CLI tool will ask you to match contact names to their contact IDs. It will fuzzy search for the contacts with the most similar names, and show those as options (or let you pick your own). This data is stored in a cache file so that it can be reused automatically when you run the importer again.
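The two steps above can be sketched roughly as follows. This is only an illustration in Python, not the PHP importer's code: the function names, the `(timestamp, text)` message shape, and the use of `difflib` for fuzzy matching are all assumptions, and the cache is shown as a plain dict rather than a file.

```python
from datetime import datetime, timedelta
from difflib import get_close_matches


def split_into_conversations(messages, max_gap=timedelta(hours=2)):
    """Group (timestamp, text) pairs into conversations.

    A new conversation starts whenever the gap since the previous
    message is longer than max_gap (2 hours by default).
    """
    conversations = []
    previous = None
    for timestamp, text in sorted(messages, key=lambda m: m[0]):
        if previous is None or timestamp - previous > max_gap:
            conversations.append([])
        conversations[-1].append((timestamp, text))
        previous = timestamp
    return conversations


def suggest_contacts(name, contacts, cache, limit=3):
    """Return candidate contact IDs for a name seen in a chat log.

    contacts maps contact names to IDs; cache maps chat names that
    were matched on an earlier run to the ID that was chosen then.
    """
    if name in cache:
        return [cache[name]]
    lowered = {n.lower(): contact_id for n, contact_id in contacts.items()}
    close = get_close_matches(name.lower(), list(lowered), n=limit, cutoff=0.5)
    return [lowered[n] for n in close]
```

In a real run the cache would be loaded from and written back to a file, so that a name confirmed once is never asked about again.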

Feedback for devs, so far:

The API documentation doesn't describe a way to add conversations or messages. I assume this is because it's new? Right now I'm manipulating the DB directly, but I'd love to do this via the API, for people who aren't self-hosting.

There's also no support for inserting any styling or media (photos, videos) into the conversations, which is a bit of a nuisance. For now, I've been importing everything as Markdown, but it'd be good to have a further look at this. I'd be happy to build in support for this and create a PR for it (with S3 in Monica, it'd be a piece of cake to permanently import all media).

djaiss commented 5 years ago

I'm curious as to what everyone's thoughts are?

To summarize my thoughts: 😄 ❤️ 😍 👍 💯

So basically, you want to use the conversations feature for this. I think it's an amazing idea, and exactly why I started writing this feature to begin with.

14 years of chat logs is a huge achievement. That's a lot of trust in Monica!

The API documentation doesn't describe a way to add conversations or messages. I assume this is because it's new? Right now I'm manipulating the DB directly, but I'd love to do this via the API, for people who aren't self-hosting.

Yes 😀 The API is completely functional and done, I just didn't have the time to create the documentation yet. I'll start it as soon as possible, it shouldn't be too long.

There's also no support for inserting any styling or media (photos, videos) into the conversations, which is a bit of a nuisance.

Attaching media is a totally different story, especially for the hosted version. I'm working right now on a Document feature to allow uploading of documents, but for the hosted version, I need to put something in place to keep costs from going up drastically. We have around 14,000 users right now, and if people start backing up everything on Monica, we are financially dead 😀 We have no choice but to put limits in place and ask people to buy the paid plan if they want more storage. This won't be an issue for self-hosted instances.

For now, I've been importing everything as Markdown

Right now we don't have support for Markdown in the Conversation class, but it's like a 10 min change to make it functional.

To summarize what we need to do to help you:

That's a very exciting project.

pocketarc commented 5 years ago

That's awesome, I'm glad I'm not the only one excited about this! So far, I've got a Monica API client package (uses iterators to allow me to loop through items without any special code to load next/previous pages), the Monica Importer CLI tool, and the Monica Importer for Skype (that was just my test implementation).
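The iterator idea can be sketched like this. It is a minimal Python illustration, not the actual client package: `iterate_pages`, the `fetch_page` callable, and the `data` / `last_page` response keys are all hypothetical stand-ins for whatever the real API returns.

```python
from typing import Callable, Iterator


def iterate_pages(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item across all pages, loading pages on demand.

    fetch_page(page) is a hypothetical callable returning a dict with
    a "data" list and a "last_page" number (assumed response shape).
    """
    page = 1
    while True:
        response = fetch_page(page)
        yield from response["data"]
        if page >= response["last_page"]:
            break
        page += 1
```

The caller just writes `for contact in iterate_pages(fetch): ...` and never deals with page numbers; the next page is only requested once the current one is exhausted.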

I've made it so that it's as easy as possible to implement new importers. They just need to implement MonicaImporter, and when returning records, each record just needs to implement MonicaMessage. Beyond that, everything else is handled by the importer tool (matching contacts to Monica with fuzzy search, caching, combining messages into conversations, etc.), so new services have to do almost nothing. In Skype's case, it's just "open DB with PDO and return PDOStatement", since PDOStatement is an iterator. MonicaSkypeMessage takes in a row from Skype's DB and processes it into a Monica-suitable format.
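To give a feel for that split, here is a rough Python analogue. It is not the PHP interfaces themselves: the `Importer` and `Message` protocols, their fields, and the `messages` table are invented for illustration and do not reflect Skype's real schema.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Protocol


class Message(Protocol):
    """What the importer tool needs from each record (assumed fields)."""
    contact_name: str
    sent_at: datetime
    body: str


class Importer(Protocol):
    """A service-specific importer only has to yield messages."""
    def messages(self) -> Iterable[Message]: ...


@dataclass
class SqliteRowMessage:
    contact_name: str
    sent_at: datetime
    body: str

    @classmethod
    def from_row(cls, row):
        # Convert one raw database row into the common message shape.
        name, unix_time, body = row
        return cls(name, datetime.fromtimestamp(unix_time, tz=timezone.utc), body)


class SqliteImporter:
    """Reads a hypothetical 'messages' table from a SQLite database."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def messages(self) -> Iterator[SqliteRowMessage]:
        cursor = self.connection.execute(
            "SELECT contact_name, sent_at, body FROM messages ORDER BY sent_at"
        )
        # The cursor is itself an iterator, so rows are converted lazily.
        for row in cursor:
            yield SqliteRowMessage.from_row(row)
```

Everything else (contact matching, caching, grouping into conversations) would live in the shared tool and work against the two protocols only.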

As soon as I have a bit of free time, I'll wrap this up nicely, create documentation, and put it up on GitHub. The importer can be used via the CLI but it can also be used as a normal library, so it should be possible, in the future, to composer require and use in Monica.

djaiss commented 5 years ago

Awesome. On my end, I'll try to have the API documentation live in 1-2 days from now.

djaiss commented 5 years ago

@BrunoDeBarros https://www.monicahq.com/api/conversations I've added the first part of the documentation. Some methods are still missing though, it'll come 😀

pocketarc commented 5 years ago

@djaiss Thanks for that! I actually hadn't realised that the Conversations and Messages endpoints were under the Contact folder; I just didn't see them in the main Controllers/Api folder and so assumed they were unreleased!

There is one thing: In your "add a message to a conversation" documentation, you mention that account_id is required, but it's not; in the source you grab it from auth()->user()->account->id. It can't be edited via the API. The API itself seems to be correct, though, so it looks like everything should work.

One more thing: Is there a way to include contact fields when listing all contacts? I want to be able to grab people's IMs (Skype, email, etc.) so that it's easier to match them up when importing conversations. Having to make 1 API request per contact means it's easy to exceed the limit of 60 requests per minute (whereas using the normal contacts endpoint takes only 1 request per 100 contacts).
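To put rough numbers on that, here is a small Python calculation. The two limits are the ones quoted above; the 500-contact account and the function names are made up for illustration.

```python
import math

REQUESTS_PER_MINUTE = 60  # rate limit quoted above
CONTACTS_PER_PAGE = 100   # contacts returned per listing request, as quoted above


def requests_needed(contact_count):
    """Return (listing requests, per-contact requests) for an account."""
    listing = math.ceil(contact_count / CONTACTS_PER_PAGE)
    per_contact = contact_count  # one extra request per contact for its fields
    return listing, per_contact


def minimum_minutes(request_count):
    """Fewest whole minutes needed to stay within the rate limit."""
    return math.ceil(request_count / REQUESTS_PER_MINUTE)


listing, per_contact = requests_needed(500)  # hypothetical account size
print(minimum_minutes(listing + per_contact))  # 505 requests: 9 minutes
print(minimum_minutes(listing))                # 5 requests: 1 minute
```

So for a hypothetical 500 contacts, fetching fields one contact at a time needs at least 9 minutes of rate-limited requests, while a listing that included the fields would fit in a single minute.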

djaiss commented 5 years ago

@BrunoDeBarros thanks for pointing out the errors. I'll fix them.

I actually hadn't realised that the Conversations and Messages endpoints were under the Contact folder

Yeah, because I didn't see the point of letting users create conversations not linked to contacts. Perhaps it's a bad architecture decision, I don't know. This is the first API I've created myself, so I wouldn't be surprised if some things are not ideal.

Is there a way to include contact fields when listing all contacts

No, but I can probably add a new condition in the with parameter to make it happen. It'll have to wait till the weekend though, I'm not sure I'll have the time tonight.

pocketarc commented 5 years ago

Thanks @djaiss! There's no rush, I've got my own work as well, I'm just doing this in my spare time. I'm working on creating a proper PHP-based client for the Monica API as well. Right now I'm not baking it fully, only enough for my needs, but it should be easy to grasp, and I will be sharing it as soon as I get a chance!

tbirrell commented 5 years ago

A good importer to have would be one that can handle Google Takeout. It might not be automatic, but getting years of Google Hangouts history would be nice.

pocketarc commented 5 years ago

@tbirrell Is Google Hangouts text-based? Automatic doesn't really matter to me, since most of my imports will have to be manual (MSN, Yahoo, AIM, etc); the importers are built to handle manual sync first and foremost.

tbirrell commented 5 years ago

@BrunoDeBarros, yeah. the Hangouts part of Google Takeout is a JSON dump of Google Chat/Hangouts

pocketarc commented 5 years ago

That's brilliant! We'll have to have a look at that then; I might need a sample of the JSON files, since I've never really used it, but it should be -very- straightforward.

Edit: Later on, once I've got the rest of the library ready!

Leopere commented 5 years ago

Discord has a fantastic API, so the importer should support Discord. Also potentially email via SMTP/IMAP/whatever somehow?
I'd suggest Android integrations, but this may be the wrong repo to post that in. Nextcloud integration would also be really great.

pqhf5kd commented 5 years ago

How about Facebook HTML export? I can provide an example.

TomGranot commented 5 years ago

@BrunoDeBarros If you need documentation for the work you're doing, I'd be happy to jump in two to three weeks from now. Just let me know!

As far as importers go, a Gmail one would be amazing. Specifically, if it lets me decide which conversations to add and which not to, it could help (indirectly) with getting new contacts into the software.

ivankruchkoff commented 4 years ago

Has this made some progress?

pocketarc commented 4 years ago

This -has- made some progress, but it's been slower than I would've liked. I'm hoping to have an update in the next few months. 😊

djaiss commented 4 years ago

Or if you want, you could publish it so we could finish it 😀

RCheesley commented 4 years ago

I'd love this also, as many conversations happen on Slack/email/Discourse forums and I would have to do a lot of duplication to bring everything into Monica. Currently I'm just adding a high-level overview of conversation topics, and manually adding new users I want to remember to follow up with in the future.

To have a 'who have I talked to today/this week/this month' feature would be awesome, maybe something that pulls the data in from the various sources and allows you to choose what to import.

pocketarc commented 4 years ago

Yeah, this is exactly what I want; there's a lot of integrations I've had in mind for Monica that go beyond just conversations, but conversations are definitely the biggest item. I will have more free time going forward to work on this, because it's super important for me and for all of us.

For now, just make sure you don't delete any communications/chat logs. 😉

noharm1010 commented 4 years ago

Great progress for future use :-) A search option within might be very useful.

nomatica commented 4 years ago

Just wanted to add my voice of support for this!

nomatica commented 4 years ago

It would be great if there were some sort of clipboard importer: after a conversation happens in a specific chat, clip it and send it to the contact in Monica.

lyz-code commented 4 years ago

It would also be handy to have importers for Jabber (XMPP) and WhatsApp.

I know nothing of PHP, so I'll probably develop something in Python. Will let you know here if I finally do.

dan-kez commented 3 years ago

This would be excellent. Automation like this would help immensely for keeping up with data entry.

pocketarc commented 3 years ago

Funny that you replied to this now, because I got back to working on it yesterday, after not having done anything on it in ages. Was forced to because of getting a new Mac, and I'm a bit scared of losing the old messages. Should have some news fairly soon!

johnny-y-wang commented 3 years ago

Sounds like a super useful tool to have! If you want to open source this, I’m sure plenty of us are happy to pitch in :)

GlassedSilver commented 7 months ago

Any update on this? This would be a major reason to switch from just keeping notes in the notes field of my contacts app to using both.