lilium895 / Telegram-chat-restorer-pyautogui-version


images, audios and stickers #2

Open fernlobo opened 2 months ago

fernlobo commented 2 months ago

Do the images, audios and stickers stay correct this way? I'm desperate: I lost more than 40 GB of a chat that is extremely important to me. Luckily, I have a full backup of it in HTML from the Telegram export.

You explained that it is an extremely time-consuming procedure, right? There are 2 years of conversation, with many audios, videos and more than 10,000 images. Maybe I can't do it.

Do you know of any Telegram HTML backup reader/viewer? To at least be able to access the HTML backup in an interface more similar to chat.

I know it can be opened in the browser, but the interface is terrible. The replied-to messages do not appear above the replies, which makes it very confusing, and it is not possible to jump to a specific date.

I just wanted to be able to read it in a decent way whenever I needed to. It has enormous emotional value for me and the other person. And I accidentally deleted the chat.

Please, help me!

lilium895 commented 2 months ago

Hi, I had a lot of messages, photos and videos too; it's doable. First of all, if you just want to see the chat in the web browser as you did on Telegram, you can do the following: make a backup of the chat with Telegram Desktop. In the ChatExport... folder of the backup you will see all the subfolders: css, images, stickers, js, video_files.... You just need to paste your images, chats, videos and stickers into the right folders. Then, when you open a chat file in the browser, you should see the chat as you did on Telegram.

If you want to recover the chat inside the Telegram app, you could try my pyautogui app. Yes, it takes some time and it has some issues (because it's hard-coded, but I'll gladly help you out if you have any problems). But it works. If you know some basic coding you can easily code it yourself. You just need a laptop and an internet connection. The better your hardware and connection, the faster it will finish.

I don't remember whether replies work in the browser. In my application I didn't find a way to display the reply. I only insert, before the replying message, a message that says "in reply of message -&number", so that you can count &number messages back from it.

fernlobo commented 2 months ago

I didn't understand when you said: “You just need to paste your images, chats, videos, stickers into the right folders. Then, when you open a chat, you should see it as you did in Telegram in the browser.”

I know that the chat is already exported in subfolders and, in a certain way, organized. You can open it as HTML in the browser and read it in order, viewing the media, etc.

2024-08-20_10-09

But the problem is that I think the interface is bad, mainly because the replied-to messages aren't shown next to the replies, so it's confusing to follow when you read it again, you know? I'll send a screenshot for you to see.

2024-08-20_10-03_1

I wanted to be able to read it in a way as similar as possible to the chat, seeing the replied-to messages above the replies. I also wish it had a search option or a calendar, something to jump to a specific date.

That's why I asked if you knew of any tool or software to read these backups in a better interface. Or if there is a possibility that I can read these HTML files in a chat template or something like that.

If something like this doesn't exist, is it something complicated to create? I mean, replied messages have a link, wouldn't it be possible for some software to automatically paste these messages in place of the link?

Do you understand?

Your pyautogui app seems to be complicated to me because I don't have any programming knowledge. I would even be willing to try, but as it's a huge chat (more than 40 GB), I'm afraid it would be practically impossible.

That's why I asked about a reading tool or software, because it would help a lot and make me feel less sad.

Thank you very much for your attention and for responding to messages, it means a lot to me.

lilium895 commented 2 months ago

I meant what you have already done. In some cases people have only the HTML files and the media, but not the js and css folders. Those are what allow you to read the chat.html properly in the browser.

If I remember correctly, if you click on "this message", it should take you to the message that was replied to. To my knowledge there is no recovery application that shows replies the way Telegram does. Maybe one like https://github.com/TelegramTools/TLImporter, in which Telegram bots are used to re-send the messages into the chat, but it can't recover media.

When I built my app, I thought that reading the messages in the browser was boring and slow, and that it would be better to read them in the Telegram app. To my knowledge (maybe there are new apps out there that allow it) there is no recovery app that reproduces exactly the style of the original chat.

Replying: My app recovers the media and the messages in the Telegram app, but it does not have the reply link you would normally see in Telegram. As I said before, instead of the replied-to message appearing above the replying message, there is only this above it: "in reply of message -&number".

Calendar: As you can see in ferfega project https://github.com/TelegramTools/TLImporter, messages have a date and a time after them. By using the search bar that Telegram chats normally have, you can easily look for the messages in the time interval you want.

If by tool/software you mean something like an executable, I couldn't do that. People have slightly different HTML chat files, and the conversion to text wouldn't always be correct. It's very easy to change the code, though. There are also some steps you have to do manually before running the program.

I've checked the size of my chat. Even though it also covers about two years of conversation, it is only 10 GB. Media takes longer to send in my app. It's still doable.

The style of the chat will be like the one you can see here: https://github.com/TelegramTools/TLImporter, plus the media and the stickers.

If you want to proceed, it would be helpful to know which laptop you plan to use, the speed of your internet connection, and whether you have another laptop in case you need one for work. If you want to try this way, tell me, so I can slightly change the code for your particular HTML files.

Don't hesitate to ask for more, I'm happy to help. And sorry if the program isn't perfect. I'm just an amateur.

fernlobo commented 2 months ago

OK, I want to test your method. So I exported a small chat with one of my contacts to HTML and deleted it after that.

Following the first step, I tried to convert the HTML to WhatsApp text format using the "html_to_txt_pyautogui.py" script, placed in the same directory as the HTML.

2024-08-20_15-49

However, it returns the following error:

2024-08-20_15-51

I tried asking ChatGPT for help but couldn't understand how to fix it.

lilium895 commented 2 months ago

I'm working on that. I don't think ChatGPT would be useful. If an error occurs, just send me the error. I'll probably ask you to send me part of the HTML code. Be careful about sensitive information.

Can I ask how much Python you know? Then I'll know whether I can give you instructions or not.

The most important things I need to know: How much RAM and which CPU do you have? Do you have an HDD or an SSD? How fast is your internet connection? Do you need the computer for work, or can it run the program for a long time? Can you use another computer while that one is busy?

fernlobo commented 2 months ago

I confess that I know almost nothing about Python, I'm just curious and I'm trying to understand more or less your app. I don't even know if I can.

My pc is ok. It's not the best, but it's not bad either. It has 16GB of RAM, I use a WD SSD and my internet has a download of 600 Mbs, but the upload is terrible, just 100 Mbs.

I don't intend to do it in the main and big chat. I just wanted to understand the procedure more or less, so I would take a very small chat from my own contact but from another number I have.

Yes, I need the PC for work/study, I even have another PC, but not as good... However, as I just want to test the method in a small chat, I don't think it would take long, right?

lilium895 commented 2 months ago

Yeah, if it's just a test there is no problem. It should be fast. I've modified html_to_txt_pyautogui.py; it should work now. The PC is perfect. I'm looking forward to your feedback. If you can send me feedback about the text file, I'd appreciate it. Does it look like the one posted in the README?

I've updated the README; I had forgotten to mention that some libraries must be installed from the command prompt.

fernlobo commented 2 months ago

I reread the README and installed the libraries.

Downloaded the modified html_to_txt_pyautogui.py (Would this file be used for any chat or did you modify it for this specific one?)

Now it worked

image

Then I went to the second step, I put all the media in the "allmedia" folder.

One question about this: I simply copy everything in the files and images subfolders and paste it into "allmedia", right?

After that, I opened the "pyautogui_recover.py" script in Spyder and I'm making the changes where IMPORTANT is indicated.

In this part I was unsure about "last number that appears in _chat" Do you mean in the file name chat1.txt?

And I also didn't understand "but with +1"

Would it then be 11?

image

lilium895 commented 2 months ago

html_to_txt_pyautogui.py will be used for all your html chat files. Since you are helping me out by reporting your errors, I'm trying to add to the program all the possible scenarios. So if another person wants to try the program, it will work.

Have you checked whether the _chat1.txt file looks like the one I put in the README?

In "allmedia" you have to put all the media sent in your chat history, that is, all the files in these folders: photos, files, video_files, stickers, voice_messages. I'm not sure if you have more. Judging from the screenshots you sent, those should be all you need. The css, js and images folders won't be used.
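The copy step described above could also be scripted. The following is only a minimal sketch, not part of the repository: the function name `collect_media` is hypothetical, the subfolder names are the ones listed in the comment, and skipping files whose name already exists in the target folder is an assumption rather than something the app requires.

```python
import shutil
from pathlib import Path

# Subfolder names taken from the comment above; an export may contain others.
MEDIA_SUBFOLDERS = ["photos", "files", "video_files", "stickers", "voice_messages"]


def collect_media(export_dir, target_dir):
    """Copy every file from the media subfolders of export_dir into target_dir.

    Returns the number of files copied. A file whose name already exists in
    target_dir is skipped rather than overwritten (an assumption of this sketch).
    """
    export_dir = Path(export_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for name in MEDIA_SUBFOLDERS:
        folder = export_dir / name
        if not folder.is_dir():
            continue  # this export has no such subfolder
        for item in sorted(folder.iterdir()):
            destination = target_dir / item.name
            if item.is_file() and not destination.exists():
                shutil.copy2(item, destination)
                copied += 1
    return copied
```

It would be called with the export folder and the "allmedia" folder, for example `collect_media("ChatExport", "allmedia")`, where both paths are placeholders.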

Yes, I meant _chat$.txt. In your case you have only _chat1.txt, so you have to put 2.
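As an illustration of the "+1" rule, here is a small sketch that derives the value from the files on disk. The function name is hypothetical and the `_chatN.txt` naming is assumed from the discussion above; the actual script expects the number to be typed in by hand.

```python
import re
from pathlib import Path


def number_of_text_files_plus_one(folder):
    """Return the highest N found among files named _chatN.txt, plus one."""
    pattern = re.compile(r"^_chat(\d+)\.txt$")
    numbers = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match:
            numbers.append(int(match.group(1)))
    # With only _chat1.txt present this gives 2, as in the reply above.
    return max(numbers, default=0) + 1
```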

In the second step, pyautogui sometimes opens a file and blocks the recovery. It happened a few times during my recovery, but in the end the program ignored it and continued to restore the messages into Telegram. REMEMBER: move the mouse to the top-left corner of your desktop and hold it there to stop the program whenever you want, for example if you want to stop early because you now have an idea of how it works, or because a message wasn't sent or other errors occurred.

fernlobo commented 2 months ago

Apologies for the delay. These days I've been looking for a way to combine all the HTML files generated into one (there are 652 in the chat, do you know how big it is?). After they were merged I wanted to convert it to PDF, keeping the links to the media and the replied messages.

When converting HTML to PDF using foxit pdf editor or wkhtmltopdf, the links to the media remain functional (it opens the file when I click), but not to the replied messages.

I wish I could at least click in "this message" and go to the message, like in the browser.

image

In the browser it is very difficult to read, as you already said. In addition, all HTML files are not together (It's not very useful when looking for something in specific in the chat)

Using your method, replied messages also do not appear, as you already said.

In the PDF, I can generate a single file (although a huge one, i'm still trying to merge them all), I have the link to the media, but I need to find a way to keep the link to the replied messages.

Do you know a way to convert from HTML to PDF that keeps not only the media links but also these links to the replied messages?

lilium895 commented 2 months ago

Can you send me the HTML-to-PDF program you found? I'm thinking about a solution in which we change the original HTML file so that the converter reads "in reply of ..." as a hyperlink; maybe it would then produce a PDF that meets your needs.

I'm not familiar with these tools, but I'm going to take a look, just to help you out. I also like the idea of having a PDF to read the chats. Do the photos show in the PDF, or do you have to click them every time? Does it work with any kind of media?

Have you tried to back up the chat with my program? Does it look good? I was thinking that if I could learn how to create bots, we could add some compensating features: for the calendar, a bot that sends the date at the beginning of each day; for the replies, a bot that adds a message above the replying message with a link to the replied message. I don't know how to use Telegram bots and it will probably be a long project, so I suggest you try your HTML-to-PDF project or my pyautogui app for now. I've written these ideas down in case anyone reads them and is willing to help.

fernlobo commented 2 months ago

Thank you very much. You are a very kind person. I'll try to explain what I did. As the chat was huge, the backup generated 652 HTML files.

image

So, first I combined these files into one using the command line:

cd C:\ Path of the HTML files
copy *.html Merged.html

After that I used the wkhtmltopdf program. It is an open-source command-line tool to render HTML into PDF. After installing it:

cd C:\Program Files\wkhtmltopdf\bin
wkhtmltopdf --enable-local-file-access --load-error-handling ignore --javascript-delay 2000 --enable-internal-links --zoom 1.0 --no-stop-slow-scripts "C:\Users...........\Chat Telegram Exportado 08.06.2024\Merged.html" "C:\Users.....\Desktop\Merged.pdf"

This generated a PDF of 26126 pages, to give you an idea of the size of the chat.
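For comparison, the merge step can be sketched in Python. This is not what was run in the thread: the function names are hypothetical, and the sketch assumes the export pages end in an optional number followed by ".html" (for example messages.html, messages2.html, messages10.html). Sorting by that number keeps the pages in chat order, which a plain wildcard copy may not do if the folder is listed alphabetically. Like the copy command, it simply concatenates the files, so the result holds several HTML documents back to back.

```python
import re
from pathlib import Path


def numeric_suffix(path):
    """Sort key: messages.html -> 1, messages2.html -> 2, messages10.html -> 10."""
    match = re.search(r"(\d+)\.html$", path.name)
    return int(match.group(1)) if match else 1


def merge_html(export_dir, output_name="Merged.html"):
    """Concatenate the export pages into one file, in numeric page order.

    Returns the page names in the order they were written.
    """
    export_dir = Path(export_dir)
    # The page list is built before the output file is created,
    # and an older output file is excluded by name.
    pages = sorted(
        (p for p in export_dir.glob("*.html") if p.name != output_name),
        key=numeric_suffix,
    )
    with open(export_dir / output_name, "wb") as merged:
        for page in pages:
            merged.write(page.read_bytes())
            merged.write(b"\n")
    return [p.name for p in pages]
```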

All links to media (audios, gifs, videos and images) work perfectly and when clicked they open the file according to the path they were in my PC when the PDF was generated. But the links to the replied messages didn't work

After analyzing the HTML file, I found that it follows the following structure, for example:

<div class="message default clearfix joined" id="message723658"> . . .

<a href="#go_to_message723658" onclick="return GoToMessage(723658)">this message</a>

On a forum I saw a suggestion to remove the "go_to_" part, leaving only "#message723658".

So I opened the HTML file and did an automatic find-and-replace so that every such link starts with just <a href="#message.

It stayed like this

<a href="#message718640" onclick="return GoToMessage(718640)">this message</a>

And it worked!!!! When I converted this modified HTML file, "this message" in the PDF is clickable and jumps to the place on the page where the replied message is.
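The same find-and-replace can be expressed as a short Python sketch. The function name is hypothetical, and the pattern assumes the links look exactly like the example shown above; it was not the method used in the thread, which was a manual replace in an editor.

```python
import re

# Matches href="#go_to_message<digits>" and captures the digits.
GO_TO_LINK = re.compile(r'href="#go_to_message(\d+)"')


def point_links_at_message_ids(html):
    """Rewrite reply links so they target the id of the replied-to message."""
    return GO_TO_LINK.sub(r'href="#message\1"', html)


sample = '<a href="#go_to_message723658" onclick="return GoToMessage(723658)">this message</a>'
print(point_links_at_message_ids(sample))
```

To process a whole file, the text of Merged.html would be read, passed through the function, and written to a new file before running the PDF conversion.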

But it doesn't mark the message, it simply moves to the location on the page where the message is... and I can't specifically identify which message was replied...

In PDF-XChange Editor I can check the link settings and see that it actually mentions the message numbering, but I wanted there to be some action to highlight the message when I clicked "this message", so that identification could be possible....

image

I don't know if you understood.

I tried using your program, but it didn't work. That method of cloning Telegram is no longer working, but I managed to overcome this by downloading Unigram and logging into two different accounts (desktop and unigram).

The problem is that I remembered that you said it doesn't work with replied messages and the chat I missed has a lot of replied messages.

I even tried using pyautogui_recover.py, but some errors appeared, so I think I didn't make the correct changes in the .py file.... Maybe it's too complicated for me, and impossible since the chat is huge and you said it takes a long time..

It would be very interesting to create a bot and add these features, replied messages and calendar... Many people would definitely use it... I discovered hundreds of people who accidentally deleted the chat and couldn't find solutions on the internet...

There is really a lack of tools and software to handle Telegram backups

What is bizarre, since so many people in the world use the app... I think it's very insecure... There should at least be the chance to recover the chat accidentally deleted in a reasonable amount of time... an hour or something... it's sad

If I can at least highlight the messages that are replied to in the PDF, it will be a victory... and I kept the backup in the hope that someday an application or tool will appear that transforms the files into something similar to the chat or imports them into Telegram in a quick and simple way

lilium895 commented 2 months ago

Good job on the results you got by yourself! I've tried your method and it simply works. When you click the link in the replying message, you can identify the replied message because it is the one at the top of the PDF page. That's not the case if the replied message is at the very top of the PDF file, on the first page. I haven't found a solution to the highlighting problem. Highlighting is done by JavaScript functions, which provide the "action" you are looking for. Those aren't carried over to the PDF when converting (that's my guess, but I tried multiple web pages and none of the highlighting/underlining/animation was carried over). Even the citation format, like the one in the bibliography of Wikipedia pages, didn't work for me.
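One possible workaround that does not depend on JavaScript is to print the message number both in the reply link and at the start of each message, so the target can be recognized by eye in the PDF. This is only a sketch of that idea and was not tried in the thread: the function name is hypothetical, and the patterns assume the markup matches the div and link examples quoted earlier (attribute order included).

```python
import re

# A reply link, before or after the "go_to_" part was removed.
REPLY_LINK = re.compile(
    r'(<a href="#(?:go_to_)?message(\d+)"[^>]*>)this message(</a>)'
)
# The opening tag of a message block, with class before id as in the example.
MESSAGE_DIV = re.compile(r'(<div class="message[^"]*" id="message(\d+)">)')


def label_messages(html):
    """Add visible message numbers to reply links and to message blocks."""
    html = REPLY_LINK.sub(r'\1this message (#\2)\3', html)
    html = MESSAGE_DIV.sub(r'\1<small>[#\2]</small>', html)
    return html
```

After this step a reply would read "this message (#718640)" and the replied message would start with "[#718640]", which survives conversion because it is plain text.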

{{{ Don't worry if you don't want to use my app, you've already found a great solution by yourself. Just in case you want to re-try and have a copy on the telegram app, I'll give you some instructions:

Have you replaced the \ in the file path with \\? Have you removed \_chat{number}.txt or \{media_name}? Have you replaced 'number of text files' with the correct number? In your case 2 (as we said, for your test; 653 if you want to do the complete chat). You have to remove all those comments, between "...." and "...", because pyautogui gave me errors if the comments were still there. Have you replaced the "Sender name" variable with the name of one of the two senders AS it appears in the _chat.txt?

It should take you about 14 days to recover all the chats. Your computer is better than mine, so maybe you could reduce the delays in the time.sleep() calls in pyautogui_recover.py to make it faster. It's a simple task: you just have to try and adjust until you find the best timing. While you experiment, you can still continue the recovery: delete all the HTML files you have already processed, and the text in the last HTML file up to the exact message you reached, and go on. }}}

I think you have already done a great job! Thank you very much for getting in touch and sharing your results. I will try your method myself. I'm not sure if I can still be helpful. My last suggestions are:

Try other HTML-to-PDF converters. If you are lucky, you can find a program with better JavaScript support. If you could find a simple PDF feature that highlights text when you navigate to it, you could look for its HTML equivalent. I say "simple" because the converter must be able to translate the HTML into that specific PDF feature. If the feature exists only in Adobe Pro, or the converter doesn't support it because it's out of date, you won't be able to use it. I looked into the citations and links that PDFs normally use, but I haven't found one that highlights after clicking. My search may have been superficial, so I suggest trying it yourself.

I'll leave this issue open in case you need more help, or as a think tank. I would also like to know if you get any results.