[Feature Request] Multiple features

0xEnders commented 9 months ago

Hi again! Thank you for the quick updates and implementation of features. I have tested the V0.3.0-dev build and everything works perfectly!

I have a few more QOL requests, if you think they're useful.

Translation: Would it be possible to add a translation function to the listener? I understand that it might be slow if you pipe from Telegram > translator > Discord, so it could be optional. It could work like what you did for OCR, e.g. [TRANSLATION] enabled=true

message ===TRANSLATION=== translated message
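
To make the request concrete, here is a minimal Python sketch of the opt-in behaviour. It is not TEx code: the `[TRANSLATION]` section name comes from the request above, while `format_message` and the already-translated text passed into it are hypothetical stand-ins for whatever translator the listener would call.

```python
import configparser
from typing import Optional

# Hypothetical config text, mirroring the section proposed in the request.
CONFIG_TEXT = """
[TRANSLATION]
enabled=true
"""


def format_message(original: str, translated: Optional[str], enabled: bool) -> str:
    """Append the translated text after a separator when translation is on."""
    if not enabled or not translated:
        # Translation disabled or unavailable: forward the message untouched.
        return original
    return f"{original} ===TRANSLATION=== {translated}"


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
# Default to False so the slower translation step stays optional.
enabled = config.getboolean("TRANSLATION", "enabled", fallback=False)

print(format_message("Bom dia", "Good morning", enabled))
```

Keeping the original text first means downstream filters that match on the raw message keep working whether or not translation is enabled.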

Cut db file every day:

I am not sure if this is possible, but could there be an option to cut the .db file every day so we can run our own scripts to export it or mail it somewhere else? For example, something like [EXPORT_DB] enabled=true time_to_cut_file=0000H GMT+0
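
One way this could look, sketched in Python under a few assumptions: it takes a dated snapshot with the standard `sqlite3` backup API instead of literally cutting the live file, the `cut_db` name and the `<name>_YYYYMMDD.db` naming are made up for illustration, and the scheduling at the configured time is left out.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def cut_db(db_path: str, out_dir: str, now: Optional[datetime] = None) -> Path:
    """Copy the live database to a dated snapshot file and return its path."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming scheme: data_local.db -> data_local_20240102.db
    out = Path(out_dir) / f"{Path(db_path).stem}_{now:%Y%m%d}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(str(out))
    try:
        # Online backup: produces a consistent copy without stopping the writer.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return out
```

The snapshot can then be exported or mailed by an external script; purging the copied rows from the live database would be a separate step.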

config_file for exporting of db to csv: An option to export the sqlite3 db to csv with our own custom headers. For example, right now the telegram_message csv starts with id, group_id, media_id, etc. Would it be possible for us to use another config file to choose what to export? E.g., an export config file:

[EXPORT_CSV]
header=group_id
header=group_username
header=title
header=message
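
As a rough Python sketch of the idea, assuming a few things that are not in TEx: Python's `configparser` rejects repeated `header=` keys by default, so this version uses a single comma-separated `headers=` key instead; `parse_headers`, `export_csv` and the `ALLOWED` column list are hypothetical names, with the columns taken from the ones mentioned above.

```python
import configparser
import csv
import sqlite3

# Hypothetical variant of the requested config, using one comma-separated key.
CONFIG_TEXT = """
[EXPORT_CSV]
headers=group_id, group_username, title, message
"""

# Assumed column allow-list; guards the SQL below against arbitrary input.
ALLOWED = {"id", "group_id", "media_id", "group_username", "title", "message"}


def parse_headers(config_text: str) -> list:
    """Read the ordered list of columns to export from the config text."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    raw = config.get("EXPORT_CSV", "headers")
    return [h.strip() for h in raw.split(",") if h.strip()]


def export_csv(conn: sqlite3.Connection, table: str, headers: list, out_path: str) -> None:
    """Write only the chosen columns of `table` to a CSV file, in order."""
    unknown = [h for h in headers if h not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    cols = ", ".join(f'"{h}"' for h in headers)
    # `table` is assumed to be a trusted, fixed name such as telegram_message.
    rows = conn.execute(f'SELECT {cols} FROM "{table}"')
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(headers)
        writer.writerows(rows)
```

Calling `export_csv(conn, "telegram_message", parse_headers(CONFIG_TEXT), "out.csv")` would then write a file whose first row is exactly the configured header list.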

Thank you once again for your hard work!

guibacellar commented 9 months ago

Hi. I'm happy to know that you are testing 0.3.0 :D

About the features:

1-) When you say translation, do you mean language translation? Like ENG to PT_BR or RU to ENG?

2 and 3-) I'm already planning an auto export, probably because of the same problem that you are facing. Ref: https://github.com/guibacellar/TEx/issues/53 I really understand you, lol. Currently my TEx setup uses about 300 GB (yes, I download everything) and my data_local.db is 3.5 GB LOL-LOL-LOL. And I keep only the last 30 days in the DB. Also, I'm thinking of the export especially as a way to use ML and Gen AI to cluster and analyse all the messages.

I'm prioritizing the export feature for the official 0.3.0 release. 😄

Also, two weeks ago it was impossible to add these features, but to address one of your requests (and again, lots of thanks for that) I fixed the way I work with asyncio and the Telegram client loop. 👍

Finally, I'm aware of the pain of having to keep running the 'purge_old_data' command in order to keep the local database away from meltdown. For that, in 0.3.0 I will introduce Automatic Maintenance (#43).

Those last 2 features (DB Maintenance and now the Export to CSV/XML/JSON and other formats) are the last ones needed to release 0.3.0.

Again, thanks a lot for using TEx and making the world a better place.

guibacellar commented 9 months ago

And one thing you said got me thinking about creating a "Beta" build for the working versions. This way, anyone can download and try the in-development versions.

0xEnders commented 9 months ago

For 1, could it be auto-translate? I am thinking of something like the OCR function you implemented, where you set a few languages you want and it translates from there.

Yeah, I can understand how big the files can end up, especially if you are monitoring a lot of groups and the OCR function needs to download the files to analyze them. I am figuring out a way to automatically push my logs to my SIEM for analysis, but most of the time I just stop the scraper, cut out the data_local.db file, and start the scraper again.

Having a Beta build would be great for us to test out your new features and give feedback.

On a side note, I would recommend implementing an image censor/filter in the distant future. I have come across multiple photos that people might consider gory or depressing. Having such a feature could really benefit the analysts in the future.

Looking forward to your official v0.3.0 release!

guibacellar commented 9 months ago

It's taking some time, but I'm almost there ;)

Realtime, rolling file export as CSV, XML, JSON or pickle...

guibacellar commented 9 months ago

Closing for issues cleanup. All requested features were addressed in other issues.

guibacellar / TEx

[Feature Request] Multiple features #56
