ownaginatious / fbchat-archive-parser

An application for parsing chat history from a Facebook data archive.
MIT License

Language support #47

Closed ghost closed 7 years ago

ghost commented 7 years ago

Is there ever going to be an update so it works with languages other than English? :)

ownaginatious commented 7 years ago

There is already support for archives in a number of languages other than English (see here). Which language are you looking for support for?

ghost commented 7 years ago

I really need support for Serbian (Cyrillic and Latin). Thanks for the help, btw :)

ownaginatious commented 7 years ago

Does Facebook actually support Cyrillic or Latin for the Serbian language? I can't find either in the list of supported languages. Closest I can find is Croatian.

EDIT: Never mind, I found that it is supported in Cyrillic. I don't see Latin support though :/

ghost commented 7 years ago

Yup, Latin is not listed. We use both Cyrillic and Latin in our language, but when you want to use Serbian somewhere on the internet, only Cyrillic is offered as a supported language (in 99% of cases). So yes, that was my bad, sorry. :)

ownaginatious commented 7 years ago

Oh, okay. The parser itself already handles message text in any language. Support only needs to be added for parsing the timestamps, which Facebook translates into whatever your locale is.

I have added support for Croatian (I'm assuming at least some Serbians who want the Latin standard use this), but I cannot download any samples with Cyrillic due to Facebook only letting you download your data approximately once per day.

If you already have your data downloaded in Serbian (Cyrillic), would you mind running the following and posting the output?

$ fbcap messages.htm

It should crash and give a sample Serbian time stamp I can work with to add proper support.

ghost commented 7 years ago

Discovered chat thread with [...........................(I hope this doesn't matter :) )]...
Unexpected time format in "четвртак 24. новембар 2016. у 20:11 UTC+01". If you downloaded your Facebook data in a language other than English, then it's possible support may need to be added to this tool.

Please report this as a bug on the associated GitHub page and it will be fixed promptly.

There you go.

ownaginatious commented 7 years ago

Okay, thanks for providing that. I think I have fixed it. Please try the new version I just posted: 0.9.post29
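
For illustration, here is a minimal sketch of how a timestamp shaped like the one reported above could be parsed with only the standard library. It is not the code used by fbchat-archive-parser; the names `SR_CYRL_MONTHS`, `TIMESTAMP_RE`, and `parse_sr_timestamp` are hypothetical, and the month table assumes the nominative spellings.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: Serbian (Cyrillic) month names in the nominative case,
# matching the form seen in the reported sample ("новембар").
SR_CYRL_MONTHS = {
    "јануар": 1, "фебруар": 2, "март": 3, "април": 4,
    "мај": 5, "јун": 6, "јул": 7, "август": 8,
    "септембар": 9, "октобар": 10, "новембар": 11, "децембар": 12,
}

# Layout assumed from the sample:
# <weekday> <day>. <month> <year>. у <HH>:<MM> UTC<+/-hours>
TIMESTAMP_RE = re.compile(
    r"^\w+\s+(?P<day>\d{1,2})\.\s+(?P<month>\w+)\s+(?P<year>\d{4})\.\s+у\s+"
    r"(?P<hour>\d{1,2}):(?P<minute>\d{2})\s+UTC(?P<offset>[+-]\d{1,2})$"
)


def parse_sr_timestamp(text):
    """Parse a Serbian (Cyrillic) timestamp into a timezone-aware datetime."""
    match = TIMESTAMP_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unexpected time format in {text!r}")

    month_name = match.group("month").lower()
    if month_name not in SR_CYRL_MONTHS:
        raise ValueError(f"Unrecognized month name {month_name!r}")

    # The sample carries a whole-hour UTC offset such as "+01".
    tz = timezone(timedelta(hours=int(match.group("offset"))))
    return datetime(
        int(match.group("year")),
        SR_CYRL_MONTHS[month_name],
        int(match.group("day")),
        int(match.group("hour")),
        int(match.group("minute")),
        tzinfo=tz,
    )


if __name__ == "__main__":
    print(parse_sr_timestamp("четвртак 24. новембар 2016. у 20:11 UTC+01"))
```

The weekday is matched but ignored, since the date itself already determines it.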

ghost commented 7 years ago

Now I get this... :/

Discovered chat thread with [.........................................................................]...
Traceback (most recent call last):
  File "c:\program files\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\program files\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python36\Scripts\fbcap.exe\__main__.py", line 9, in <module>
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\main.py", line 188, in main
    app.run()
  File "c:\program files\python36\lib\site-packages\clip.py", line 652, in run
    self.invoke(self.parse(tokens))
  File "c:\program files\python36\lib\site-packages\clip.py", line 634, in invoke
    self._main.invoke(parsed)
  File "c:\program files\python36\lib\site-packages\clip.py", line 519, in invoke
    self._callback(**{k: v for k, v in iteritems(parsed) if k not in self._subcommands})
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\main.py", line 125, in fbcap
    fbch = parser.parse()
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\parser.py", line 107, in parse
    self._parse_content()
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\parser.py", line 133, in _parse_content
    self._process_element(pos, element)
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\parser.py", line 262, in _process_element
    parse_timestamp(e.text, self.use_utc, self.timezone_hints)
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\time.py", line 226, in parse_timestamp
    timestamp = date_parser.parse(timestamp_string)
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\time.py", line 93, in parse
    return self._parse_fallback(timestamp)
  File "c:\program files\python36\lib\site-packages\fbchat_archive_parser\time.py", line 87, in _parse_fallback
    locale=self.locale_id).datetime
  File "c:\program files\python36\lib\site-packages\arrow\api.py", line 23, in get
    return _factory.get(*args, **kwargs)
  File "c:\program files\python36\lib\site-packages\arrow\factory.py", line 198, in get
    dt = parser.DateTimeParser(locale).parse(args[0], args[1])
  File "c:\program files\python36\lib\site-packages\arrow\parser.py", line 55, in __init__
    self.locale = locales.get_locale(locale)
  File "c:\program files\python36\lib\site-packages\arrow\locales.py", line 20, in get_locale
    raise ValueError('Unsupported locale \'{0}\''.format(name))
ValueError: Unsupported locale 'hr_hr'

ownaginatious commented 7 years ago

Ah, I had a bug. I fixed it in 0.9.post31, but it will likely crash again because the parser does not recognize the name of a month. This is because Facebook frequently uses the wrong inflection in dates in Slavic languages. I can fix this, but I need another sample, since Facebook won't let me re-download my own data in Serbian.

Please try running again and report the error it prints out.
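
One way to tolerate the inflection problem described above is to accept more than one grammatical form of each month name. The sketch below is only an illustration under that assumption, not the fix that went into the tool; `NOMINATIVE`, `GENITIVE`, `MONTH_LOOKUP`, and `month_number` are hypothetical names, and the spellings are assumed Serbian (Cyrillic) forms.

```python
# Assumed Serbian (Cyrillic) month names, January first.
NOMINATIVE = [
    "јануар", "фебруар", "март", "април", "мај", "јун",
    "јул", "август", "септембар", "октобар", "новембар", "децембар",
]
GENITIVE = [
    "јануара", "фебруара", "марта", "априла", "маја", "јуна",
    "јула", "августа", "септембра", "октобра", "новембра", "децембра",
]

# Map every accepted spelling to its month number (1 to 12).
MONTH_LOOKUP = {}
for number, (nominative, genitive) in enumerate(zip(NOMINATIVE, GENITIVE), start=1):
    MONTH_LOOKUP[nominative] = number
    MONTH_LOOKUP[genitive] = number


def month_number(name):
    """Return the month number for either inflection of a month name."""
    key = name.strip().rstrip(".").lower()
    try:
        return MONTH_LOOKUP[key]
    except KeyError:
        raise ValueError(f"Unrecognized month name {name!r}") from None


if __name__ == "__main__":
    for sample in ("новембар", "новембра"):
        print(sample, month_number(sample))
```

A lookup table keeps the parser independent of which form Facebook happens to emit, at the cost of listing each form by hand.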

ghost commented 7 years ago

Sorry, but I am kind of new to GitHub. How do I get that version? Should I just re-download and install it again?

ownaginatious commented 7 years ago

Oh, actually, I think everything should work fine now.

Not sure how you installed, but if you used pip you can get the latest version by doing pip install fbchat-archive-parser --upgrade.
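
To confirm which version ended up installed after the upgrade, something like the following sketch could be used. It assumes Python 3.8 or newer (where `importlib.metadata` is available), and `installed_version` is a hypothetical helper name, not part of the tool.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version string, or None if the package is not installed.
    print(installed_version("fbchat-archive-parser"))
```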