redouane59 / twittered

Twitter API client for Java developers
Apache License 2.0
236 stars, 64 forks

Chunked tweet JSON object during filtered tweets stream causes JsonParseException #347

Closed oleg-grigorijan closed 2 years ago

oleg-grigorijan commented 2 years ago

Sometimes only a chunk of a tweet JSON object is received during tweet streaming. The library does not expect this, so it throws a JsonParseException and closes the connection. This problem occurs very often.

Real example:

Previous streamed message (valid finished JSON):

{"data":{"attachments":{},"author_id":"1131854274223366144","context_annotations":[{"domain":{"id":"65","name":"Interests and Hobbies Vertical","description":"Top level interests and hobbies groupings, like Food or Travel"},"entity":{"id":"848920371311001600","name":"Technology","description":"Technology and computing"}},{"domain":{"id":"30","name":"Entities [Entity Service]","description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"},"entity":{"id":"898650876658634752","name":"Cybersecurity","description":"Cybersecurity"}},{"domain":{"id":"30","name":"Entities [Entity Service]","description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"},"entity":{"id":"1047123725525479425","name":"Information security","description":"Information Security"}}],"conversation_id":"1475756655954604032","created_at":"2021-12-28T09:12:59.000Z","entities":{"hashtags":[{"start":93,"end":107,"tag":"cybersecurity"}],"mentions":[{"start":3,"end":14,"username":"MarkoGulan","id":"1466728993663729667"}]},"geo":{},"id":"1475756655954604032","lang":"en","possibly_sensitive":false,"public_metrics":{"retweet_count":1,"reply_count":0,"like_count":0,"quote_count":0},"referenced_tweets":[{"type":"retweeted","id":"1475363539124637697"}],"reply_settings":"everyone","source":"Cyber Security Feed","text":"RT @MarkoGulan: Are you preventing cyber attackers from hacking into your data? Discover the #cybersecurity experts and solutions available…"},"includes":{"users":[{"created_at":"2019-05-24T09:27:42.000Z","description":"check out a privacy oriented social media platform 👉🏾https://t.co/KkqBnPkRZK Cyber Security News in 1 place! Retweets original Cyber Sec tweets. 
🤖 made by @AbdirahiimYa","entities":{"description":{"urls":[{"start":53,"end":76,"url":"https://t.co/KkqBnPkRZK","expanded_url":"http://samochat.net","display_url":"samochat.net"}],"mentions":[{"start":155,"end":168,"username":"AbdirahiimYa"}]}},"id":"1131854274223366144","location":"Internet","name":"Cyber Security Feed","pinned_tweet_id":"1309499106067460096","profile_image_url":"https://pbs.twimg.com/profile_images/1131855016766124032/vhasETOF_normal.jpg","protected":false,"public_metrics":{"followers_count":19818,"following_count":1,"tweet_count":2025303,"listed_count":356},"url":"","username":"cybersec_feeds","verified":false},{"created_at":"2021-12-03T11:20:34.000Z","description":"Industrial Cyber Security Consultant | 🇭🇷🇧🇬🇸🇰🇷🇸🇬🇷🇷🇴experienced #consultant in #cybersecurity | 12 years experienced in #cloud & #datacenter","entities":{"url":{"urls":[{"start":0,"end":23,"url":"https://t.co/DL2YwkO1Zr","expanded_url":"https://www.linkedin.com/in/markogulan/","display_url":"linkedin.com/in/markogulan/"}]},"description":{"hashtags":[{"start":63,"end":74,"tag":"consultant"},{"start":78,"end":92,"tag":"cybersecurity"},{"start":119,"end":125,"tag":"cloud"},{"start":128,"end":139,"tag":"datacenter"}]}},"id":"1466728993663729667","location":"Republic of Croatia","name":"Marko Gulan","profile_image_url":"https://pbs.twimg.com/profile_images/1467530526118584322/AjdFHWc8_normal.jpg","protected":false,"public_metrics":{"followers_count":55,"following_count":478,"tweet_count":114,"listed_count":0},"url":"https://t.co/DL2YwkO1Zr","username":"MarkoGulan","verified":false}],"tweets":[{"attachments":{"media_keys":["3_1475363537266647040"]},"author_id":"1466728993663729667","context_annotations":[{"domain":{"id":"65","name":"Interests and Hobbies Vertical","description":"Top level interests and hobbies groupings, like Food or Travel"},"entity":{"id":"848920371311001600","name":"Technology","description":"Technology and computing"}},{"domain":{"id":"30","name":"Entities [Entity 
Service]","description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"},"entity":{"id":"898650876658634752","name":"Cybersecurity","description":"Cybersecurity"}},{"domain":{"id":"30","name":"Entities [Entity Service]","description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"},"entity":{"id":"1047123725525479425","name":"Information security","description":"Information Security"}}],"conversation_id":"1475363539124637697","created_at":"2021-12-27T07:10:53.000Z","entities":{"annotations":[{"start":127,"end":153,"probability":0.7303,"type":"Organization","normalized_text":"Schneider Electric Exchange"}],"hashtags":[{"start":77,"end":91,"tag":"cybersecurity"},{"start":231,"end":242,"tag":"ExchangeSE"}],"urls":[{"start":207,"end":230,"url":"https://t.co/KpS5fDYXwE","expanded_url":"http://spr.ly/6019JFxbB","display_url":"spr.ly/6019JFxbB","status":200,"title":"Create. Collaborate. Scale | Schneider Electric Exchange","description":"Join Schneider Electric Exchange, a community of cross-industry experts who tackle challenges, exchange ideas, and innovate the bold ideas of tomorrow, today.","unwound_url":"https://exchange.se.com/shop/products-services?problemToSolve=Communication%20Networks%20%26%20Cybersecurity"},{"start":243,"end":266,"url":"https://t.co/0MRHDZNG8X","expanded_url":"https://twitter.com/MarkoGulan/status/1475363539124637697/photo/1","display_url":"pic.twitter.com/0MRHDZNG8X"}]},"geo":{},"id":"1475363539124637697","lang":"en","possibly_sensitive":false,"public_metrics":{"retweet_count":1,"reply_count":0,"like_count":1,"quote_count":0},"reply_settings":"everyone","source":"Powered by Sprinklr","text":"Are you preventing cyber attackers from hacking into your data? 
Discover the #cybersecurity experts and solutions available on Schneider Electric Exchange to help you protect your company from cyberthreats: https://t.co/KpS5fDYXwE\n#ExchangeSE https://t.co/0MRHDZNG8X"}]},"matching_rules":[{"id":"1474297964919238656","tag":"0ec1edeb-d6a7-4fda-9eb9-a996c66227cb"}]}

Next streamed message:

_count":52},"url":"","username":"SHEsus__Christ","verified":false}],"tweets":[{"attachments":{},"author_id":"157516153","conversation_id":"1475502434894106636","created_at":"2021-12-27T16:22:48.000Z","entities":{"annotations":[{"start":39,"end":50,"probability":0.3626,"type":"Person","normalized_text":"Evangelicals"}]},"geo":{},"id":"1475502434894106636","lang":"en","possibly_sensitive":false,"public_metrics":{"retweet_count":416,"reply_count":207,"like_count":3886,"quote_count":41},"reply_settings":"everyone","source":"Twitter for iPhone","text":"The promise of eternity in heaven with Evangelicals is not the bribe they think it is."}]},"matching_rules":[{"id":"1474297921449480198","tag":"9d10afb4-db2c-42b7-8136-118f1eb88c3e"}]}

Causes:

com.fasterxml.jackson.core.JsonParseException: Unrecognized token '_count': was expecting ('true', 'false' or 'null')
 at [Source: (String)"_count":52},"url":"","username":"SHEsus__Christ","verified":false}],"tweets":[{"attachments":{},"author_id":"157516153","conversation_id":"1475502434894106636","created_at":"2021-12-27T16:22:48.000Z","entities":{"annotations":[{"start":39,"end":50,"probability":0.3626,"type":"Person","normalized_text":"Evangelicals"}]},"geo":{},"id":"1475502434894106636","lang":"en","possibly_sensitive":false,"public_metrics":{"retweet_count":416,"reply_count":207,"like_count":3886,"quote_count":41},"reply_setti"[truncated 238 chars]; line: 1, column: 7]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:703)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2853)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1899)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:757)
    at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4142)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4001)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3005)
    at io.github.redouane59.twitter.helpers.TweetStreamConsumer.handleData(TweetStreamConsumer.java:103)
    at io.github.redouane59.twitter.helpers.TweetStreamConsumer.readSocket(TweetStreamConsumer.java:87)
    at io.github.redouane59.twitter.helpers.TweetStreamConsumer.lambda$consumeStream$0(TweetStreamConsumer.java:64)
    at java.base/java.lang.Thread.run(Thread.java:829)

redouane59 commented 2 years ago

Hey @oleg-grigorijan, hmm, good point, I have never seen this before. That's strange, because I'm running a bot that streams some keywords 24/7. Can you give me the exact method you're using? In any case, I think the method currently takes the message as it is and parses it, so if the JSON is not valid, it throws an exception. I'll look into how to change this behaviour.
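To make the failure mode concrete: the second message in the report starts in the middle of an object, so a parser that takes each message as-is has to reject it. A rough structural check can tell a complete top-level object from such a fragment before parsing. The sketch below is only an illustration under assumptions: `JsonChunkCheck` and `looksLikeCompleteObject` are hypothetical names, not part of twittered or Jackson, and the check counts brackets without verifying that bracket types match, so it is not a JSON validator.

```java
// Hypothetical helper, not part of twittered or Jackson.
final class JsonChunkCheck {

    // Returns true if the text looks like exactly one complete top-level JSON
    // object: it starts with '{', braces and brackets outside of strings are
    // balanced, and the outermost object closes on the last character.
    static boolean looksLikeCompleteObject(String text) {
        String s = text.trim();
        if (s.isEmpty() || s.charAt(0) != '{') {
            return false; // e.g. a fragment such as: _count":52},...
        }
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString) {
                // Inside a string literal, brackets are plain characters.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
                if (depth < 0) {
                    return false; // more closers than openers
                }
                if (depth == 0 && i != s.length() - 1) {
                    return false; // trailing data after the top-level object
                }
            }
        }
        return depth == 0 && !inString;
    }
}
```

A check like this only detects the fragment; it does not recover the missing first half of the object, which never arrived as part of that message.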

oleg-grigorijan commented 2 years ago

@redouane59 I use TwitterClient#startFilteredStream(IAPIEventListener) (with configured filtering rules by TwitterClient#addFilteredStreamRule(String, String)).

In any case, I think the method currently takes the message as it is and parses it, so if the JSON is not valid, it throws an exception. I'll look into how to change this behaviour.

Yes, a badly formatted message may be an issue on the Twitter side. But maybe the stream connection should not be closed just because a single message fails to parse. In that case, IAPIEventListener#onUnknownDataStreamed(String) could be invoked instead.
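The idea can be sketched roughly as follows. This is a self-contained illustration, not the library's code and not necessarily what the eventual fix looks like. `StreamListener` is a hypothetical stand-in for IAPIEventListener: only onUnknownDataStreamed(String) is named in this thread, and `onTweetStreamed`, `TolerantStreamHandler`, and `toyParse` are assumed names. The parser is passed in as a function so that the sketch does not depend on Jackson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for IAPIEventListener. Only onUnknownDataStreamed(String)
// comes from this thread; onTweetStreamed is an assumed name for illustration.
interface StreamListener<T> {
    void onTweetStreamed(T tweet);
    void onUnknownDataStreamed(String json);
}

final class TolerantStreamHandler {

    // Handles one streamed message. A parse failure is reported to the listener
    // and swallowed, so the caller's read loop (and the connection) keeps going.
    // Returns true if the message was parsed, false if it was reported as unknown.
    // With Jackson, the catch clause would target JsonProcessingException instead.
    static <T> boolean handleData(String message,
                                  Function<String, T> parser,
                                  StreamListener<T> listener) {
        T tweet;
        try {
            tweet = parser.apply(message);
        } catch (RuntimeException e) {
            listener.onUnknownDataStreamed(message);
            return false;
        }
        listener.onTweetStreamed(tweet);
        return true;
    }

    // Toy parser (assumption): accepts only text that starts with '{'.
    // It stands in for a real JSON parser and does no actual parsing.
    static String toyParse(String message) {
        if (!message.startsWith("{")) {
            throw new IllegalArgumentException("not a JSON object");
        }
        return message;
    }

    public static void main(String[] args) {
        List<String> parsed = new ArrayList<>();
        List<String> unknown = new ArrayList<>();
        StreamListener<String> listener = new StreamListener<String>() {
            @Override public void onTweetStreamed(String tweet) { parsed.add(tweet); }
            @Override public void onUnknownDataStreamed(String json) { unknown.add(json); }
        };
        String[] messages = {
            "{\"data\":{\"id\":\"1\"}}",
            "_count\":52},\"url\":\"\"}",   // chunk like the one in the report
            "{\"data\":{\"id\":\"2\"}}"
        };
        for (String m : messages) {
            handleData(m, TolerantStreamHandler::toyParse, listener);
        }
        // The bad chunk is reported, and the messages after it are still processed.
        System.out.println("parsed=" + parsed.size() + " unknown=" + unknown.size());
    }
}
```

The listener is called outside the try block on purpose, so that an exception thrown by the listener itself is not mistaken for a parse failure.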

redouane59 commented 2 years ago

Yes, a badly formatted message may be an issue on the Twitter side. But maybe the stream connection should not be closed just because a single message fails to parse. In that case, IAPIEventListener#onUnknownDataStreamed(String) could be invoked instead.

Yes of course, going to check that.

redouane59 commented 2 years ago

I think that https://github.com/redouane59/twittered/pull/348/files should solve the issue.

redouane59 commented 2 years ago

Could you build the library from the develop branch and tell me if it solves the problem?

oleg-grigorijan commented 2 years ago

Yes, it does. Thanks a lot.

redouane59 commented 2 years ago

Great, the fix will be included in the next release :)