twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License

[QUESTION] Pandas can't read newest JSON file format by Twint #1430

Closed willnovan closed 1 year ago

willnovan commented 1 year ago

Pandas can't read JSON file

I've tried many ways to read the new JSON file format produced by Twint, but every attempt fails with the error below:

ValueError: Unexpected character found when decoding array value (2)

It worked perfectly with the old format, where the data was stored inside square brackets. Does anyone have a solution for this issue?

Below are the code and a picture of the data stored in the JSON file:

import twint
import nest_asyncio

nest_asyncio.apply()

c = twint.Config()
c.Store_object = True
c.Lang = "id"
c.Limit = 2000
c.Since = "2022-12-30"
c.Until = "2023-02-23"
c.Search = "kebijakan hilirisasi -ekspor"
c.Output = "researchdataset.json"

twint.run.Search(c)

[Image: example]

Or maybe the problem comes from my JSON file, which stores the data like that?

jkcorrea commented 1 year ago

You need to set c.Store_json = True, then read the file back with pandas.read_json('researchdataset.json', lines=True).
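To illustrate why lines=True matters, here is a minimal sketch. It assumes the output file holds one JSON object per line with no enclosing square brackets, which is what the question describes. The sample records, their field names, and the helper function names are made up for the demonstration and are not taken from Twint.

```python
import json
import os
import tempfile

# Two made-up records standing in for scraped output.
# The field names are illustrative assumptions, not Twint's actual schema.
SAMPLE_RECORDS = [
    {"id": 1, "tweet": "first example"},
    {"id": 2, "tweet": "second example"},
]


def write_json_lines(path, records):
    """Write one JSON object per line, without enclosing square brackets."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def whole_file_parse_fails(path):
    """Return True if parsing the file as a single JSON document raises."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except json.JSONDecodeError:
        return True
    return False


def read_json_lines(path):
    """Parse a line-delimited JSON file into a list of dicts."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def load_with_pandas(path):
    """Same read through pandas; requires pandas to be installed."""
    import pandas as pd  # imported lazily so the rest runs without pandas

    return pd.read_json(path, lines=True)


def demo():
    """Write a sample file, then compare whole-file and per-line parsing."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.json")
        write_json_lines(path, SAMPLE_RECORDS)
        return whole_file_parse_fails(path), read_json_lines(path)
```

Calling demo() shows that a file with several objects and no surrounding array is not a single valid JSON document, while each line parses fine on its own. If pandas is installed, load_with_pandas does the per-line parsing and returns a DataFrame directly.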

willnovan commented 1 year ago

It's working, thank you very much!