isagalaev / ijson

Iterative JSON parser with Pythonic interface

Doing this json procedure more efficiently in ijson? #68

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

(Cross-posted from Stack Overflow)

I have a massive JSON file, and I run out of memory when I try to read it into Python. How would I implement a similar procedure using ijson?

import json

# There are (say) 1m objects in this file, each a JSON object on its own line.
with open('my_file.json') as json_file:
    data = json_file.readlines()
    # So I take a list of these JSON objects, parsing each line into a dict
    list_of_objs = [json.loads(line) for line in data]

# But I only want about 200 of the JSON objects
desired_data = [obj for obj in list_of_objs if obj['feature'] == "desired_feature"]

Basically, the file is a list of JSON objects. I want only the objects that have a certain value for a particular key, and for each matching object I want to keep every attribute.

The file contains objects like this one:
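If the file really holds one JSON object per line (which is what `readlines()` in the snippet assumes), the memory problem comes from materializing every line and every parsed object at once; ijson is not strictly needed for that layout. A minimal sketch, assuming JSON Lines input, is a generator that reads the file lazily and yields only the matching objects. The function name `filter_json_lines` is hypothetical.

```python
import json
from typing import Any, Iterator


def filter_json_lines(path: str, key: str, value: Any) -> Iterator[dict]:
    """Yield only the objects whose `key` equals `value`.

    Assumes one JSON object per line (JSON Lines). Only the current
    line is held in memory, plus whatever matches the caller keeps.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:  # file objects iterate lazily, one line at a time
            line = line.strip()
            if not line:
                continue  # skip blank lines
            obj = json.loads(line)
            if obj.get(key) == value:
                yield obj
```

Usage with the question's placeholders would be `desired_data = list(filter_json_lines('my_file.json', 'feature', 'desired_feature'))`; only the ~200 matches end up in the list.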

    "review_id": "zdSx_SD6obEhz9VrW9uAWA",
    "user_id": "Ha3iJu77CxlrFm-vQRs_8g",
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",
    "stars": 4,
    "date": "2016-03-09",
    "text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
    "useful": 0,
    "funny": 0,
isagalaev commented 6 years ago