Hi, what version of pmaw are you using? I worked on fixing this issue in the latest release.
@mattpodolak Hi, I'm using 2.0.0. I also encountered a MemoryError even though I have mem_safe set to True.
MemoryError
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-14-61805bcbf485> in <module>
      2 posts = api.search_submissions(subreddit=sub, mem_safe=True, after=1585713600, before=1637207011, safe_exit=True)
      3 print(f'{len(posts)} posts retrieved from Pushshift')
----> 4 post_list = [post for post in posts]
      5 pd.DataFrame(post_list).to_pickle(f"{sub}_submissions.pkl")

<ipython-input-14-61805bcbf485> in <listcomp>(.0)
      2 posts = api.search_submissions(subreddit=sub, mem_safe=True, after=1585713600, before=1637207011, safe_exit=True)
      3 print(f'{len(posts)} posts retrieved from Pushshift')
----> 4 post_list = [post for post in posts]
      5 pd.DataFrame(post_list).to_pickle(f"{sub}_submissions.pkl")

D:\Python37\lib\_collections_abc.py in __next__(self)
    315         When exhausted, raise StopIteration.
    316         """
--> 317         return self.send(None)
    318
    319     @abstractmethod

~\AppData\Roaming\Python\Python37\site-packages\pmaw\Response.py in send(self, ignored_arg)
     30         Response generator object
     31         """
---> 32         cache = Cache.load_with_key(key, cache_dir)
     33         return Response(cache)
     34

~\AppData\Roaming\Python\Python37\site-packages\pmaw\Cache.py in load_resp(self, cache_num)
     56             with gzip.open(f'{self.folder}/{self.key}_info.pickle.gz', 'rb') as handle:
     57                 return pickle.load(handle)
---> 58         except FileNotFoundError:
     59             log.info('No previous requests to load')
     60             return None

MemoryError:
Can you try updating to the latest version? That should resolve the index error.
Enabling memory safety means that pmaw won't trigger a memory error during retrieval, since the results are stored in a cache on disk rather than held in memory.
The memory error arises when you try to load every result into memory at once, as the list comprehension above does.
I would recommend iterating through the generator in batches instead.
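Here is a minimal sketch of batched iteration, assuming the object returned by search_submissions behaves as a standard Python iterator (the traceback above shows it implementing the generator protocol). The subreddit name, batch size, and per-batch file naming are placeholders for illustration, not part of pmaw's API:

import itertools

import pandas as pd
from pmaw import PushshiftAPI

api = PushshiftAPI()
sub = 'AskReddit'  # placeholder subreddit

# mem_safe=True keeps retrieved responses in an on-disk cache during the search
posts = api.search_submissions(subreddit=sub, mem_safe=True,
                               after=1585713600, before=1637207011,
                               safe_exit=True)
print(f'{len(posts)} posts retrieved from Pushshift')

batch_size = 10000  # placeholder; tune to the memory available
post_iter = iter(posts)
for i in itertools.count():
    # pull at most batch_size results into memory at a time
    batch = list(itertools.islice(post_iter, batch_size))
    if not batch:
        break
    # persist each batch to its own file instead of building one giant list
    pd.DataFrame(batch).to_pickle(f'{sub}_submissions_{i}.pkl')

Writing each batch out as you go bounds peak memory at roughly batch_size rows rather than the full result set, and the per-batch files can be concatenated later if a single DataFrame is needed.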
Hi there, I'm running into this error while scraping submissions. Could you let me know why it happens and how I can get past it?