johnny2678 closed this issue as completed 1 year ago.
I should add that I'm seeing this both on macOS and in Docker. Here's the Docker output:
root@xxxx-BEAST:/mnt/user/xxxx/get-twitter-likes# docker run -v /mnt/user/xxxx/get-twitter-likes:/get-twitter-likes twitter-likes:local python get_likes_from_json_to_csv.py -pu -o data/likes1.csv
Namespace(format='raw', input='data/like.js', output='data/likes1.csv', parse_urls=True, save_json_col=False)
Reading likes from /get-twitter-likes/data/like.js
Downloading likes detailed data from Twitter API and parsing their URLs
Traceback (most recent call last):
File "get_likes_from_json_to_csv.py", line 28, in <module>
likes = get_likes_from_json(input_file=input_file,
File "/get-twitter-likes/src/core.py", line 166, in get_likes_from_json
tweets_lists = parallel(lookup_and_parse_tweets,
File "/usr/local/lib/python3.8/site-packages/fastcore/parallel.py", line 106, in parallel
return L(r)
File "/usr/local/lib/python3.8/site-packages/fastcore/foundation.py", line 97, in __call__
return super().__call__(x, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/fastcore/foundation.py", line 105, in __init__
items = listify(items, *rest, use_list=use_list, match=match)
File "/usr/local/lib/python3.8/site-packages/fastcore/basics.py", line 56, in listify
elif is_iter(o): res = list(o)
File "/usr/local/lib/python3.8/site-packages/fastprogress/fastprogress.py", line 47, in __iter__
raise e
File "/usr/local/lib/python3.8/site-packages/fastprogress/fastprogress.py", line 41, in __iter__
for i,o in enumerate(self.gen):
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/fastcore/parallel.py", line 54, in _call
return g(item)
File "/get-twitter-likes/src/core.py", line 183, in lookup_and_parse_tweets
parsed_tweets = parse_tweets(statuses, output_format=output_format, parse_urls=parse_urls)
File "/get-twitter-likes/src/core.py", line 190, in parse_tweets
list_of_lists_of_parsed_tweets = parallel_map(parse_tweet,
File "/get-twitter-likes/src/utils.py", line 20, in parallel_map
return list(result)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/get-twitter-likes/src/core.py", line 85, in parse_tweet
title_description = get_url_title_description(url, parse_url=parse_url)
File "/get-twitter-likes/src/scraping.py", line 16, in get_url_title_description
response = session.get(url, timeout=3)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests_cache/core.py", line 132, in request
response = super(CachedSession, self).request(
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/site-packages/requests_cache/core.py", line 109, in send
return send_request_and_cache_response()
File "/usr/local/lib/python3.8/site-packages/requests_cache/core.py", line 99, in send_request_and_cache_response
self.cache.save_response(cache_key, response)
File "/usr/local/lib/python3.8/site-packages/requests_cache/backends/base.py", line 46, in save_response
self.responses[key] = self.reduce_response(response), datetime.utcnow()
File "/usr/local/lib/python3.8/site-packages/requests_cache/backends/storage/dbdict.py", line 165, in __setitem__
super(DbPickleDict, self).__setitem__(key,
File "/usr/local/lib/python3.8/site-packages/requests_cache/backends/storage/dbdict.py", line 129, in __setitem__
con.execute("insert or replace into `%s` (key,value) values (?,?)" %
sqlite3.OperationalError: database is locked
No idea. Seems related to the threadpool used to download the tweets. Can you try setting a lower number of parallel threads?
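The `sqlite3.OperationalError` at the bottom of that traceback can be reproduced in isolation. This sketch is an illustration only (none of it is the project's code): one connection holds SQLite's write lock while a second tries to write with no busy timeout, which is essentially what the cache's worker threads are doing to each other.

```python
import os
import sqlite3
import tempfile

# Illustration of the error above: two connections contending for SQLite's
# single write lock. The second connection uses timeout=0, so it fails
# immediately instead of retrying while the lock is held.
db = os.path.join(tempfile.mkdtemp(), "cache.sqlite")

con1 = sqlite3.connect(db)
con1.execute("create table kv (key text primary key, value text)")
con1.commit()
con1.execute("begin exclusive")        # hold the write lock

con2 = sqlite3.connect(db, timeout=0)  # give up immediately on contention
try:
    con2.execute("insert into kv values ('k', 'v')")
    err = None
except sqlite3.OperationalError as e:
    err = str(e)
print(err)  # database is locked

con1.rollback()
```

With the default `timeout` of 5 seconds, SQLite retries for a while before raising, which is why the error only shows up under sustained contention from many threads.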
Hmm, not sure I know how to do that. I tried limiting Docker access to 1 CPU using --cpuset-cpus='0', but still got the error.
Googling Python and parallel threads made my head spin. Could you point out where/how I might try lowering the number of parallel threads?
In the parallel_map function, or in the parallel function from fastcore.
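To make the knob concrete, here's a minimal stdlib sketch of a `parallel_map`-style helper with a worker-count parameter. This is an assumed shape, not the actual code in src/utils.py, which I haven't reproduced here:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(func, items, max_workers=1):
    # max_workers=1 serializes the work: only one thread touches the
    # SQLite cache at a time, so the lock contention disappears.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))

result = parallel_map(lambda x: x * 2, [1, 2, 3])
print(result)  # [2, 4, 6]
```

A small value like 10 trades some speed for far less contention on the cache database.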
OK, this is beyond my skill set. I got it working but can't explain why. Leaving my fix here for anyone else who comes along.
I changed https://github.com/xoelop/get-twitter-likes/blob/b5ea7c6fdbf0eee8cfd549921f1dbf5d91143742/src/core.py#L166 from
threadpool=True,
n_workers=100,
to:
threadpool=False,
n_workers=1,
and it completed successfully. It took a while, but this is a one-time thing, so it doesn't matter.
Thanks again!
Yep! That's what you had to change. I think setting it to a number like 10 would also have worked. Glad you got it working :)
@johnny2678 @xoelop A couple of notes: requests_cache.install_cache() as used here isn't thread-safe, and I'd recommend using requests_cache.CachedSession instead. There are more details in the docs. Also, tweepy.API has a session attribute that it uses to send requests, so that could be replaced with a CachedSession object.
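As a general illustration of the thread-safety point above (this is not requests-cache's actual internals), the usual fix for a cache backed by a shared SQLite connection is to give each thread its own connection, e.g. via `threading.local()`:

```python
import os
import sqlite3
import tempfile
import threading

# One SQLite connection per thread instead of one shared connection.
db = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
with sqlite3.connect(db) as con:
    con.execute("create table kv (key text primary key, value text)")

_local = threading.local()

def get_con():
    # Lazily open one connection per thread; timeout=30 lets a writer wait
    # for the lock instead of raising "database is locked".
    if not hasattr(_local, "con"):
        _local.con = sqlite3.connect(db, timeout=30)
    return _local.con

def save(i):
    with get_con() as con:
        con.execute("insert or replace into kv (key, value) values (?, ?)",
                    (f"k{i}", f"v{i}"))

threads = [threading.Thread(target=save, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with sqlite3.connect(db) as con:
    count = con.execute("select count(*) from kv").fetchone()[0]
print(count)  # 10
```

Newer requests-cache releases handle this kind of per-thread isolation in their SQLite backend, which is another reason to prefer CachedSession over the global install_cache().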
Ok, I've got the daily stuff up and running and linked to a spreadsheet. Cool stuff!
Now moving on to adding likes from the like.js in the Twitter backup file. It runs for about 90 seconds, creates a ~1 GB cache.sqlite file, and then errors out with the sqlite3.OperationalError: database is locked traceback above. Any ideas?