xoelop / get-twitter-likes

Keeping a searchable collection of your Twitter likes
GNU General Public License v3.0
42 stars 1 fork

sqlite3.OperationalError: database is locked #25

Closed johnny2678 closed 1 year ago

johnny2678 commented 1 year ago

Ok, I've got the daily stuff up and running and linked to a spreadsheet. Cool stuff!

Now moving on to adding likes from the like.js in the twitter backup file. It runs for about 90 seconds, creates a ~1GB cache.sqlite file, and then errors out with the following message:

(get-twitter-likes) user1@jh-ltm2 get-twitter-likes % python get_likes_from_json_to_csv.py -pu -o data/likes1.csv
Namespace(input='data/like.js', output='data/likes1.csv', format='raw', parse_urls=True, save_json_col=False)
Reading likes from /Users/user1/Projects/get-twitter-likes/data/like.js
Downloading likes detailed data from Twitter API and parsing their URLs
Traceback (most recent call last):                                                                                               
  File "/Users/user1/Projects/get-twitter-likes/get_likes_from_json_to_csv.py", line 28, in <module>
    likes = get_likes_from_json(input_file=input_file,
  File "/Users/user1/Projects/get-twitter-likes/src/core.py", line 166, in get_likes_from_json
    tweets_lists = parallel(lookup_and_parse_tweets,
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/fastcore/parallel.py", line 106, in parallel
    return L(r)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/fastcore/foundation.py", line 97, in __call__
    return super().__call__(x, *args, **kwargs)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/fastcore/foundation.py", line 105, in __init__
    items = listify(items, *rest, use_list=use_list, match=match)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/fastcore/basics.py", line 56, in listify
    elif is_iter(o): res = list(o)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/fastprogress/fastprogress.py", line 47, in __iter__
    raise e
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/fastprogress/fastprogress.py", line 41, in __iter__
    for i,o in enumerate(self.gen):
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/fastcore/parallel.py", line 54, in _call
    return g(item)
  File "/Users/user1/Projects/get-twitter-likes/src/core.py", line 183, in lookup_and_parse_tweets
    parsed_tweets = parse_tweets(statuses, output_format=output_format, parse_urls=parse_urls)
  File "/Users/user1/Projects/get-twitter-likes/src/core.py", line 190, in parse_tweets
    list_of_lists_of_parsed_tweets = parallel_map(parse_tweet,
  File "/Users/user1/Projects/get-twitter-likes/src/utils.py", line 20, in parallel_map
    return list(result)
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/user1/Projects/get-twitter-likes/src/core.py", line 85, in parse_tweet
    title_description = get_url_title_description(url, parse_url=parse_url)
  File "/Users/user1/Projects/get-twitter-likes/src/scraping.py", line 16, in get_url_title_description
    response = session.get(url, timeout=3)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests_cache/core.py", line 132, in request
    response = super(CachedSession, self).request(
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests_cache/core.py", line 109, in send
    return send_request_and_cache_response()
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests_cache/core.py", line 99, in send_request_and_cache_response
    self.cache.save_response(cache_key, response)
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests_cache/backends/base.py", line 46, in save_response
    self.responses[key] = self.reduce_response(response), datetime.utcnow()
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests_cache/backends/storage/dbdict.py", line 165, in __setitem__
    super(DbPickleDict, self).__setitem__(key,
  File "/Users/user1/.local/share/virtualenvs/get-twitter-likes-b1tRvag8/lib/python3.9/site-packages/requests_cache/backends/storage/dbdict.py", line 129, in __setitem__
    con.execute("insert or replace into `%s` (key,value) values (?,?)" %
sqlite3.OperationalError: database is locked

any ideas?

johnny2678 commented 1 year ago

I should add I'm seeing this both on macOS and in docker. Here's the docker output:

root@xxxx-BEAST:/mnt/user/xxxx/get-twitter-likes# docker run -v /mnt/user/xxxx/get-twitter-likes:/get-twitter-likes twitter-likes:local python get_likes_from_json_to_csv.py -pu -o data/likes1.csv
Namespace(format='raw', input='data/like.js', output='data/likes1.csv', parse_urls=True, save_json_col=False)
Reading likes from /get-twitter-likes/data/like.js
Downloading likes detailed data from Twitter API and parsing their URLs
Traceback (most recent call last):
  File "get_likes_from_json_to_csv.py", line 28, in <module>
    likes = get_likes_from_json(input_file=input_file,
  File "/get-twitter-likes/src/core.py", line 166, in get_likes_from_json
    tweets_lists = parallel(lookup_and_parse_tweets,
  File "/usr/local/lib/python3.8/site-packages/fastcore/parallel.py", line 106, in parallel
    return L(r)
  File "/usr/local/lib/python3.8/site-packages/fastcore/foundation.py", line 97, in __call__
    return super().__call__(x, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/fastcore/foundation.py", line 105, in __init__
    items = listify(items, *rest, use_list=use_list, match=match)
  File "/usr/local/lib/python3.8/site-packages/fastcore/basics.py", line 56, in listify
    elif is_iter(o): res = list(o)
  File "/usr/local/lib/python3.8/site-packages/fastprogress/fastprogress.py", line 47, in __iter__
    raise e
  File "/usr/local/lib/python3.8/site-packages/fastprogress/fastprogress.py", line 41, in __iter__
    for i,o in enumerate(self.gen):
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/fastcore/parallel.py", line 54, in _call
    return g(item)
  File "/get-twitter-likes/src/core.py", line 183, in lookup_and_parse_tweets
    parsed_tweets = parse_tweets(statuses, output_format=output_format, parse_urls=parse_urls)
  File "/get-twitter-likes/src/core.py", line 190, in parse_tweets
    list_of_lists_of_parsed_tweets = parallel_map(parse_tweet,
  File "/get-twitter-likes/src/utils.py", line 20, in parallel_map
    return list(result)
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/get-twitter-likes/src/core.py", line 85, in parse_tweet
    title_description = get_url_title_description(url, parse_url=parse_url)
  File "/get-twitter-likes/src/scraping.py", line 16, in get_url_title_description
    response = session.get(url, timeout=3)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests_cache/core.py", line 132, in request
    response = super(CachedSession, self).request(
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests_cache/core.py", line 109, in send
    return send_request_and_cache_response()
  File "/usr/local/lib/python3.8/site-packages/requests_cache/core.py", line 99, in send_request_and_cache_response
    self.cache.save_response(cache_key, response)
  File "/usr/local/lib/python3.8/site-packages/requests_cache/backends/base.py", line 46, in save_response
    self.responses[key] = self.reduce_response(response), datetime.utcnow()
  File "/usr/local/lib/python3.8/site-packages/requests_cache/backends/storage/dbdict.py", line 165, in __setitem__
    super(DbPickleDict, self).__setitem__(key,
  File "/usr/local/lib/python3.8/site-packages/requests_cache/backends/storage/dbdict.py", line 129, in __setitem__
    con.execute("insert or replace into `%s` (key,value) values (?,?)" %
sqlite3.OperationalError: database is locked
xoelop commented 1 year ago

No idea. Seems related to the threadpool used to download the tweets. Can you try setting a lower number of parallel threads?
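For background on why lowering the thread count helps: SQLite allows only one writer at a time, so when many threads hit the requests-cache backend at once, a writer that can't get the lock raises `OperationalError: database is locked`. A minimal stdlib sketch (not the project's code) of the same contention, and of the common mitigation of passing a `timeout` to `sqlite3.connect` so writers wait for the lock instead of failing immediately:

```python
# Sketch: many threads writing to one SQLite file, each with its own
# connection. timeout=30 tells SQLite to wait up to 30 s for the write
# lock rather than raising "database is locked" right away.
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

db_path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")

def write_row(i: int) -> None:
    # Each worker opens its own connection (sqlite3 connections are not
    # shareable across threads by default).
    with sqlite3.connect(db_path, timeout=30) as con:
        con.execute("CREATE TABLE IF NOT EXISTS kv (key INTEGER, value TEXT)")
        con.execute("INSERT INTO kv (key, value) VALUES (?, ?)", (i, f"v{i}"))

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(write_row, range(20)))  # raises if any worker failed

with sqlite3.connect(db_path) as con:
    count = con.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
print(count)  # 20
```

With a short or zero timeout, the same run under heavy contention can reproduce the locked-database error seen in the traceback above.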


johnny2678 commented 1 year ago

hmm, not sure I know how to do that. I tried limiting docker access to 1 CPU using --cpuset-cpus='0' but still got the error.

Googling Python and parallel threads made my head spin. Could you point out where/how I might try lowering the number of parallel threads?

xoelop commented 1 year ago

In the parallel_map function, or the parallel function from fastcore


johnny2678 commented 1 year ago

ok, this is beyond my skill set. I got it working but can't explain why. Leaving my fix here for anyone else who comes along.

I changed https://github.com/xoelop/get-twitter-likes/blob/b5ea7c6fdbf0eee8cfd549921f1dbf5d91143742/src/core.py#L166 from

 threadpool=True,
 n_workers=100,

to:

 threadpool=False,
 n_workers=1,

and it completed successfully. It took a while, but this is a one-time thing, so it doesn't matter.
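In stdlib terms, this change reduces the worker count of the thread pool that downloads and caches the tweets; with a single worker, only one thread ever writes to the cache database, so the lock is never contended. A minimal sketch of the same knob using `concurrent.futures` (`fetch` is a hypothetical stand-in for the real per-batch download work):

```python
# Sketch: what lowering n_workers amounts to, in stdlib terms.
# `fetch` is a hypothetical placeholder for the real download + cache write.
from concurrent.futures import ThreadPoolExecutor

def fetch(item: int) -> int:
    return item * 2  # placeholder for session.get(...) plus the cache write

# max_workers=1 serializes the jobs: slower, but at most one thread
# touches the SQLite cache at a time, sidestepping the lock contention.
with ThreadPoolExecutor(max_workers=1) as pool:
    results = list(pool.map(fetch, range(5)))
print(results)  # [0, 2, 4, 6, 8]
```

A small value like 5 or 10 would likely still avoid the error while keeping some parallelism.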

Thanks again!

xoelop commented 1 year ago

Yep! That's what you had to change. I think setting it to a number like 10 would also have worked. Glad you got it working :)


JWCook commented 1 year ago

@johnny2678 @xoelop A couple notes: