Code-Slave opened this issue 3 years ago
It did the first pass, then never again. Half the sources have stuff to grab, but there are no tasks for any channel except one.
Do you have "scheduled" tasks on your Tasks page? Is there anything of note in your container logs? Can you provide the URL to a channel or playlist that isn't being indexed and also upload a screenshot of your source settings page for the same source.
Only the one scheduled task after the initial run. It's like they ran to do the initial load, then never created the every-24-hours refresh task. https://www.youtube.com/c/halfasinteresting/
table `background_task`
Odd, does the "reset tasks" button fix it? I did encounter some weird race conditions with task management through Django signals when I was building it, which is why there's a "reset tasks" button.
There we go: reset tasks and DB lock errors. Likely why nothing's being added. This is running in Docker, just FYI. It looks like the DB locks while adding tasks after a reset; only the first source gets added. If I add a new source it's fine and adds the task. If I hit reset it only re-adds the first source and errors due to a DB lock. So something, either during updating a source or resetting tasks, is encountering a lock. (I had not reset tasks before, but I did update sources.)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: database is locked
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/background_task/tasks.py", line 43, in bg_runner
func(*args, **kwargs)
File "/app/sync/tasks.py", line 159, in index_source_task
source.save()
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 754, in save
force_update=force_update, update_fields=update_fields)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 803, in save_base
update_fields=update_fields, raw=raw, using=using,
File "/usr/local/lib/python3.7/dist-packages/django/dispatch/dispatcher.py", line 179, in send
for receiver in self._live_receivers(sender)
File "/usr/local/lib/python3.7/dist-packages/django/dispatch/dispatcher.py", line 179, in <listcomp>
for receiver in self._live_receivers(sender)
File "/app/sync/signals.py", line 63, in source_post_save
media.save()
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 754, in save
force_update=force_update, update_fields=update_fields)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 792, in save_base
force_update, using, update_fields,
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 873, in _save_table
forced_update)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 926, in _do_update
return filtered._update(values) > 0
File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 803, in _update
return query.get_compiler(self.db).execute_sql(CURSOR)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/sql/compiler.py", line 1522, in execute_sql
cursor = super().execute_sql(result_type)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/sql/compiler.py", line 1156, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: database is locked
Rescheduling task Index media from source "731woodworks" for 0:00:06 later at 2021-01-12 13:48:04.688548+00:00
[2021-01-12 08:48:06 -0500] [437] [CRITICAL] WORKER TIMEOUT (pid:460)
Ah, is your SQLite database / `/config` volume for TubeSync on an NFS share or something similar?
Nope, just a normal Docker mount. I have 25-30 containers running that use SQLite with no issues.
Odd, I've not encountered this myself. The gunicorn workers that host the front end and the background task workers do all run in the same container and access the same SQLite db, but it really shouldn't be enough load to cause locking issues. Do you have anything else accessing the SQLite database? An SQLite viewer tool or anything? Did you tweak any of the advanced experimental options like the `GUNICORN_WORKERS` or `TUBESYNC_WORKERS` env vars? There's nothing else fancy in TubeSync that would make it different from any other container that uses SQLite. You could try making sure nothing has the SQLite database open at all, setting `TUBESYNC_WORKERS` and `GUNICORN_WORKERS` both to `1`, restarting the container and then trying the "reset tasks" button again. If not, check `fuser /path/to/db.sqlite` or similar commands on the host, as something must be locking it somewhere. If you have an SQLite database viewer connected to it, that will be the cause of the locking.
Nope, I left the Dockerfile pretty much standard. I'll reset the container and change those to 1 and see how that works. The viewer is using a copy of the db for exactly that reason (I'm a DBA by trade).
Ah, useful for debugging. Well, it's really just standard Django with nothing fancy: no long-term open transactions or anything weird. All queries are atomic by default, and unless a write takes an absurdly long amount of time or some process somewhere is locking the db, I can't see how it would get locked to the point it can't handle single-threaded writes from one worker. Typically, almost all the writes come from a single background worker process. The front end doesn't really do writes other than a few of the admin-ish buttons like "reset tasks", so it's really not designed in a way that can easily get contended on the db. Let me know if the above fixes it for you or you find out any additional info.
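For reference, the "database is locked" error itself is easy to reproduce with nothing but Python's stdlib `sqlite3` module. This is a generic sketch, not TubeSync code: a second connection that tries to write while another holds the write lock for longer than the busy timeout raises exactly this `OperationalError`.

```python
import os
import sqlite3
import tempfile

# Two connections to the same file, mimicking two workers. A short
# timeout makes the busy-wait give up quickly instead of the default 5s.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer_a = sqlite3.connect(path, timeout=0.1)
writer_b = sqlite3.connect(path, timeout=0.1)

writer_a.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT)")
writer_a.commit()

# Writer A takes the write lock and holds it (transaction left open).
writer_a.execute("BEGIN IMMEDIATE")
writer_a.execute("INSERT INTO media (title) VALUES ('video one')")

# Writer B now hits exactly the error from the tracebacks above.
try:
    writer_b.execute("INSERT INTO media (title) VALUES ('video two')")
    err = None
except sqlite3.OperationalError as exc:
    err = str(exc)
print(err)  # database is locked

writer_a.commit()  # releasing the lock lets writer B proceed again
```

Raising the `timeout` argument to `sqlite3.connect()` (SQLite's busy timeout) makes the second writer wait longer instead of erroring, which can paper over short-lived contention but not a lock held indefinitely.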
It's pulling from a new source right now and got the same error when it went to reschedule a specific media item. I haven't reset it yet as it's in the middle of a pull for a new source.
I've absolutely no idea how a single worker process issuing writes locks the database if nothing else at all is accessing the db. I'll put it on my todo list to check the code and make sure it's properly transactional everywhere though.
You could also always try the old `echo ".dump" | sqlite existing.db | sqlite rebuilt.db` trick to see if it was a corruption issue with the db itself, of course. Run it on a copy of the db and switch it out (with the container stopped) as a test.
I've been able to trigger it while a task is running and trying to update that source at the same time. Not always, but enough times to be obvious. Because of all the goofing around I've now reset it all, imported my source table from the old db and reset jobs. Working so far, but I'm not updating sources or anything either. I think what started it was that I added a bunch of sources, then while the jobs were running I was editing settings because I goofed up on naming, and that's when the jobs started not showing up.
Update: it seems that if an indexing job is running (right now thumbnails), something is triggering a reschedule for another indexing job on a different source. That reschedule fails with a DB lock.
2021-01-12 10:31:34,745 [tubesync/INFO] Indexed media: 731woodworks / DCyBplfL2dU
2021-01-12 10:31:35,232 [tubesync/INFO] Scheduling task to download thumbnail for: What Tools Do You Need For Woodworking? from: https://i.ytimg.com/vi_webp/D7syYARDiug/maxresdefault.webp?v=5f3c9a6c
2021-01-12 10:31:35,789 [tubesync/INFO] Indexed media: 731woodworks / D7syYARDiug
2021-01-12 10:31:35,954 [tubesync/INFO] Scheduling task to download thumbnail for: Top 5 Woodworking Projects That Sell from: https://i.ytimg.com/vi/7pIKH-BCLoA/maxresdefault.jpg
2021-01-12 10:31:36,582 [tubesync/INFO] Indexed media: 731woodworks / 7pIKH-BCLoA
Rescheduling Index media from source "anawhitediy"
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: database is locked
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/background_task/tasks.py", line 43, in bg_runner
func(*args, **kwargs)
File "/app/sync/tasks.py", line 207, in index_source_task
media.save()
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 754, in save
force_update=force_update, update_fields=update_fields)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 792, in save_base
force_update, using, update_fields,
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 895, in _save_table
results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 935, in _do_insert
using=using, raw=raw,
File "/usr/local/lib/python3.7/dist-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 1254, in _insert
return query.get_compiler(using=using).execute_sql(returning_fields)
File "/usr/local/lib/python3.7/dist-packages/django/db/models/sql/compiler.py", line 1397, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: database is locked
2021-01-12 10:31:37,002 [tubesync/INFO] Scheduling task to download thumbnail for: Woodworking Projects for Beginners from: https://i.ytimg.com/vi_webp/6cpktYBUPv8/maxresdefault.webp
2021-01-12 10:31:37,372 [tubesync/INFO] Indexed media: 731woodworks / 6cpktYBUPv8
2021-01-12 10:31:37,772 [tubesync/INFO] Scheduling task to download thumbnail for: DIY Floating Nightstand With Storage from: https://i.ytimg.com/vi_webp/3k9CZ059gAo/maxresdefault.webp
2021-01-12 10:31:38,147 [tubesync/INFO] Indexed media: 731woodworks / 3k9CZ059gAo
2021-01-12 10:31:38,583 [tubesync/INFO] Scheduling task to download thumbnail for: Make Your Workbench Mobile from: https://i.ytimg.com/vi_webp/uNmDuevPlR8/maxresdefault.webp
Might be something damaged with your SQLite db (e.g. a hard kill on a process which was writing to the db at the time), so try a "repair":

1. Stop the container
2. Copy your `db.sqlite3` file somewhere
3. Run `echo ".dump" | sqlite db.sqlite3 | sqlite db-fixed.sqlite3` on the copy (this command might be `sqlite3` not `sqlite` depending on your setup)
4. Check that `db-fixed.sqlite3` is valid (opens in the SQLite CLI or an SQLite viewer etc.)
5. Delete the original `db.sqlite3` file and make sure to delete any hidden `.*-journal` files that might be hanging about
6. Move the `db-fixed.sqlite3` file back to where the original `db.sqlite3` file was
7. Restart the container and check the contents of `db.sqlite3` are correct

If that doesn't work, I'll have to put the ticket on my backlog to experiment with later at some point, as I'll need to see if I can replicate your issue to have any hope of finding a possible cause.
This is from a brand new db. I will also do the above, but every time I've done checks the db has been fine. I spent time in the code today and yeah, it's pretty standard Django. After it catches up I'm going to dump the sources and the tasks in case it happens again. You might want to offer a clear-and-reschedule of tasks on a per-source basis instead of global.
The next release will have two split background workers, one that indexes media and one that downloads media, rather than N pools of workers doing generic tasks. This could probably help with locking issues on busy writes. I'll also spend some time wrapping a bunch of the heavier events in explicit transactions rather than leaving it up to Django magic internals.
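For illustration, here is the explicit-transaction idea in stdlib `sqlite3` terms (Django's `transaction.atomic()` is the ORM-side equivalent). This is a sketch of the general technique, not the actual TubeSync change:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT)")

titles = ["video one", "video two", "video three"]

# Commit-per-row: the write lock is taken and released once per INSERT,
# giving any other writer many chances to collide mid-batch.
for title in titles:
    con.execute("INSERT INTO media (title) VALUES (?)", (title,))
    con.commit()

# One explicit transaction: the whole batch happens inside a single
# lock window, so there is exactly one commit to contend over.
with con:
    con.executemany(
        "INSERT INTO media (title) VALUES (?)", [(t,) for t in titles]
    )

print(con.execute("SELECT COUNT(*) FROM media").fetchone()[0])  # 6
```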
I think that will help. To me it looks like things are just stepping on each other at certain times when it's busy.
I'm wondering if this is a quantity issue. I have 2,800 media items and every reset-tasks run dies after inserting the first refresh-media task, where it downloads the pages. I can send you my db if you like; it's 128 MB but should zip well.
The SQLite db should be able to handle millions of entries on paper. The issue is with locking on multiple writes at once. Assuming you have the `TUBESYNC_WORKERS` env var set to `1`, this really shouldn't be possible, as it'll be one single non-threaded worker writing to it. Either way, it should get better when I next get a chance to go through these issues for the next release. Probably OK without the DB file for now, thanks for the offer, and we can look at that if issues persist after the next release. Cheers for the feedback so far.
No problem. Both the env vars are set to 1 right now. Still happening; I'll wait for the next release as it keeps killing my tasks.
Have you tried `:v0.9` or `:latest`? Any improvements with your issues?
Running latest. Sometimes better, sometimes the same. Basically it still gets to a point where I have to reset tasks because of the locks, and then it can never add all the new tasks again, also because of the locks. I'm not using network shares or anything, just Docker on a Synology NAS. I do want to test on a straight Ubuntu server soon to see if it's something particular to Synology.
I've not got a NAS handy to test it on; however, I would suspect the advice for you would be to use Postgres once support is added.
That's my plan, as I hoard lots of channels.
Thanks for this thread. How do I add Postgres to TubeSync? I'm a noob when it comes to Postgres and have just installed it (both are running in containers on the same network, but how do I get them to "talk"?).
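A minimal sketch of wiring the two containers together, assuming TubeSync reads its database settings from a `DATABASE_CONNECTION` environment variable; check the TubeSync README for the exact variable name and supported values, as the variable, image tag, and credentials below are illustrative assumptions only:

```shell
# Put both containers on one user-defined network so they can
# resolve each other by container name.
docker network create tubesync-net

# Postgres container; the user/password/db values here are examples.
docker run -d --name tubesync-db --network tubesync-net \
  -e POSTGRES_USER=tubesync \
  -e POSTGRES_PASSWORD=changeme \
  -e POSTGRES_DB=tubesync \
  postgres:15

# Point TubeSync at the Postgres container by its name on that network.
# DATABASE_CONNECTION is an assumed variable name; verify in the README.
docker run -d --name tubesync --network tubesync-net \
  -e DATABASE_CONNECTION=postgresql://tubesync:changeme@tubesync-db:5432/tubesync \
  ghcr.io/meeb/tubesync:latest
```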
I added about 20 sources and set them to a 24-hour refresh. Out of those sources, only one ever gets updated. I have no indexing jobs for any of the other sources. Any ideas?