meeb / tubesync

Syncs YouTube channels and playlists to a locally hosted media server
GNU Affero General Public License v3.0

Possible parsing issue #117

Closed Code-Slave closed 3 years ago

Code-Slave commented 3 years ago

This channel is causing constant failures to index:

https://www.youtube.com/channel/UCuoMasRkMhlj1VNVAOJdw5w

Index media from source "Channel58", attempted 6 times Error: "(1366, "Incorrect string value: '\xEF\xBC\xAC\xEF\xBC\xAF...' for column tubesync.background_task.verbose_name at row 1")"

meeb commented 3 years ago

Thanks, I'll look into it. I don't think that's an issue with the channel. That error is from MySQL whining that the verbose name being set is invalid, which it isn't. Alas, the schema and encoding for that column are set in an upstream library (Django Background Tasks), so it might need some creative hackery to fix.

Code-Slave commented 3 years ago

Semi-related: there is definitely something making tasks hang. It says they are running, but nothing happens for 4 hours. If I delete that row in background_tasks, everything starts up again. It would be nice to be able to do that through the GUI: reset a single task, or, if a download task has been running for a set period, maybe pause or skip it?

meeb commented 3 years ago

Yes, there are a number of ongoing problems with background task and scheduling reliability which didn't crop up in initial testing, like the other issue you created for #34. Most of these are handled within Django Background Tasks, which makes them a little difficult to work around in this downstream application. There is a "reset tasks" button at the bottom of the Tasks tab, but that just deletes all scheduled tasks and then re-scans every bit of media to determine whether any outstanding tasks are not scheduled yet, which is overkill for most of these "single task got stuck" issues. Reworking tasks, along with how sources are initially catalogued, are probably the two largest outstanding issues to fix before a 1.0 release.

Code-Slave commented 3 years ago

Great. If you want me to test anything specifically, just let me know.

Code-Slave commented 3 years ago

Just FYI, I'm still getting the MySQL server disappeared error at least once or twice a day.

meeb commented 3 years ago

Thanks, I'll see what else can be done. It may end up requiring a recommended MySQL server config tweak.

Code-Slave commented 3 years ago

Here's a consistent error:

2021-05-14 15:10:58,256 [tubesync/INFO] Scheduling task to download thumbnail for: How To Build A Simple DIY TV Stand...Or Media Console...Or Whatever... ¯_(ツ)_/¯ from: https://i.ytimg.com/vi_webp/4C1Namjo0dM/maxresdefault.webp

I get 500 server errors when it tries that. This is the video: https://www.youtube.com/watch?v=4C1Namjo0dM

meeb commented 3 years ago

And the 500 error you get is specifically the "Incorrect string value" one? Or a timeout?

Code-Slave commented 3 years ago

2021-05-15 10:08:46,625 [tubesync/INFO] Scheduling task to download thumbnail for: Wooden Tabletop Christmas Trees 🎄 from: https://i.ytimg.com/vi_webp/0h7e0IV_rzw/maxresdefault.webp

Another one.

I get no error in the log, just a 500 error on the site.

Code-Slave commented 3 years ago

It looks like titles that contain emoji or similar characters are what kills it.

Code-Slave commented 3 years ago

Testing this out:

ALTER TABLE your_database_name.your_table CONVERT TO CHARACTER SET utf8;

I ran that on the background tasks and completed background tasks tables and it looks like it's working. Still testing.

Code-Slave commented 3 years ago

That seems to fix this error:

Index media from source "Channel58", attempted 6 times Error: "(1366, "Incorrect string value: '\xEF\xBC\xAC\xEF\xBC\xAF...' for column tubesync.background_task.verbose_name at row 1")"

meeb commented 3 years ago

Thanks, that should indeed fix it. I think I never discovered this because I create all MySQL databases with utf8 encoding by default, which is inherited (I think? I'm not really a MySQL guy) by tables when they are created. Creating the initial database with CREATE DATABASE tubesync rather than CREATE DATABASE tubesync CHARACTER SET utf8, for example, is possibly the cause of this. I'll look at tweaking the docs, and there's an array of Django connection parameters related to this for MySQL that might work.

Thanks for digging into this.

Code-Slave commented 3 years ago

Is there a command-line way to reset all tasks? Because of my channel sizes, I think it's timing out and the workers die.

meeb commented 3 years ago

Yep!

https://github.com/meeb/tubesync/blob/main/docs/reset-tasks.md

Code-Slave commented 3 years ago

The emoji in titles are still killing it for thumbnail names.

Code-Slave commented 3 years ago

I think I'm getting close. See here: https://stackoverflow.com/questions/20411440/incorrect-string-value-xf0-x9f-x8e-xb6-xf0-x9f-mysql

I think straight utf8 won't work, and you may have to add the charset to the connection string. Creating a new DB to test.
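A likely reason plain utf8 is not enough: in MySQL, `utf8` is an alias for utf8mb3, which stores at most three bytes per character, while emoji such as 🎄 take four bytes in UTF-8. The sketch below checks that against the bytes quoted in the error message and two characters from the logged titles. The helper names are hypothetical and only illustrate the byte-width rule; they are not part of TubeSync.

```python
# Bytes quoted in the MySQL error message, plus two characters from the
# video titles in the log lines above.
error_bytes = b"\xEF\xBC\xAC\xEF\xBC\xAF"
samples = {
    "error bytes": error_bytes.decode("utf-8"),
    "shrug face": "ツ",
    "christmas tree": "🎄",
}


def max_utf8_width(text: str) -> int:
    """Return the largest number of UTF-8 bytes used by any one character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_mysql_utf8(text: str) -> bool:
    """MySQL's legacy `utf8` (utf8mb3) holds at most 3 bytes per character."""
    return max_utf8_width(text) <= 3


for label, text in samples.items():
    print(label, repr(text), max_utf8_width(text), fits_mysql_utf8(text))
```

The error bytes decode to fullwidth Latin letters that need only three bytes each, which would be consistent with the first error coming from a table that was not UTF-8 at all, and with the emoji failures only going away once utf8mb4 is used.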

meeb commented 3 years ago

Yeah, try: USE tubesync; ALTER TABLE background_task CONVERT TO CHARACTER SET utf8mb4; or similar.

Code-Slave commented 3 years ago

It's all utf8mb4. I think the charset needs to be set in the connection string too; the answers at that link say it does.

Code-Slave commented 3 years ago

https://stackoverflow.com/questions/15943938/django-charset-and-encoding

I think the encoding needs to be passed. I recreated the DB as utf8mb4 and it still failed.

meeb commented 3 years ago

utf8 was already the default for MySQL connections from Django, but I've changed this to utf8mb4 in :latest. Give that a try and see if it helps.
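For anyone configuring Django by hand, the connection charset is normally set through the `OPTIONS` key of the database settings, which Django's MySQL backend passes to the client driver when connecting. This is a minimal sketch of that kind of setting, not TubeSync's actual configuration; the database name, user, password, host and port are placeholder assumptions.

```python
# Sketch of a Django settings fragment. NAME, USER, PASSWORD, HOST and PORT
# are placeholder values, not TubeSync's real configuration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "tubesync",
        "USER": "tubesync",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "3306",
        "OPTIONS": {
            # Connection character set handed to the MySQL client driver.
            "charset": "utf8mb4",
        },
    }
}
```

The connection charset and the table charset both matter here: as reported above, recreating the database as utf8mb4 alone still failed until the connection used utf8mb4 as well.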

Code-Slave commented 3 years ago

Welp, so far it has added those tasks. The DB is utf8mb4 and, with latest, so far so good. Thanks for working on this. I was on a mission this morning.

Code-Slave commented 3 years ago

Reindexing 17 sources now, so we'll see how it goes.

meeb commented 3 years ago

No problem, it's pretty important MySQL works as a stable backend so thanks for testing!

Code-Slave commented 3 years ago

No issues with emoji now. I think the docs need a reminder to create the DB with that encoding.

I'm still losing the connection to the DB, and if that happens in the middle of a download, the item stays locked in the background task table with a PID that no longer exists. It also keeps any other tasks from starting.
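One way to spot this kind of stale lock is to check whether the PID recorded against a locked task still belongs to a live process. This is a minimal sketch, assuming a POSIX host and that the locking PIDs have already been read from the task table (how they are read is not shown). The function names are hypothetical, and this is not how TubeSync or Django Background Tasks handles it.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 sends nothing; it only checks that the PID can be signalled.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def stale_lock_pids(locked_pids):
    """Return the PIDs from `locked_pids` that no longer have a live process."""
    return [pid for pid in locked_pids if not pid_is_running(pid)]


# The current process is alive, so it is never reported as stale.
print(stale_lock_pids([os.getpid()]))
```

PIDs can be reused by the operating system, so a check like this can miss a stale lock if an unrelated process has since taken the same PID.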

meeb commented 3 years ago

OK, I've added the DB encoding note to the docs.

If you continue to get connection loss issues please open a new issue as that's a different MySQL related problem!

Thanks again for the testing.