mattwright324 / youtube-comment-suite

Download YouTube comments from numerous videos, playlists, and channels for archiving, general search, and showing activity.
MIT License
275 stars, 46 forks

Lost inserts - how to deal with them? #88

bernard01 closed this issue 3 years ago

bernard01 commented 3 years ago

I have experienced lost inserts in 1.4.6 and now in 1.4.7. The case is as follows: one hour after the last refresh, I create 2 new comments using the browser. Then I wait for perhaps a minute and run a refresh. Only the later of the two comments is imported. There is, of course, always the nagging question of whether YouTube actually inserted a comment permanently. To answer this question, I created a new instance and ran an initial refresh, which found the missing comment. Then I used DB Browser for SQLite with my main database, attached the fresh database as "fresh", and ran this query:

    insert into comments
    select co2.* from fresh.comments co2
    where not exists (
        select co.comment_id from comments co where co.comment_id = co2.comment_id
    );

Problem solved, but that is quite tedious.
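The same merge can be scripted instead of run by hand in DB Browser. Below is a minimal sketch using Python's built-in sqlite3 module. It assumes both files contain a `comments` table with identical column order (the query's `co2.*` depends on that), and `merge_missing_comments` and its path arguments are hypothetical names, not part of youtube-comment-suite.

```python
import sqlite3


def merge_missing_comments(main_path: str, fresh_path: str) -> int:
    """Copy rows from fresh.comments whose comment_id is absent from the
    main database's comments table. Returns the number of rows added.

    Assumes both databases share the same `comments` layout. This only
    inserts, so rows that exist only in the main database (for example,
    comments from since-deleted videos) are left untouched.
    """
    conn = sqlite3.connect(main_path)
    try:
        # Attach the freshly built database under the schema name "fresh".
        conn.execute("ATTACH DATABASE ? AS fresh", (fresh_path,))
        with conn:  # commits on success, rolls back on an exception
            cur = conn.execute(
                """INSERT INTO comments
                SELECT co2.* FROM fresh.comments co2
                WHERE NOT EXISTS (
                    SELECT 1 FROM comments co
                    WHERE co.comment_id = co2.comment_id
                )"""
            )
            added = cur.rowcount
        # Detach only after the transaction has been committed.
        conn.execute("DETACH DATABASE fresh")
        return added
    finally:
        conn.close()
```

Because the query is insert-only, running it repeatedly is harmless: a second run against the same fresh database adds nothing.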

It is interesting that it was not the last comment that was missing but the first. Perhaps one of the refresh options should be used to solve the issue? However, I do not want to lose comments from deleted videos, which the fresh database does not have.

mattwright324 commented 3 years ago

I am a bit confused about what is happening. How did you make the two comments? Were they both new top-level comments, or replies in older threads? If you have the 'smart-paging' option enabled for the refresh and your missing comment was a reply in an older thread, that would explain it.

bernard01 commented 3 years ago

For the refresh I use all the defaults, which include 'smart-paging'. And yes, the missing comment was a reply in an older thread. What options do I need to be safe?

mattwright324 commented 3 years ago

I guess the smartness of it could be improved or changed, but this is how it works now: while going through the API comment pages of a video, 'smart-paging' checks whether all the top-level thread comment IDs on a page already exist in the database, and stops there to reduce API quota usage, since every following page would already be in the database too. That way you can reasonably get everything without actually grabbing everything, and without having to figure out which API page limit to select in the refresh options. Replies are not accounted for in this check at the moment, and even if they were, it could only cover the first 5 replies on each thread.
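The stopping rule described above can be sketched roughly as follows. This is not the project's actual code: `refresh_threads`, `fetch_page`, the page shape (a list of top-level thread IDs plus a next-page token), and `known_ids` are hypothetical stand-ins for the API client and the database lookup. The point it illustrates is that the check looks only at top-level thread IDs, so a page whose threads are all already stored ends the walk, even if one of those threads has gained a new reply.

```python
from typing import Callable, Optional

# Hypothetical page shape: (top-level thread IDs on the page, next-page token or None).
Page = tuple[list[str], Optional[str]]


def refresh_threads(
    fetch_page: Callable[[Optional[str]], Page],
    known_ids: set[str],
    smart_paging: bool = True,
) -> tuple[list[str], int]:
    """Walk comment-thread pages; return (thread IDs seen, pages fetched).

    With smart_paging, stop after the first page on which every top-level
    thread ID is already known, assuming all later pages are known too.
    Replies play no part in this check.
    """
    seen: list[str] = []
    pages = 0
    token: Optional[str] = None
    while True:
        thread_ids, token = fetch_page(token)
        pages += 1
        seen.extend(thread_ids)
        if smart_paging and all(t in known_ids for t in thread_ids):
            break  # whole page already stored: save quota, skip later pages
        if token is None:
            break  # reached the last page
    return seen, pages


# Usage: three simulated pages, newest threads first; only "t5" is new.
PAGES = [["t5", "t4"], ["t3", "t2"], ["t1"]]


def fake_fetch(token: Optional[str]) -> Page:
    i = 0 if token is None else int(token)
    nxt = str(i + 1) if i + 1 < len(PAGES) else None
    return PAGES[i], nxt


if __name__ == "__main__":
    known = {"t1", "t2", "t3", "t4"}
    print(refresh_threads(fake_fetch, known))                      # (['t5', 't4', 't3', 't2'], 2)
    print(refresh_threads(fake_fetch, known, smart_paging=False))  # (['t5', 't4', 't3', 't2', 't1'], 3)
```

Turning the flag off trades API quota for completeness, since every page is fetched regardless of what is already stored.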

You could occasionally disable smart-paging and see how that works out, but you may not want it turned off all the time.

bernard01 commented 3 years ago

Thanks.