Closed Ay1tsMe closed 1 month ago
I am unable to reproduce this. Could you run the same command but with `--log-level debug`? After the program exits, look for the last occurrence of the line containing `Request next batch of posts from API URL "<some URL>"`. Copy and paste that URL into the browser and see what it gives you.
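If the debug output was saved to a file, the last such URL can be pulled out with a few lines of Python. This is only a convenience sketch: the function name is made up for illustration, and the pattern is based on the log line format quoted in this thread.

```python
import re
from typing import Optional

# Pattern based on the debug line quoted in this thread. The closing quote is
# not required, since some log excerpts show the URL without one.
NEXT_BATCH = re.compile(r'Request next batch of posts from API URL "([^"\s]+)')


def last_next_batch_url(log_text: str) -> Optional[str]:
    """Return the URL from the last 'Request next batch' line, or None."""
    matches = NEXT_BATCH.findall(log_text)
    return matches[-1] if matches else None
```

Pass it the full log text and paste the returned URL into the browser.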
Actually, I just encountered this while downloading a new creator for the first time. This is clearly shown by the output at the end of the first pull:
May 29 00:11:53: info: PostDownloader: Done downloading posts by 'clickspring'
May 29 00:11:53: info: PostDownloader: Total 20 / null posts processed
May 29 00:11:53: info: PostDownloader end
Total 1 targets processed
-------------------------
0: https://www.patreon.com/clickspring/posts
Total 20 / null posts processed
Methinks it is a page thing, insofar as only 20 items are listed per page and the next page function is not working.
Just a thought.
@develroo , yes, the "next page" function is not returning the expected data in your case, which I could not reproduce. That is also why I've provided the steps to help diagnose this. Did you say this happens only for creators you've just subscribed to?
This happened to me with a Patreon I had been subscribed to for 3 months, so I don't think it's an issue with only newly subscribed Patreons. I haven't been able to post logs because I am no longer subscribed to any Patreons. Hopefully @develroo can give you a hand with the logs.
To test this, I subscribed to the $1 tier from clickspring. Here's what I got from the logs when downloading from https://www.patreon.com/clickspring/posts (this is when I have no download issues):
So following the link in one of the lines containing "Request next batch of posts from ...", I got this in Firefox:
This is more or less what you should get, but apparently you got a different result. So I am asking for this piece of info here.
@Ay1tsMe , I don't think you need to have a subscription to test. The "next page" function should return the same fields (but with different values).
I'm running `patreon-dl` over the Patreon which I don't have premium access to anymore. I clicked the "Request next batch of posts from ..." link and the total was 287, which I'm assuming is the total number of posts that need to be downloaded. From my understanding it seems to be getting the correct number of posts; maybe it's timing out from taking a long time, I dunno. I'm still running through the download process, so if it stops before 287 I'll send the logs.
```
...
  "links": {
    "next": "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=791478&page%5Bcursor%5D=02_V7PhxzYAaqB79OISTPUB9Xb"
  },
  "meta": {
    "pagination": {
      "cursors": {
        "next": "02_V7PhxzYAaqB79OISTPUB9Xb"
      },
      "total": 287
    }
  }
}
```
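For anyone comparing their own response against the one above, here is a small sketch that reads the two fields of interest from a saved response body. The field names (`links.next`, `meta.pagination.total`) are taken from the fragment in this comment; the function name is hypothetical and this is not code from patreon-dl.

```python
import json
from typing import Optional, Tuple


def _as_dict(value: object) -> dict:
    """Treat anything that is not a JSON object as an empty one."""
    return value if isinstance(value, dict) else {}


def read_pagination(body: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (next_url, total) from a posts API response body.

    Either value is None when the corresponding field is absent.
    """
    doc = _as_dict(json.loads(body))
    next_url = _as_dict(doc.get("links")).get("next")
    total = _as_dict(_as_dict(doc.get("meta")).get("pagination")).get("total")
    return next_url, total
```

If both values come back as None, the response is not the normal paginated shape shown above and is worth posting here in full.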
Okay, I wasn't able to reproduce the error. I'm assuming it has something to do with me not downloading the premium content. I'll leave this open in case @develroo still has the issue.
Total 1 targets processed
-------------------------
0: https://www.patreon.com/letstalkaboutmathrock/posts
Total 287 / 287 posts processed
Re-ran with debug on:
May 29 13:22:31: info: PostDownloader: Download batch complete (#21): 4 downloads; 3 completed; 1 errors; 0 skipped; 0 aborted
May 29 13:22:31: debug: Update status cache for post #32584570
May 29 13:22:31: info: PostDownloader: Fetch more posts
May 29 13:22:31: debug: PostDownloader: Request next batch of posts from API URL "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286&page%5Bcursor%5D=02ml2QtAttmVOIe5zkDDx4b0wc
May 29 13:22:32: debug: PostParser: Parse API response of "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286"
May 29 13:22:32: warn: PostParser: 'included' field missing in API response of "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286" or has incorrect type - no media items and campaign info will be returned
May 29 13:22:32: warn: PostParser: No posts found in API response of "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286"
May 29 13:22:32: debug: PostDownloader: 0 posts fetched
May 29 13:22:32: debug: PostDownloader: No further posts to fetch
May 29 13:22:32: info: PostDownloader: Done downloading posts by 'clickspring'
May 29 13:22:32: info: PostDownloader: Total 21 / null posts processed (skipped: 19 redundant)
May 29 13:22:32: info: PostDownloader end
Total 1 targets processed
-------------------------
0: https://www.patreon.com/clickspring/posts
Total 21 / null posts processed (skipped: 19 redundant)
Edit: Oh, no. I have been subscribed for ages, but this is the first time I ran patreon-dl on them, so that is why it stood out. I have done batch downloads before, and they did not stop at 20. I know Patreon changed something because the URL responses have changed from when I last looked at them. I can probably test with other creators I have not downloaded yet, as I only found this wonderful tool last month and it has been a boon for downloading the back catalog of one creator who has been producing stuff every few weeks for years now.
So just a big THANKS for that. Really appreciate the work, and happy to help any way I can.
Edit 2:
So I just looked at the post id and indeed, it does seem to be the last post before the 'Load More' has to be pushed.
> maybe its timing out from taking a long time i dunno.
I too suspect this may be the cause. In my tests, I skipped downloading videos, and the 'next page' / 'load more' links do expire after some time. I'll do some tests and see how long the 'next' links last.
@Ay1tsMe , @develroo , thanks for helping out. Very useful discussion.
Can confirm that it repeats each time a download is restarted. Here is the next run:
May 29 16:25:47: info: PostDownloader: Download complete (#21.1): "/home/rooster/mnt/sshfs/clickspring - Clickspring/posts/21429254 - The Antikythera Mechanism Episode 8 - Making The Mean Lunar Sidereal Train/embed/youtube-OBI54xujkN0 (1080p50).mp4"
May 29 16:25:47: info: PostDownloader: Download batch complete (#21): 4 downloads; 4 completed; 0 errors; 0 skipped; 0 aborted
May 29 16:25:47: debug: Update status cache for post #21429254
May 29 16:25:47: info: PostDownloader: Fetch more posts
May 29 16:25:47: debug: PostDownloader: Request next batch of posts from API URL "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286&page%5Bcursor%5D=029as-8qYPr6mBmmuY2hhdie3o
May 29 16:25:47: debug: PostParser: Parse API response of "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286"
May 29 16:25:47: warn: PostParser: 'included' field missing in API response of "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286" or has incorrect type - no media items and campaign info will be returned
May 29 16:25:47: warn: PostParser: No posts found in API response of "https://www.patreon.com/api/posts?include=campaign%2Caccess_rules%2Caccess_rules.tier.null%2Cattachments%2Caudio%2Caudio_preview.null%2Cimages%2Cmedia%2Cnative_video_insights%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Cuser%2Cuser_defined_tags%2Cti_checks&sort=-published_at&json-api-version=1.0&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bis_draft%5D=false&filter%5Bcampaign_id%5D=175286"
May 29 16:25:47: debug: PostDownloader: 0 posts fetched
May 29 16:25:47: debug: PostDownloader: No further posts to fetch
May 29 16:25:47: info: PostDownloader: Done downloading posts by 'clickspring'
May 29 16:25:47: info: PostDownloader: Total 21 / null posts processed (skipped: 39 redundant)
May 29 16:25:47: info: PostDownloader end
Total 1 targets processed
-------------------------
0: https://www.patreon.com/clickspring/posts
Total 21 / null posts processed (skipped: 39 redundant)
OK, I've run a script that fetches the same 'next page' link every minute and found that the link expires after 30 minutes. So if a page has 20 posts, each with a video that takes 2 minutes to download, the total download time is ~40 minutes, which would easily cause the 'next page' link to expire by the time the downloader fetches from it.
You would have thought the Patreon website would have some logic to keep the 'next page' link alive. But no - if you leave a page idle for more than 30 minutes and then click the "Load More" button, you will see the forever-spinning icon with an 'Expired' error in the XHR result:
Imagine having scrolled down a dozen pages, gone to do something else and then coming back only to find out you have to start scrolling from the first page again...
To avoid this in `patreon-dl`, I think it would be necessary to iterate through all the 'next page' links and cache the responses first, then parse them as we proceed through the pages. Or is there a better way? EDIT: giving this more thought: if we collect all posts first, will links contained in each post (like image, attachment, audio...) expire in the same way so that they can't be downloaded as we move further into the collection?
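The "collect everything first" idea could look roughly like the sketch below. It is an illustration only, not patreon-dl code: `collect_pages` and `fetch_page` are hypothetical names, `fetch_page` stands in for whatever performs the HTTP request and returns the decoded JSON, and the only thing assumed about the response is the `links.next` field seen earlier in this thread.

```python
from typing import Callable, Dict, List, Optional

Page = Dict[str, object]


def collect_pages(
    first_url: str,
    fetch_page: Callable[[str], Page],
    max_pages: int = 1000,
) -> List[Page]:
    """Follow 'next' links right away and cache every page response.

    No downloading happens in between, so each 'next' link is used
    shortly after it was issued.
    """
    pages: List[Page] = []
    url: Optional[str] = first_url
    while url is not None and len(pages) < max_pages:
        page = fetch_page(url)
        pages.append(page)
        links = page.get("links")
        next_url = links.get("next") if isinstance(links, dict) else None
        url = next_url if isinstance(next_url, str) else None
    return pages
```

This only addresses the pagination links. Whether the media links inside the cached posts stay valid long enough is exactly the open question in the EDIT above, and the sketch does not answer it.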
Hmm, that does make some kind of sense. The clickspring posts are mostly videos, so they could take more than 30 minutes. Weird that a singer I follow did not trigger this last week, but those are shorter videos, so maybe it refreshes quicker.
Interesting edge case. But FWIW the detection of previous downloads works fine, so iterating over 'next' works.
I have decided to implement a timer that refreshes the 'next' URL at intervals. Let's see how that will turn out.
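To make the idea concrete for readers of this thread, here is one possible shape of such a refresh, written as a sketch under stated assumptions. It is not the code that went into patreon-dl: `NextLinkKeeper` and `refetch_next` are hypothetical names, the 10-minute default is simply chosen to stay well under the 30 minutes observed above, and it assumes that re-requesting the current page yields a fresh 'next' link, which this thread does not confirm.

```python
import time
from typing import Callable, Optional


class NextLinkKeeper:
    """Hands out a 'next page' URL, re-obtaining it once it is older than interval_s.

    refetch_next is a caller-supplied function (hypothetical) that asks the API
    again and returns the current 'next' URL, or None if there is none.
    """

    def __init__(
        self,
        refetch_next: Callable[[], Optional[str]],
        interval_s: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._refetch_next = refetch_next
        self._interval_s = interval_s
        self._clock = clock
        self._next_url = refetch_next()
        self._fetched_at = clock()

    def current(self) -> Optional[str]:
        """Return the 'next' URL, refreshing it first if the interval has elapsed."""
        if self._clock() - self._fetched_at >= self._interval_s:
            self._next_url = self._refetch_next()
            self._fetched_at = self._clock()
        return self._next_url
```

The sketch refreshes lazily when `current()` is called rather than from a background timer, which keeps it easy to test; a real timer would call the same refresh step on a schedule.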
Released v1.7.0 which should fix this.
Closing this for now. Re-open if problem persists.
I have to keep rerunning the program to keep downloading posts. Is there a way to just download everything from a Patreon page? It downloads roughly 20 posts, then stops and says it's finished, but when I rerun it, it starts downloading new posts. The program says there are no more posts, but there definitely are.
Any way to fix this? Here is what it says when it finishes. I can rerun and it skips all the posts already downloaded and starts to download more.
Here is my config and launch command: