mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[request] Boosty.to #2387

Closed Butterfly-Dragon closed 1 week ago

Butterfly-Dragon commented 2 years ago

https://boosty.to/app/settings/subscriptions

this lists all authors one follows on that platform.

astraetech commented 1 year ago

Would love to see something for that platform too. Thank you!

XCanG commented 1 year ago

Is there a way I can help implement an extractor for this site? I have a subscription, so I could test it out. However, I'm not sure exactly how extractors work and what they need in order to work.

wyatt8740 commented 1 year ago

Is there a way I can help implement an extractor for this site? I have a subscription, so I could test it out. However, I'm not sure exactly how extractors work and what they need in order to work.

You have to know how to imitate requests from a web browser so that the site doesn't stonewall you, and also how to parse HTML (and possibly JSON) content to grab the important bits from the returned pages. That's the gist of it.

biggestsonicfan commented 1 month ago

Since the boostydownloader project has been confirmed as being abandoned, I feel like boosty should get another look a year after the last comment on this issue. I attempted to start my own boosty extractor code which was then cleaned up and fixed by mikf.

Boosty requires a Cloudflare login for verification, I believe, so I'm not sure how to get OAuth outside the provided cookie, but suggestions are more than welcome.

biggestsonicfan commented 3 weeks ago

Interestingly, boostydownloader has removed its MIT license. In the issue, they say they have made it permissible, but without a license, this is not possible. Since no code has actually changed, however, the code up to that point has remained the same.

I am finding issues with boostydownloader I wasn't aware of before, namely it does not download attachments, which in my eyes is critical. I will see if I can continue work on a boosty fork of gallery-dl for a potential PR, but I can't pretend I understand the order of operations in which gallery-dl consumes pages.

biggestsonicfan commented 3 weeks ago

Continuing the boostydownloader drama, the github user has nuked their account and all repos they had. It's a shame they had to take such catastrophic action to their code, but this means there is no viable boosty downloader at this point. I will see if I can accelerate my PR.

XCanG commented 3 weeks ago

I would say that making a Boosty downloader is relatively easy; the only problem is making a gallery-dl extractor.

The Boosty API looks like this:

Getting User

https://api.boosty.to/v1/blog/________ (the underscores stand in for the user name)

The response looks like this:

{
    "blogUrl": "URL",
    "coverUrl": "https://images.boosty.to/blog/00000000/cover?change_time=1234",
    "isSubscribed": false,
    "flags": {
        "allowGoogleIndex": false,
        "acceptDonationMessages": true,
        "showPostDonations": true,
        "hasTargets": true,
        "isRssFeedEnabled": false,
        "isPaymentAcceptBlocked": false,
        "allowIndex": true,
        "isVerifyPayoutBlocked": false,
        "hasSubscriptionLevels": true,
        "isPayoutBlocked": false
    },
    "subscriptionKind": "none",
    "isReadOnly": false,
    "accessRights": {
        "canView": false,
        "canCreateComments": false,
        "canDeleteComments": false,
        "canEdit": false,
        "canCreate": false,
        "canSetPayout": false
    },
    "owner": {
        "hasAvatar": true,
        "name": "Name",
        "id": 00000000,
        "avatarUrl": "https://images.boosty.to/user/00000000/avatar?change_time=1234"
    },
    "isOwner": false,
    "subscription": null,
    "title": "Title",
    "hasAdultContent": true,
    "isBlackListed": false,
    "isBlackListedByUser": false,
    "signedQuery": "",
    "count": {
        "posts": 123,
        "subscribers": 123
    },
    "description": [{
        "type": "text",
        "modificator": "",
        "content": "[\"Hello! Welcome to my Boosty! \",\"unstyled\",[]]"
    }, {
        "content": "",
        "modificator": "BLOCK_END",
        "type": "text"
    }],
    "publicWebSocketChannel": "blogger:00000000",
    "allowedPromoTypes": [
        "discount",
        "trial",
        "trial_link"
    ],
    "isTotalBaned": false,
    "socialLinks": [{
        "type": "website",
        "url": "https://..."
    }]
}

Getting posts

User feed:

https://api.boosty.to/v1/blog/________/post/?limit=5&offset=1713295954%3A5730180&comments_limit=2&reply_limit=1&is_only_allowed=false

General feed (all users):

https://api.boosty.to/v1/feed/post/?limit=10&comments_limit=2&only_allowed=false&only_bought=false

Post example:

{
    "extra": {
        "isLast": false,
        "offset": "1700536019:4692238"
    },
    "data": [{
        "tags": [{
            "id": 1234,
            "title": "Tag name"
        }],
        "isWaitingVideo": false,
        "updatedAt": 1714404788,
        "data": [{
            "type": "image",
            "rendition": "",
            "height": 602,
            "width": 956,
            "url": "https://images.boosty.to/image/00000000-0000-0000-0000-000000000000?change_time=1234",
            "id": "00000000-0000-0000-0000-000000000000"
        },
        {
            "isMigrated": false,
            "type": "file",
            "id": "00000000-0000-0000-0000-000000000000",
            "url": "https://cdn.boosty.to/file/00000000-0000-0000-0000-000000000000",
            "size": 124157865,
            "complete": true,
            "title": "1.zip"
        }],
        "donations": 0,
        "createdAt": 1714404779,
        "teaser": [{
            "width": 575,
            "rendition": "teaser_auto_background",
            "url": "https://images.boosty.to/image/00000000-0000-0000-0000-000000000000",
            "height": 840,
            "id": "00000000-0000-0000-0000-000000000000",
            "type": "image"
        }],
        "advertiserInfo": null,
        "showViewsCounter": false,
        "id": "00000000-0000-0000-0000-000000000001",
        "subscriptionLevel": {
            "ownerId": 00000000,
            "isArchived": false,
            "currencyPrices": {
                "USD": 4.6,
                "RUB": 400
            },
            "deleted": false,
            "createdAt": 1688285158,
            "name": "Tier1",
            "data": [{
                "rendition": "",
                "width": 1536,
                "url": "https://images.boosty.to/image/00000000-0000-0000-0000-000000000002?change_time=1234",
                "height": 1024,
                "type": "image",
                "id": "00000000-0000-0000-0000-000000000002"
            }, {
                "type": "text",
                "modificator": "",
                "content": "[\"• All posts works.\\n\",\"unstyled\",[[0,2,17]]]"
            }, {
                "content": "",
                "modificator": "BLOCK_END",
                "type": "text"
            }],
            "price": 400,
            "id": 1234567,
            "promos": []
        },
        "count": {
            "likes": 1,
            "reactions": {
                "laught": 0,
                "heart": 1,
                "angry": 0,
                "wonder": 0,
                "sad": 0,
                "fire": 0,
                "dislike": 0,
                "like": 0
            },
            "comments": 0
        },
        "hasAccess": true,
        "comments": {
            "extra": {
                "isLast": true,
                "isFirst": true
            },
            "data": []
        },
        "isPublished": true,
        "currencyPrices": {
            "USD": 0,
            "RUB": 0
        },
        "isRecord": false,
        "price": 0,
        "isLiked": false,
        "donators": {
            "data": [],
            "extra": {
                "isLast": true
            }
        },
        "int_id": 5817430,
        "isDeleted": false,
        "signedQuery": "",
        "isCommentsDenied": false,
        "publishTime": 1714404779,
        "user": {
            "hasAvatar": true,
            "avatarUrl": "https://images.boosty.to/user/00000000/avatar?change_time=1234",
            "blogUrl": "URL",
            "id": 00000000,
            "flags": {
                "showPostDonations": true
            },
            "name": "Name"
        },
        "title": "Some post title"
    },
    ...
    ]
}

If response["extra"]["isLast"] == True, you have reached the last page; that is how you end the traversal through posts.
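The traversal described above can be sketched roughly as follows. This is only an illustration, not code from gallery-dl or boosty_archiver: `iter_posts` and `fetch_page` are hypothetical names, and it assumes that the `offset` value from `extra` in one response is what gets passed as the `offset` parameter of the next request.

```python
from typing import Callable, Iterator, Optional


def iter_posts(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield posts page by page until the API reports the last page.

    fetch_page(offset) is a hypothetical callable that performs the HTTP
    request and returns the decoded JSON response as a dict.
    """
    offset = None  # the first request carries no cursor
    while True:
        page = fetch_page(offset)
        yield from page["data"]
        extra = page["extra"]
        if extra["isLast"]:
            break
        # Assumption: the cursor for the next request is the "offset"
        # string returned in the previous response.
        offset = extra["offset"]


# Usage with canned pages instead of real HTTP requests
pages = {
    None: {
        "extra": {"isLast": False, "offset": "1700536019:4692238"},
        "data": [{"int_id": 2}, {"int_id": 1}],
    },
    "1700536019:4692238": {
        "extra": {"isLast": True},
        "data": [{"int_id": 0}],
    },
}
post_ids = [post["int_id"] for post in iter_posts(pages.__getitem__)]
```

Passing the request function in as a parameter keeps the loop independent of whichever HTTP client is used.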

So making it is pretty easy with something like httpx/requests/etc. It's just that gallery-dl is complicated and doesn't have any type hints that would help in understanding what's going on in it.
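As a rough sketch of how little is needed on the data side, the downloadable entries of a post can be picked out of its "data" list. `collect_media` is a hypothetical helper; it only knows about the "image" and "file" block types shown in the post example above, and other types (video, audio) would need their own handling.

```python
def collect_media(post: dict) -> list[dict]:
    """Return the downloadable entries ("image" and "file") of one post."""
    media = []
    for block in post.get("data", []):
        if block.get("type") in ("image", "file") and block.get("url"):
            media.append({
                "type": block["type"],
                "url": block["url"],
                # Only "file" blocks carry a "title" in the sample above;
                # fall back to the block id for images.
                "name": block.get("title") or block["id"],
            })
    return media


# Trimmed-down version of the post example above
sample_post = {
    "data": [
        {"type": "image", "id": "img-id",
         "url": "https://images.boosty.to/image/img-id"},
        {"type": "file", "id": "file-id", "title": "1.zip",
         "url": "https://cdn.boosty.to/file/file-id"},
        {"type": "text", "modificator": "", "content": ""},
    ],
}
media = collect_media(sample_post)
```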

I also have active subscriptions here, but I never managed to write a proper plugin, so I was just using some self-made solutions where I just put in the cookies and the user URL.

@biggestsonicfan

mikf commented 3 weeks ago

@biggestsonicfan Boosty is on my current TODO list for the next release, so I'll at least try to get something done myself and/or support your endeavors at developing boosty extractors.

XCanG commented 3 weeks ago

@mikf if you need someone to test, I have active subscriptions on that platform and can help that way.

XCanG commented 3 weeks ago

Actually, I somehow didn't think about it, but do you need the downloader script that I'm using? While it doesn't rely on gallery-dl at all, maybe it would be helpful, since it's already written.

mikf commented 3 weeks ago

Sure, I guess it would be quite helpful and allow me to provide feature parity.

biggestsonicfan commented 3 weeks ago

@XCanG One of the issues boostydownloader was having before the repo was scrubbed off the face of the earth was that the API has a hard limit of 300 posts. It will not go any further, regardless of how many more posts exist, which raises an interesting question: Should we be writing this around the API, or should we write this as a scrape?

XCanG commented 3 weeks ago

I posted my script here https://github.com/JumpJets/boosty_archiver


@biggestsonicfan I don't know what you mean. First of all, Boosty's pages aren't PHP. While they have an initial script with some data, when you preload posts you are actually hitting their API; there are no PHP pages, only API JSON responses. If a regular scroll through the posts weren't able to show those posts, then nobody would be able to see them. And their posts are not even paginated; they use a cursor. Second of all, I currently have a subscription to a user with more than 300 posts: [image]

If you are wondering how I counted, it's simple CSS that I was already using to track changes:

CSS for Stylish

```css
@-moz-document domain("boosty.to") {
    [class^="Layout_content"] {
        width: unset !important;
        display: grid;
        justify-content: center;

        & [class^="Layout_threeColsCenter"] {
            width: unset !important;

            & [class^="Feed_feed"] {
                counter-reset: posts;

                > [class^="Feed_itemWrap"] {
                    counter-increment: posts;

                    &::before {
                        content: "#" counter(posts);
                        position: absolute;
                        right: 100%;
                        margin-right: 5px;
                        font-size: calc(2vw + 1vh);
                        font-weight: 100;
                        font-family: sans;
                        pointer-events: none;
                    }

                    > [class^="Post_root"] {
                        width: unset !important;

                        > [class^="Post_contentWrapper"] {
                            &:has(> [class^="Post_readMore"]) > [class^="Post_content"] {
                                max-height: unset !important;

                                & > [class^="Post_shading"] {
                                    display: none !important;
                                }
                            }

                            /* still needed for showing hidden stuff by JS
                            > [class^="Post_readMore"] {
                                display: none !important;
                            }*/
                        }

                        & .ce-block__content {
                            max-width: unset !important;
                        }
                    }
                }
            }
        }

        & [class^="DialogueChat_dialog"] {
            width: 40vw;
        }

        > [class^="Settings_wrapper"] [class^="SettingsSubscriptions_cardsContainer"] {
            display: grid;
            grid-template-columns: repeat(auto-fill, minmax(240px, 1fr));
            width: 70vw;
            gap: 20px;

            > [class^="SettingsSubscriptions_card"] {
                height: unset !important;
                width: unset !important;
                margin: 0 !important;
                display: grid;

                &:has([class^="WithdrawalInfo_root"]) {
                    order: -1;
                }

                > div {
                    height: unset !important;
                    width: unset !important;
                    margin: 0 !important;
                }
            }
        }
    }

    [class^="Layout_layout"] [class^="Post_root"] > [class^="Post_contentWrapper"] {
        &:has(> [class^="Post_readMore"]) > [class^="Post_content"] {
            max-height: unset !important;

            & > [class^="Post_shading"] {
                display: none !important;
            }
        }

        /* same
        > [class^="Post_readMore"] {
            display: none !important;
        }*/
    }

    [class^="Post_container"] {
        width: 100% !important;
    }
}
```

Maybe there is something that I don't know, but I haven't seen this issue. You could try my script to see whether it fails in your case.

biggestsonicfan commented 3 weeks ago

Boosty_archiver output: [image]

Folder's file count: [image]

Boosty page's media tab: [image]

🤷‍♂️

XCanG commented 3 weeks ago

This counter is a post counter, not an attachment counter. The user I mentioned above has only 303 media items, while the number of posts is larger than that. Apply my CSS (from the spoiler) and scroll through the posts until you reach the last one, then tell me what post number it is. If you are going to paste it directly on the page, remove the outer @-moz-document domain() block and its braces; those are Stylus addon-only.

XCanG commented 3 weeks ago

If you want to add a media counter, it is possible to create a separate progress display just for that.

Actually, now that I think of it, I don't have proper coverage for videos. All the creators I'm subscribed to, as far as I know, post only images, archives and external links to Google Drive, Mega, etc.

Could you dump an example request that has a video in the post data? In my case it would probably be here: https://github.com/JumpJets/boosty_archiver/blob/main/boosty_archiver.py#L503 by adding checks for "video", I assume, then adding something like ctx.progress.print(d) and after that input() to stop execution (or just use debugging mode). I guess it will have some similar fields, like width, height, size. But it may differ, as images here are very different from other file attachments: those use signedQuery, while images do not. And I'm not sure whether it keeps videos intact or splits them into DASH/HLS or other streamable types. Even if they have no extension, like images, magic can detect videos, with the exception of AV1. I should be able to guess from the first chunk, as with images, if that is the case.

biggestsonicfan commented 3 weeks ago

I definitely can do that. I'll also try to work out how to get this other error to you, but I'll switch this over to that repo.

mikf commented 2 weeks ago

I made an attempt: https://github.com/mikf/gallery-dl/commit/1ad58cab84a5ca550cb9f5f86538b24db83dd8dd

I'm not at all satisfied with how /USER/media/… results are pretty much incompatible with regular posts from /USER, but I doubt there is much that can be done in this regard.

Also, the code is currently ignoring signedQuery values. Is it really as simple as url + signedQuery or is something more complex necessary to combine the two?

edit: Can anyone provide an example of a free audio post? Audio files aren't handled yet, but everything I found needed a subscription to access.

XCanG commented 2 weeks ago

I don't have anyone who posts audio; same with video.

Most of the subscriptions I know of have images in their posts; a smaller part of them have links to Google Drive, Mega, etc. I know one creator who posted live streams from time to time. It is marked as a special pinned post and has a specific URL: https://boosty.to/kazami_kapra/streams/video_stream. You will not be able to get info about it right now, because the stream has ended; until a new one is planned, no info is available.

Also, the code is currently ignoring signedQuery values. Is it really as simple as url + signedQuery or is something more complex necessary to combine the two?

It is almost that simple, yes. The URL has the full path to the file, while signedQuery looks like ?user_id=0000000&content_id=00000000-0000-0000-0000-0000000000000&expire_time=0000000000&sign=0000000000000000000000000000000000000000000000000000000000000000; if it is empty, it is "". You also want to add is_migrated=true, which has to be joined with & when signedQuery is present, or with ? when it is not.
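A minimal sketch of that joining rule, assuming signedQuery is either empty or already starts with "?" as in the example above. `build_file_url` is a hypothetical helper name, not a function from gallery-dl or boosty_archiver.

```python
def build_file_url(url: str, signed_query: str = "") -> str:
    """Append signedQuery and is_migrated=true to a file URL."""
    if signed_query:
        # signedQuery already starts with "?", so any further
        # parameter has to be joined with "&".
        return f"{url}{signed_query}&is_migrated=true"
    # No signedQuery: is_migrated becomes the first query parameter.
    return f"{url}?is_migrated=true"


base = "https://cdn.boosty.to/file/00000000-0000-0000-0000-000000000000"
signed = build_file_url(base, "?user_id=1&sign=abc")
unsigned = build_file_url(base)
```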

I was hoping that you would use user posts and parse the user text as well. You can see in my example how I parsed it; it's tricky. Still, many creators leave external links in their posts, and it would be useful to parse them so they can be exported.
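For plain text, a rough sketch could look like the following. It relies only on what the samples above show: "text" blocks whose "content" is itself a JSON-encoded list with the text as first element, and an empty block with modificator "BLOCK_END". Treating BLOCK_END as a line break is an assumption, `blocks_to_text` is a hypothetical name, and links or other block types are not covered.

```python
import json


def blocks_to_text(blocks: list[dict]) -> str:
    """Join the text blocks of a post or description into one string."""
    parts = []
    for block in blocks:
        if block.get("type") != "text":
            continue
        if block.get("modificator") == "BLOCK_END":
            # Assumption: BLOCK_END closes a paragraph.
            parts.append("\n")
        elif block.get("content"):
            # content is a JSON string such as '["Hello", "unstyled", []]'
            parts.append(json.loads(block["content"])[0])
    return "".join(parts).strip()


# The "description" blocks from the blog response above
description = [
    {"type": "text", "modificator": "",
     "content": '["Hello! Welcome to my Boosty! ","unstyled",[]]'},
    {"type": "text", "modificator": "BLOCK_END", "content": ""},
]
text = blocks_to_text(description)
```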

XCanG commented 2 weeks ago

I am thinking of a solution for video and audio. I may start a Boosty creator account for testing purposes and create various types of content in order to parse them from the API.

mikf commented 2 weeks ago

I was hoping that you would use user posts and parse the user text as well. You can see in my example how I parsed it; it's tricky. Still, many creators leave external links in their posts, and it would be useful to parse them so they can be exported.

There are links and content metadata fields for external links and text content.

XCanG commented 2 weeks ago

Yes, but the creators I follow post a password near the links as well. At least one creator requires getting the password along with the link. Like:

https://...
Password: 123

mikf commented 2 weeks ago

Subscribed user lists (https://github.com/mikf/gallery-dl/issues/2387#issue-1165739661) and homepage feed are now supported (https://github.com/mikf/gallery-dl/commit/274d99e7d68a5cc78057d9ea31a6581f81267080)

Yes, but the creators I follow post a password near the links as well. At least one creator requires getting the password along with the link.

Sorry, I made a typo that made it ignore all external links (https://github.com/mikf/gallery-dl/commit/ee8c4e2e49468b2a45fa3fb1830b42ee6a4f8978). post[content] should now contain all text content, including links and passwords.

XCanG commented 2 weeks ago

OK, so I set up a creator account and added various types of content to it for testing the platform. I can add anything else if needed.

https://boosty.to/xcang

For testing, I also added a link to a free subscription, so you can test it on posts behind a tier. If more is needed, I can post that as well. https://boosty.to/xcang/subscription-level/2984885/promo/60759?linkId=af4c8bb8163058a2cdd1e603989c000e

@mikf

XCanG commented 2 weeks ago

By the way, looking at your code, I don't quite see how you detect the image extension. In the case of Boosty, it isn't shown in either the API or the URL, so what I did is feed the first chunk from the stream to magic to detect the proper extension. In my code it is these lines.

mikf commented 2 weeks ago

Thank you very much!

I took a look at the audio post to try to figure out how to handle these, but no luck. The API provides a URL with no query parameters:

https://cdn.boosty.to/audio/b46f05b1-c6d1-4a10-8791-6a65077aa1c2

while the actual URL used by the website has plenty of them:

https://cdn.boosty.to/audio/b46f05b1-c6d1-4a10-8791-6a65077aa1c2?user_id=33020532&content_id=5d4d6f90-5d48-4442-a7e5-2164a858681d&expire_time=1728110163&sign=aa922feeffdcf64120d9c07b13bbe8ce99f6fc28d263d22601a4fe53a6438ead&is_migrated=true

Most are trivial to add, but I have no idea how to generate a sign value and the file won't download without.

free subscription

Still requires a credit card, and I do not own one.

By the way, looking at your code, I don't quite see how do you detect image extension?

This gets handled by the http downloader module, which basically does the same thing as you described but without magic.

https://github.com/mikf/gallery-dl/blob/5b968a0a7cb5845cc1c2e0c615941e7060d914b9/gallery_dl/downloader/http.py#L246-L248 https://github.com/mikf/gallery-dl/blob/5b968a0a7cb5845cc1c2e0c615941e7060d914b9/gallery_dl/downloader/http.py#L369-L377
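To illustrate the general idea (this is not the code behind the links above), an extension can be guessed from the first bytes of a download with a small table of well-known file signatures. `SIGNATURES` and `guess_extension` are hypothetical names, and the table is deliberately short.

```python
from typing import Optional

# (extension, check) pairs; each check inspects the first bytes of a file
SIGNATURES = [
    ("jpg", lambda b: b.startswith(b"\xff\xd8\xff")),
    ("png", lambda b: b.startswith(b"\x89PNG\r\n\x1a\n")),
    ("gif", lambda b: b.startswith((b"GIF87a", b"GIF89a"))),
    ("webp", lambda b: b[:4] == b"RIFF" and b[8:12] == b"WEBP"),
    ("zip", lambda b: b.startswith(b"PK\x03\x04")),
    ("mp4", lambda b: b[4:8] == b"ftyp"),
]


def guess_extension(first_chunk: bytes) -> Optional[str]:
    """Return an extension for the first matching signature, else None."""
    for ext, check in SIGNATURES:
        if check(first_chunk):
            return ext
    return None


png_ext = guess_extension(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
mp4_ext = guess_extension(b"\x00\x00\x00\x18ftypisom")
unknown = guess_extension(b"hello")
```

Only the first few bytes are needed, so the check can run on the first chunk of a streamed response before the output file is named.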

XCanG commented 2 weeks ago

while the actual URL used by the website has plenty of them

This is actually what signedQuery is; you have to get it from the post. Look at my file downloading code as an example of how I get this parameter. I'm currently at work and can't properly quote from my phone.

XCanG commented 2 weeks ago

I've added support for video and audio files in my version. If you still have questions regarding signedQuery, you can check my version.

What is not perfect here is the quality selector and the support for HLS streams, or streams in general.

For example, the video file that I uploaded has this metadata:

General
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 4.21 MiB
Duration                                 : 34 s 550 ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 1 023 kb/s
Writing application                      : Lavf61.1.100

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L3.1
Format settings                          : CABAC / 4 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 4 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 34 s 550 ms
Bit rate mode                            : Variable
Bit rate                                 : 976 kb/s
Maximum bit rate                         : 11.2 Mb/s
Width                                    : 804 pixels
Height                                   : 512 pixels
Display aspect ratio                     : 16:10
Frame rate mode                          : Constant
Frame rate                               : 60.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.040
Stream size                              : 4.02 MiB (95%)
Color range                              : Full
colour_range_Original                    : Limited
Color primaries                          : BT.709
Transfer characteristics                 : sRGB/sYCC
Matrix coefficients                      : Identity
matrix_coefficients_Original             : BT.470 System B/G
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : Opus
Codec ID                                 : Opus
Duration                                 : 34 s 545 ms
Source duration                          : 34 s 552 ms
Bit rate mode                            : Variable
Bit rate                                 : 33.9 kb/s
Maximum bit rate                         : 128 kb/s
Compression mode                         : Lossy
Stream size                              : 143 KiB (3%)
Source stream size                       : 143 KiB (3%)
Default                                  : Yes
Alternate group                          : 1

but the best quality I can find is a worse version of this file, and the dimensions are also scaled down:

General
Complete name                            : 6949212_Video test_0_d0e4dd58-d7bb-4178-b503-1cd804d20c62.medium.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 1.57 MiB
Duration                                 : 34 s 560 ms
Overall bit rate                         : 380 kb/s
Writing application                      : Lavf58.76.100

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L3.1
Format settings                          : CABAC / 4 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 4 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 34 s 456 ms
Bit rate                                 : 240 kb/s
Width                                    : 754 pixels
Height                                   : 480 pixels
Display aspect ratio                     : 16:10
Frame rate mode                          : Variable
Frame rate                               : 30.067 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.022
Stream size                              : 1 009 KiB (63%)
Tagged date                              : UTC 2024-10-03 18:25:33
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 34 s 560 ms
Bit rate mode                            : Constant
Bit rate                                 : 132 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 48.0 kHz
Frame rate                               : 46.875 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 553 KiB (34%)
Default                                  : Yes
Alternate group                          : 1
Tagged date                              : UTC 2024-10-03 18:25:33

So there are still some questions about their API for videos. With audio it is exactly the same file, so no questions there.

XCanG commented 2 weeks ago

Oh, and I found an error in your latest commit https://github.com/mikf/gallery-dl/commit/3fa639fc2d7faea694d741c74dd955f73415cd84#diff-d250fdbc500177122ea0f4a11b12ec12965c072bf580ec3e112c2c33290ee9a2R119: signedQuery is not present when you aren't signed in, so the check should first be if "signedQuery" in post and post["signedQuery"]: or just if post.get("signedQuery"):
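A small sketch of why the shorter form is enough: dict.get() returns None for a missing key, and both None and "" are falsy, so one check covers the signed-out case and the empty-string case. `get_signed_query` is a hypothetical helper for illustration only.

```python
def get_signed_query(post: dict) -> str:
    """Return the post's signedQuery, or "" if it is missing or empty."""
    # post.get() avoids a KeyError when the key is absent (not signed in)
    return post.get("signedQuery") or ""


signed_out = get_signed_query({"title": "Some post title"})
empty = get_signed_query({"signedQuery": ""})
present = get_signed_query({"signedQuery": "?user_id=1&sign=abc"})
```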