mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Request] VKontakte (vk.com) #474

Open photologist opened 4 years ago

photologist commented 4 years ago

VKontakte (vk.com) extractor(s) would be much appreciated.

Profiles have photo albums...

eg: https://m.vk.com/photos2830256 (album)

Album thumbs either open a frame or link to individual pages...

eg: https://m.vk.com/photo2830256_457248652?list=photos2830256

On individual photo pages there's an option to ‘Download full size’...

eg: https://sun9-5.userapi.com/c855416/v855416689/142081/zu0CJ7Su5KY.jpg

Hope that helps, thanks so much for all the work!

viceroycowboy commented 3 years ago

still waiting

github-userx commented 3 years ago

I second this. @mikf

photologist commented 3 years ago

Any updates on this one?

ImVantexHD commented 3 years ago

Any updates on this one?

come back in two years, maybe it'll be done by then

Twi-Hard commented 3 years ago

I really want this too

photologist commented 3 years ago

come back in two years...

Ok, but only if you're still here.

mikf commented 3 years ago

@photologist https://github.com/mikf/gallery-dl/commit/62cfee4d28a4bceba3b9028879c3144186812a51, but it has many shortcomings.

It cannot handle albums with >100 photos yet, available metadata is minimal, and I'm not sure about the correct URL format for original files.

I think to get the original URL for

https://sun9-20.userapi.com/impf/c836729/v836729326/1f25a/N3g5QzPZBbM.jpg?size=800x800&quality=96&sign=06bcfc21a2980b0ff1f59129a25c0ceb&type=album

you remove all query parameters (everything after and including ?) and /impf:

https://sun9-20.userapi.com/c836729/v836729326/1f25a/N3g5QzPZBbM.jpg

but the initial URL has double the file size and probably higher quality. And then there are also URLs with /impg, but that doesn't seem to make a difference.
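For anyone experimenting, a rough shell version of that transformation (assuming the rule really is just "strip the query string and the /impf or /impg segment"):

url='https://sun9-20.userapi.com/impf/c836729/v836729326/1f25a/N3g5QzPZBbM.jpg?size=800x800&quality=96&sign=06bcfc21a2980b0ff1f59129a25c0ceb&type=album'

# drop everything from '?' on, then the /impf or /impg path segment
echo "$url" | sed -e 's/?.*//' -e 's,/imp[fg]/,/,'
# -> https://sun9-20.userapi.com/c836729/v836729326/1f25a/N3g5QzPZBbM.jpg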

photologist commented 3 years ago

@mikf, would this help?

VKontakteRu.txt

It's a decrypter used for JDownloader 2 with a handful of patterns you might find useful.

It works exceptionally well at helping to grab the original file sizes.

photologist commented 3 years ago

@mikf, as of now, the extractor only works for the test account used in vk.py.

Is there a way to make it work for other accounts? Every time I try, I get the "No suitable extractor" error.

mikf commented 3 years ago

You need a URL that contains the user ID: either https://vk.com/id123456789 or, for accounts without an ID in their profile URL like https://vk.com/vova.pa1n, the URL of the photos/albums page, e.g. https://vk.com/albums276505347?profile=1.
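So, for example, either of these should work as input:

gallery-dl "https://vk.com/id123456789"
gallery-dl "https://vk.com/albums276505347?profile=1"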

photologist commented 3 years ago

Thanks, @mikf! The method you provided works as indicated.

Is there a way to download one or some albums and not others?

loyalskyr commented 3 years ago

Will the app be able to download Pages/Communities albums? That would be the biggest addition for VK.

mikf commented 3 years ago

@photologist not yet, it can currently only download all photos from a user.

@loyalskyr could you post an example of what these would look like?

loyalskyr commented 3 years ago

@mikf Sure: COMMUNITY/PAGE, Community Wall Photos (album)

toscompliantname commented 2 years ago

Now let me tell you why this is important: https://vk.com/animelrl https://vk.com/msinsanity https://vk.com/2d_despair

Roccobot commented 1 year ago

I guess the partially-done status implies there's no way to make gallery-dl log in to your VK account?

mikf commented 1 year ago

@Roccobot There is no native login support, but you can always use cookies.
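For example (the profile URL is the one from earlier in this thread):

# cookies taken straight from a logged-in Firefox profile
gallery-dl --cookies-from-browser firefox "https://vk.com/id123456789"

# or from an exported cookies.txt file
gallery-dl --cookies cookies.txt "https://vk.com/id123456789"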

ermnick commented 10 months ago

@mikf Where does the value for the date key come from? It's not the correct date of the post I'm trying to download, and it isn't returned in a consistent format when viewed with the -K parameter:

<span class="rel_date">today at 1:44 am</span>
<span class="rel_date rel_date_needs_update" abs_time="today at 1:44 am" time="1699915470">four hours ago</span>

I'm trying to set the mtime for my downloads some other way, since the Last-Modified header seems unreliable.
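For example, if the time attribute (apparently a Unix epoch) can be pulled out of the saved page, something like this would do it (photo.jpg is a placeholder name; date/touch here are the GNU versions):

epoch=1699915470                  # from time="1699915470" in the span above
date -d "@$epoch"                 # sanity-check the timestamp
touch -d "@$epoch" photo.jpg      # set the download's mtime to it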

estatistics commented 7 months ago

my script to download all community photos from VK

PLEASE: before using it on VK communities with a lot of photos, try it on a small set, e.g. create a main_example.html that contains only 5-10 URLs for testing purposes (see the sketch just before the script), or find a VK community with very few photos.

NOTE: url_links.log is your high-res image archive; you may decide what to do with it.
NOTE: you may tweak the gallery-dl options as you wish. I have limited the download rate for obvious reasons (e.g. to avoid blocking).
NOTE: if you wish to write metadata, comments, or tags, you must do that on the small-image URLs; this is not implemented in this script (yet). Full-res images don't contain such information.

Note: you must change the path of your Firefox profile as well as your user agent.

SCRIPT UPDATED because a lot of login pages were being produced: I increased the wait time and added an if/else branch that deletes login pages and waits 30 minutes before trying again.
UPDATE No. 2: grep was not safe with strings starting with "-" and produced an error, so "--" was added to make it safer for those cases. Also, the current line number is printed and logged so you can follow the script's progress (only for the high-res image URLs). Some tidying up of the code, and more logging so you can tell from the terminal what is happening inside the script.

SCRIPT UPDATE for working with the mobile version of the site. I have deactivated the sleep for the login-page check; if you run into that problem, please re-activate it by uncommenting the related line(s). notfound.log tries to archive image URLs that, for some reason, were not found in the downloaded files.

PLEASE make sure you keep the VK community page open in the related browser. This may keep the cookies active.
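Instead of hand-building a main_example.html, you can also just keep the first few links from a full main.html (a sketch reusing the link-extraction grep from the script below; the 10 is arbitrary):

grep -Eoh "https://(m\.)?vk\.com/[_0-9a-zA-Z-]+" main.html | grep "photo-" | sort | uniq | head -n 10 > url_links.log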

#!/bin/bash

# you must HAVE completely loaded (ALL images, e.g. 5000 images)
# and saved the whole web page of the VK COMMUNITY of interest as main.html
# then use THIS script.

#SET firefox profile PATH BEFORE USE
browser_profile="/home/USER/.mozilla/firefox/profiledefault-esr-2";

#SET PATH BEFORE USE - must be the same downloading directory / log directory
cd "/mnt/home/USER/downloads/VK/YoungFolks/";

#SET PATH BEFORE USE - must be the same downloading directory / log directory
LN="/mnt/home/USER/downloads/VK/url_links.log";

# using grep, we extract all mini image page links
grep -Eoh "https://(m\.)?vk\.com/[_0-9a-zA-Z-]+(\?rev=1)?" main.html | grep "photo-" | sort | uniq > "${LN}";

#downloading the webpages of the small images in order to get access to the high-res image
while IFS='' read -r LINE || [ -n "${LINE}"  ]; do
    # check that we are online before trying to download
    if curl -s ifconfig.co > /dev/null; then
        # gallery-dl downloads the web page of each mini vk image
        echo "--line--:${LINE}";
        gallery-dl -vvv --retries -1 --sleep 2-5.2 --write-pages --limit-rate 1000K --config gallerydl.conf --cookies-from-browser firefox:"$browser_profile" "${LINE}";

        # check whether a login page ended up among the downloaded files
        vklogin_url="$(ls | grep -c login)"
        if (( vklogin_url > 0 ));
        then
            rm -- *login*;
            find . -size 0c -delete;
            echo "at least one login page was found - it has been deleted";
            # sleep 30m;
            # gallery-dl -vvv --retries -1 --sleep 2-5.2 --write-pages --limit-rate 500K --config gallerydl.conf --cookies-from-browser firefox:"$browser_profile" "${LINE}";
        else
            echo "OK";
        fi;

    else
        echo "sleeping"; sleep 10
    fi

# remove leftover album pages written by --write-pages
rm -f -- *al_photos.php.*;

done < "${LN}"

# Extracting the unique strings that form part of each image file name, to use later to find the high-res image
grep -oh "imp./[-0-9a-zA-Z_/]\+\.jpg" main.html | sort | uniq > img_names.txt;
# Removing the imp?/ prefix and the .jpg extension
sed -e "s,imp./,," -e "s/\.jpg//" img_names.txt > img_string.txt;

length_string="$(cat img_string.txt | wc -l)";
# performing a check how many image strings we have
echo "number of image strings is: $length_string";

grep -oh "[https]\+\+:[-0-9a-zA-Z\.\?&=;\\\/_]\+\+album" *.txt  >albs.log;
cat albs.log | sort | uniq > ualbs.log;
sed -i "s/amp;//g" ualbs.log ;
sed -i "s/\\\//g" ualbs.log ;

i=0; while read -r line; do
i=$((i+1));

# find all saved URLs that contain this image's unique string
grep -oh -- 'https:[-0-9a-zA-Z.?&=;/_]\+'"$line"'[-0-9a-zA-Z.?&=;/_]\+' ualbs.log > img_highres.log;

# Checking whether the image string exists as part of a URL in the related files
if [ "$(wc -l < img_highres.log)" = "0" ]; then
echo "ZERO";
echo "$line" >> notfound.log;
else
echo "ok - $(wc -l < img_highres.log)";

# Getting the size= parameter of every related image URL
grep -oh "size=[0-9x]\+" img_highres.log | sed "s/size=//" | sed "s/x/+/" > img_size.log;

# by adding the two dimensions of each image we can find the largest one, e.g. 1080+1000=2080 vs 720+1080=1800
bc < img_size.log > img_big_numbers.log;
paste -d"," img_size.log img_big_numbers.log > img_sums.log;
sort --field-separator="," -k2 -n -r img_sums.log | head -n1 | sed "s/+/x/" | grep -o "[0-9x]\+," | sed 's/.$//' > img_size_ok.log;
highest_im_url="$(grep -F "$(cat img_size_ok.log)" img_highres.log | head -n1)";

# The link to the highest-resolution image
echo "$highest_im_url" >> highest_im_url.log;
echo "url to download: $highest_im_url";
gallery-dl -v --retries -1 --sleep 2-5.2 --limit-rate 400K --config gallerydl.conf --download-archive archive.log --cookies-from-browser firefox:"$browser_profile" "$highest_im_url"

fi;

echo "line is $i";
done < img_string.txt

my gallerydl.conf

{
    "extractor":
    {
        "base-directory": "./gallery-dl/",
        "parent-directory": false,
        "postprocessors": null,
        "archive": null,
        "cookies": null,
        "cookies-update": true,
        "proxy": null,
        "skip": true,

        "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
        "retries": 4,
        "timeout": 30.0,
        "verify": true,
        "fallback": true,

        "sleep": 0,
        "sleep-request": 0,
        "sleep-extractor": 0,

    "vk":
        {
            "#": "authentication with cookies is not possible for sankaku"
        }
}

}
Roccobot commented 7 months ago

@ermnick how do I use that?

Kellenok commented 6 months ago

Can someone fix the order in which photos are downloaded from VK? The default link vk.com/album-118749005_00 shows photos from newest to oldest, while gallery-dl downloads them from oldest to newest. If I add ?rev=1 to the end of the link, VK shows the photos in reverse order, but gallery-dl doesn't support that type of URL. It would be nice to have an option to choose the order, like Instagram's order-files.

estatistics commented 6 months ago

UPDATE: I went to a community wall with ~92,000 photos and tried to load all of them, but VK can apparently detect the "mouse/key sequence" and then stops loading more photos. Only ~32,000 of the ~92,000 photos were loaded on the community wall. By clicking on the last photo, I can continue viewing the following ones. Reversing the order of the photos only reverses the ~32,000 loaded ones and does not show the TRUE LAST photo of the ~92,000. I haven't found any remedy for that.

This is a bug in VK.

P.S. Best results with m.vk.com, not vk.com.

estatistics commented 6 months ago

UPDATE: For anyone who would like to download Community photos on VK that exceed VK's unofficial (?) limit of 32,000 photos appearing on the wall, do the following (you must dedicate a computer to this):

  1. Download Chromium.
  2. Install the Image Handling Shortcuts add-on. It can copy image URLs by pressing some keys. Very useful.
  3. Go to /home/USER/.config/chromium/Default/Extensions/ (on Linux) and find the extension's folder. Its name may be an arbitrary string.
  4. You MUST NOT use Chromium for anything else, or the next step may cause you trouble.
  5. Replace the extension's index.js with the following code so that it accepts a "simpler" keystroke, namely just "c".
let cursorX = 0
let cursorY = 0

document.onmousemove = e => {
    cursorX = e.clientX;
    cursorY = e.clientY;
};

const getNearestImage = () => document.elementFromPoint(cursorX, cursorY);

// force downloading
function forceDownload(blob, filename) {
    var a = document.createElement('a');
    a.download = filename;
    a.href = blob;
    // For Firefox https://stackoverflow.com/a/32226068
    document.body.appendChild(a);
    a.click();
    a.remove();
}

// Current blob size limit is around 500MB for browsers
async function downloadResource(url, filename) {
    if (!filename) filename = url.replace(/^.*[\\\/]/, '');
    try {
        const response = await fetch(url, {
            headers: new Headers({
                'Origin': location.origin
            }),
            mode: 'cors'
        });
        const blob = await response.blob();

        const blobUrl = window.URL.createObjectURL(blob);
        forceDownload(blobUrl, filename);
    }
    catch (err) {
        alert("Forbidden Image, please download it MANUALLY");
        console.error(err);
    }
}
// END force downloading

const saveImageAs = () => {
    const img = getNearestImage();
    const src = img.getAttribute('src');
    downloadResource(src);
};

const copyImageAddress = () => {
    let src;  // declared here so the catch block can also use it
    try {
        const img = getNearestImage();
        src = img.getAttribute('src');
        const filename = 'logs.txt';
        // alert(src);

        const copyText = document.createElement("input");
        copyText.setAttribute('value', src);
        document.body.appendChild(copyText);

        copyText.select();
        copyText.setSelectionRange(0, 99999);
        document.execCommand("copy");

        document.body.removeChild(copyText);
        // Todo add little sound to notify user
    } catch (err) {
        window.prompt("Copy to clipboard: Ctrl+C, Enter", src);
    }
};

window.onkeydown = e => {
    const key = e.key.toLowerCase();
    // const firstPart = e.shiftKey && (e.ctrlKey || e.metaKey);
    if (key === 'c') copyImageAddress();

  // if (firstPart && key === 's') saveImageAs();
 //   else if (firstPart && key === 'c') copyImageAddress();

};
  6. Next, close Chromium and reopen it so the changes take effect.
  7. Then, in a terminal, sudo apt install xclip xdotool to be able to simulate mouse movements and key presses.

The following code saves the copied clipboard contents to a file; if something is copied twice, it isn't saved to the file again. In our case, that's the VK image URLs.

i=1; while true; do
clip_cont="$(xclip -selection clipboard -o)";

# only append clipboard contents we have not seen before
if grep -qF -- "${clip_cont}" clip.txt 2>/dev/null;
then
sleep 1;
else
i=$((i+1));
echo "${i}|${clip_cont}" >> clip.txt;
echo "${i} | ${clip_cont}";
fi;
done;

So far:

  1. The computer must be dedicated to this; you cannot do any other work while the script is running, because it simulates mouse and keyboard movements.
  2. You must have run and restarted Chromium with the image-handler add-on (with the modified index.js).
  3. You have opened a VK community page and clicked on its first photo. No other tabs.
  4. You have opened a terminal and run the clip code above, which saves the clipboard contents.
  5. Then you open another terminal, and the following code will browse the images serially and copy each URL to the clipboard.
  6. The clip code will then save it into the clip file.
  7. Then run wget to download the images (see the sketch after the browsing script below).
  8. NOTE: you may change the sleep times depending on how slow/fast your machine is.
  9. On my machine, I "lost" 30 image URLs out of a total of 9000. It took about 5 hours to save 9000 image URLs.
  10. Images are BROWSED by pressing the Left key. If you need the Right key, change every xdotool key Left; accordingly.
  11. The simulated mouse pointer must be at the center of the image; change the line xdotool mousemove 503 502; before each xdotool key Left; accordingly.
  12. If the mouse pointer is not over the image, no image URL is saved. The same happens when an image fails to load for some reason.
# WARNING: only the first 9000 images will be browsed. Change the number below if you need more.

# 9000 iterations of browsing.
a=0; while [ $a -lt 9000 ]; do
        # generate a random number between 0 and 3, used to pick one of the cases below
        rnd_n0=$((RANDOM%4));

# 4 cases with different keystroke/mouse-movement patterns; without this randomness vk may stop loading images.

case $rnd_n0 in
0)
WID=`xdotool search --name "Community" | head -1`;   echo $WID;
xdotool windowactivate $WID;
xdotool windowfocus $WID;
xdotool mousemove 502 502; sleep 0.01;
xdotool mousemove 503 502;
xdotool key Left;
xdotool key c c c c;
sleep 0.11;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "8"$rn1$rn2 "2"$rn1$rn2; sleep 0.02;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "7"$rn1$rn2 "1"$rn1$rn2; sleep 0.03;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "6"$rn1$rn2 "5"$rn1$rn2;  sleep 0.02;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "6"$rn1$rn2 "4"$rn1$rn2;  sleep 0.01
;;
1)

WID=`xdotool search --name "Community" | head -1`;   echo $WID;
xdotool windowactivate $WID;
xdotool windowfocus $WID;
xdotool mousemove 503 506;  sleep 0.02;
xdotool mousemove 500 503;
xdotool key Left;
xdotool key c c c c c c;
sleep 0.22;

rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "2"$rn1$rn2 "3"$rn1$rn2;  sleep 0.02;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "2"$rn1$rn2 "3"$rn1$rn2;  sleep 0.03;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "1"$rn1$rn2 "1"$rn1$rn2;  sleep 0.01
 ;;
2)
WID=`xdotool search --name "Community" | head -1`;   echo $WID;
xdotool windowactivate $WID;
xdotool windowfocus $WID;
xdotool mousemove 503 502; sleep 0.01;
xdotool mousemove 505 501;
xdotool key Left;
xdotool key c c c c;
sleep 0.23;

rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "9"$rn1$rn2 "7"$rn1$rn2;  sleep 0.01;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "9"$rn1$rn2 "8"$rn1$rn2;  sleep 0.03
;;
3)
WID=`xdotool search --name "Community" | head -1`;   echo $WID;
xdotool windowactivate $WID;
xdotool windowfocus $WID;
xdotool mousemove 501 503; sleep 0.01;
xdotool mousemove 505 501;
xdotool key Left;
xdotool key c c c c c;
sleep 0.11;

rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "4"$rn1$rn2 "0"$rn1$rn2;  sleep 0.01;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "4"$rn1$rn2 "1"$rn1$rn2; sleep 0.01;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "3"$rn1$rn2 "1"$rn1$rn2; sleep 0.02;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "3"$rn1$rn2 "2"$rn1$rn2;  sleep 0.01;
rn0=$((RANDOM%10));rn1=$((RANDOM%10)); rn2=$((RANDOM%10)); xdotool mousemove "3"$rn1$rn2 "7"$rn1$rn2;  sleep 0.03
;;
*)
;;

# esac closes the case statement ("case" reversed).
            esac

            # wait 2 seconds until the next iteration
            sleep 2.00;
            a=$((a+1));
            echo $a;

done;
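A sketch of step 7 under the assumptions above: clip.txt lines look like N|https://... (as written by the clipboard loop), so the counter field is stripped off first.

# download every image URL collected in clip.txt
cut -d"|" -f2- clip.txt | while IFS='' read -r url; do
    wget --no-clobber "$url";
    sleep 1;    # stay gentle, for the same rate-limiting reasons as above
done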

# MANUALLY write here the last image URL, to continue the process another time if you stopped it.
# LAST image url of the browser (not the final long one):