nadimkobeissi / mkbsd

Download all the wallpapers in MKBHD's "Panels" app

List Bucket Is Also Enabled / Last Collection Encrypted #8

Open · markwinap opened this issue 1 month ago

markwinap commented 1 month ago

Bucket content is also available: https://storage.googleapis.com/panels-api/. However, it looks like the content of the latest collection is encrypted (content created 2 hours ago): https://storage.googleapis.com/panels-api/data/20240924/media-1a-i-t~s
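
If anyone wants to walk the listing themselves, here is a minimal sketch. It assumes the bucket answers the standard GCS XML list API (prefix/marker paging), which is what that URL returns when listing is enabled; the prefix value below is just an example:

```python
import requests
import xml.etree.ElementTree as ET

BUCKET_URL = "https://storage.googleapis.com/panels-api/"

def list_keys(prefix=""):
    """Page through the public XML listing and yield object keys."""
    marker = ""
    while True:
        resp = requests.get(BUCKET_URL, params={"prefix": prefix, "marker": marker})
        resp.raise_for_status()
        root = ET.fromstring(resp.text)
        # The response is namespaced, so match on the tag suffix instead of the full name.
        keys = [el.text for el in root.iter() if el.tag.endswith("Key")]
        yield from keys
        truncated = any(el.text == "true" for el in root.iter() if el.tag.endswith("IsTruncated"))
        if not truncated or not keys:
            break
        marker = keys[-1]  # continue the listing after the last key we saw

for key in list_keys(prefix="data/20240924/"):
    print(key)
```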

Also, other useful APIs (a quick inspection sketch follows below):

- Folders / Artists / Wallpapers: https://storage.googleapis.com/panels-api/data/20240924/content-1a
- Photo Metadata
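
The shape of those payloads isn't documented anywhere, so here is a small sketch that just fetches the content-1a endpoint and prints what the top level looks like (it assumes the endpoint returns JSON):

```python
import json
import requests

CONTENT_URL = "https://storage.googleapis.com/panels-api/data/20240924/content-1a"

resp = requests.get(CONTENT_URL)
resp.raise_for_status()
data = resp.json()

# Payload structure is undocumented, so only inspect the top level.
if isinstance(data, dict):
    for key, value in data.items():
        print(key, type(value).__name__)
else:
    print(json.dumps(data, indent=2)[:1000])
```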

kylefmohr commented 1 month ago

There are also these open buckets:

https://storage.googleapis.com/panels-static/

https://storage.googleapis.com/panels-cdn/

nor0x commented 1 month ago

seems that the cdn bucket has a list with links to higher-resolution images (see attached screenshot)

downloaded with this script https://gist.github.com/nor0x/5a906e2fd28aa3a202a4565bc3366646

kylefmohr commented 1 month ago

> seems that the cdn bucket has a list with links to higher-resolution images (see attached screenshot)
>
> downloaded with this script https://gist.github.com/nor0x/5a906e2fd28aa3a202a4565bc3366646

Good find!

I converted your script to Python

Python Script

```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

all_url = "https://storage.googleapis.com/panels-cdn/data/20240730/all.json"
response = requests.get(all_url)
json_data = response.json()

urls = []

def extract_urls(element):
    # Recursively walk the JSON and collect every "url" value.
    if isinstance(element, dict):
        for key, value in element.items():
            if key == "url":
                urls.append(value)
            else:
                extract_urls(value)
    elif isinstance(element, list):
        for item in element:
            extract_urls(item)

extract_urls(json_data)
print(f"found {len(urls)} urls")

if not os.path.exists("downloads"):
    os.makedirs("downloads")

def download_file(url):
    # Stream each file to disk, skipping anything already downloaded.
    file_name = os.path.basename(url)
    file_path = os.path.join("downloads", file_name)
    if not os.path.exists(file_path):
        print(f"downloading {url}")
        response = requests.get(url, stream=True)
        with open(file_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    else:
        print(f"skipping {url}")

with ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(download_file, urls)
```

This downloads over 10 GB of pictures, 2,331 in total.

nor0x commented 1 month ago

there is also a different snapshot at https://storage.googleapis.com/panels-cdn/data/20240606/all.default.json, which contains some differences in images
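
if anyone wants to see how much the two snapshots actually differ, here is a rough sketch that compares the sets of url values (same recursive extraction idea as the script above; it assumes both endpoints return JSON):

```python
import requests

SNAPSHOTS = [
    "https://storage.googleapis.com/panels-cdn/data/20240730/all.json",
    "https://storage.googleapis.com/panels-cdn/data/20240606/all.default.json",
]

def extract_urls(element, found=None):
    """Recursively collect every "url" value in the JSON payload."""
    if found is None:
        found = set()
    if isinstance(element, dict):
        for key, value in element.items():
            if key == "url":
                found.add(value)
            else:
                extract_urls(value, found)
    elif isinstance(element, list):
        for item in element:
            extract_urls(item, found)
    return found

newer, older = (extract_urls(requests.get(u).json()) for u in SNAPSHOTS)
print(f"only in 20240730 snapshot: {len(newer - older)}")
print(f"only in 20240606 snapshot: {len(older - newer)}")
print(f"in both: {len(newer & older)}")
```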

nadimkobeissi commented 1 month ago

You guys are awesome, will look into integrating these findings into the repo soon

nor0x commented 1 month ago

here is a (crappy and slow) list of all images so far: https://nor0x.github.io/OpenPanels

kylefmohr commented 1 month ago

just putting this here for informational purposes: after downloading all of the additional photos from all.default.json, I ran a perceptual-hashing script across the set of photos and am now left with 1,477 photos, which means there are likely duplicates in both JSON files. There is probably a less computationally expensive way to do this, but this is what I ran:

Python Code

```python
import os

import imagehash
from PIL import Image

def find_duplicate_images(directory, hash_size=8, threshold=2):
    """
    Finds duplicate images in a directory based on perceptual hashes.

    Args:
        directory: The directory containing the images.
        hash_size: The size of the hash to generate (higher values are more precise but slower).
        threshold: The maximum difference in hash values for images to be considered duplicates.

    Returns:
        A list of tuples, where each tuple contains the paths of two duplicate images.
    """
    image_hashes = {}
    duplicates = []
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        if filename.endswith(('.jpg', '.jpeg', '.png')):
            try:
                with Image.open(filepath) as img:
                    # Generate perceptual hash using average hash
                    hash = imagehash.average_hash(img, hash_size=hash_size)
                    # Other hash functions can be used instead:
                    # hash = imagehash.phash(img, hash_size=hash_size)
                    # hash = imagehash.dhash(img, hash_size=hash_size)
                    # hash = imagehash.whash(img, hash_size=hash_size)

                    # Check if hash is already in dictionary
                    for existing_hash, existing_path in image_hashes.items():
                        if hash - existing_hash <= threshold:
                            duplicates.append((filepath, existing_path))
                            break
                    else:
                        image_hashes[hash] = filepath
            except Exception as e:
                print(f"Error processing {filename}: {e}")
    return duplicates

directory_path = "downloads/"
duplicate_images = find_duplicate_images(directory_path)

if duplicate_images:
    print("Duplicate images found:")
    for image1, image2 in duplicate_images:
        print(f"  - {image1} is a duplicate of {image2}")
        try:
            image1_size = os.path.getsize(image1)
            image2_size = os.path.getsize(image2)
            if image1_size < image2_size:
                os.remove(image1)
            else:
                os.remove(image2)
        except:
            pass
else:
    print("No duplicate images found.")
```

axsddlr commented 1 month ago

> less computationally expensive way to do this

maybe using imagehash.phash() instead?
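
A cheaper variant along those lines would be to bucket images by their exact phash in a dict (one pass, roughly O(n)) instead of comparing every pair against a threshold. It only catches exact hash matches, so near-duplicates whose hashes differ by a bit or two would slip through. Rough sketch:

```python
import os
from collections import defaultdict

import imagehash
from PIL import Image

def find_exact_hash_duplicates(directory, hash_size=8):
    """Group images by perceptual hash; only exact hash matches are reported.

    Single pass with a dict lookup, so it avoids the pairwise comparison of the
    threshold-based version above, at the cost of missing near-duplicates.
    """
    groups = defaultdict(list)
    for filename in os.listdir(directory):
        if not filename.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        filepath = os.path.join(directory, filename)
        try:
            with Image.open(filepath) as img:
                groups[imagehash.phash(img, hash_size=hash_size)].append(filepath)
        except Exception as e:
            print(f"Error processing {filename}: {e}")
    # Keep only hashes that more than one file maps to.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

for h, paths in find_exact_hash_duplicates("downloads/").items():
    print(h, paths)
```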