Closed: AFGreeneye closed this issue 1 week ago
There is no headless browser involved, only reverse engineering (if one can call it that): figuring out how a site works by inspecting its traffic with a network monitor and, if necessary, reading its HTML and JS source. gallery-dl then replicates the needed HTTP requests and extracts the data it needs to collect and build download URLs.
For nsfwalbum search results, you'd need to request https://nsfwalbum.com/backend.php?queryString=search=QUERY&prev_items=4&p=PAGE and collect the returned album IDs (href="/album/12345"), it seems.
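A minimal sketch of that search step in TypeScript, assuming the endpoint returns an HTML fragment containing links of the form href="/album/ID". The helper names buildSearchUrl and extractAlbumIds are hypothetical, the query layout (including prev_items=4) is copied unchanged from the URL above, and URL-encoding the search term is an assumption. This is not gallery-dl's own code.

```typescript
// Hypothetical helper: fill QUERY and PAGE into the search URL quoted above.
// Encoding the query with encodeURIComponent is an assumption.
function buildSearchUrl(query: string, page: number): string {
  return 'https://nsfwalbum.com/backend.php?queryString=search='
    + encodeURIComponent(query) + '&prev_items=4&p=' + page;
}

// Hypothetical helper: collect unique album IDs from href="/album/12345"
// links, in order of first appearance.
function extractAlbumIds(html: string): string[] {
  const pattern = /href="\/album\/(\d+)"/g;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    if (ids.indexOf(match[1]) === -1) {
      ids.push(match[1]);
    }
  }
  return ids;
}
```

One plausible way to use these would be to request buildSearchUrl(query, page) with axios for increasing page numbers and stop once extractAlbumIds returns an empty list. Whether the site signals the last page that way is an assumption.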
My project works fine; the only problem I have is with something called 'spirit', which is needed in the request that saves the JPG file:
https://nsfwalbum.com/backend.php?&spirit=g6z27zb4zc6zb7z%605ze5z5dz&photo=85443691
The variable called 'spirit' in the page's JavaScript code holds this value. The problem is that I tried everything to extract it with 'Axios' and 'Cheerio', but nothing works! The funny thing is that you can easily see the value by running 'console.log(spirit)' in the browser.
var spirit = encodeURIComponent(giraffe.annihilate("1e|4c|4e|0e|2e|2e|6a|7a|", 6));
from https://nsfwalbum.com/iframe_image.php?id=12345
var giraffe={annihilate:function(r,a){var n="";r.toString();for(var t=0;t<r.length;t++){var e=r.charCodeAt(t)^a;n+=String.fromCharCode(e)}return n}}
from https://nsfwalbum.com/js/my.js
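As a worked example, the XOR transform from my.js can be written out in TypeScript and applied to the sample string above. The buildPhotoUrl helper is hypothetical: it only mirrors the shape of the backend.php URL quoted earlier in this thread, and pairing this sample string with any particular photo ID is purely illustrative.

```typescript
// Same transform as giraffe.annihilate: XOR each character code with `key`.
function annihilate(value: string, key: number): string {
  let result = '';
  for (let i = 0; i < value.length; i++) {
    result += String.fromCharCode(value.charCodeAt(i) ^ key);
  }
  return result;
}

// Hypothetical helper: decode the string, URL-encode it, and place it in a
// URL shaped like the backend.php request quoted earlier in the thread.
function buildPhotoUrl(encoded: string, photoId: string): string {
  const spirit = encodeURIComponent(annihilate(encoded, 6));
  return 'https://nsfwalbum.com/backend.php?&spirit=' + spirit + '&photo=' + photoId;
}

console.log(annihilate('1e|4c|4e|0e|2e|2e|6a|7a|', 6));
// prints "7cz2ez2cz6cz4cz4cz0gz1gz"
```

Since XOR with the same key is its own inverse, applying annihilate twice returns the original string. An 'f' in the encoded input becomes a backtick, which encodeURIComponent then turns into %60, as seen in the spirit value quoted earlier.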
Translating this to Python gets you https://github.com/mikf/gallery-dl/blob/f58b0e6fc7972e1432fa7032afddfb108802a8a1/gallery_dl/extractor/nsfwalbum.py#L52-L54 and https://github.com/mikf/gallery-dl/blob/f58b0e6fc7972e1432fa7032afddfb108802a8a1/gallery_dl/extractor/nsfwalbum.py#L79-L83
Thank you so much!
import axios from 'axios';

// Equivalent of giraffe.annihilate: XOR each character code with `base`
function _annihilate(value: string, base: number = 6): string {
  let result = '';
  for (let i = 0; i < value.length; i++) {
    result += String.fromCharCode(value.charCodeAt(i) ^ base);
  }
  return result;
}

async function fetchSpiritValue(): Promise<void> {
  const url = 'https://nsfwalbum.com/photo/85440023';
  try {
    // Make GET request to the URL
    const response = await axios.get(url);
    const html: string = response.data;
    // Extract the part between 'giraffe.annihilate("' and the closing '"'
    const marker = 'giraffe.annihilate("';
    const startIndex = html.indexOf(marker);
    if (startIndex === -1) {
      throw new Error('giraffe.annihilate call not found in page');
    }
    // Start after the opening quote; including it would decode to a stray '$' (%24)
    const valueStart = startIndex + marker.length;
    const endIndex = html.indexOf('"', valueStart);
    const extractedString = html.substring(valueStart, endIndex);
    // Decode, then URL-encode once; encodeURIComponent already turns ` into %60
    const spirit = encodeURIComponent(_annihilate(extractedString));
    console.log('Spirit value:', spirit);
  } catch (error: any) {
    console.error('Error fetching spirit value:', (error as Error).message);
  }
}

fetchSpiritValue();
Hi! I'm not sure if this is the right place to ask my question, but I really want to understand how gallery-dl works. I'm currently studying frontend development, and I'm quite proficient with JavaScript/TypeScript (though I'm still a newbie!). Lately, scraping data from the internet has become my new hobby. Today, I came across the website 'nsfwalbum.com'. I tried to download all albums of a model, but gallery-dl only accepts the URL of a single album.
I could create a list of the albums I want to download and put them in a '.bat' file to run it, but that wouldn't be convenient and would take a lot of time. So, I thought about making my own picture downloader app just for fun. Since 'nsfwalbum.com' uses dynamic content loading, I had to use Puppeteer (a headless browser) in my project, which works fine!
However, when I compare my project with gallery-dl, mine would never be able to compete in terms of speed and efficiency. Unfortunately, I'm new to Python; I looked at the nsfwalbum extractor code but barely understand the syntax. I really want to know how gallery-dl downloads pictures so quickly. Does it use a headless browser or something similar? I wish someone could explain it to me.