Closed: josephrcox closed this issue 1 year ago
Back with some input: I ran the program with the detailed_logs
option turned on but couldn't get any useful info out of it. However, a Reddit comment suggests a possible solution for this; I don't know if it is what you are currently working on. The approach depends on FFmpeg, which I believe will be hard to pull into Node. The linked comment has a Python solution, but the basic idea carries over: download the audio and video parts separately and later join them with FFmpeg (if that is possible in Node — there's an npm module available, but I have no experience with Node), assuming that Reddit hasn't changed the way videos are handled: https://old.reddit.com/r/redditdev/comments/9a16fv/videos_downloading_without_sound/ed8us5e/ A rough sketch of that idea in Node is below.
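For illustration, here is a minimal sketch of that download-then-mux idea in Node. It assumes ffmpeg is installed and on the PATH, and that the audio track sits next to the video as DASH_audio.mp4 (the DASH file names are an assumption; Reddit has changed them over time):

const fs = require('fs');
const axios = require('axios');
const { spawn } = require('child_process');

// Stream one URL into a local file and resolve once it is fully written.
async function fetchToFile(url, filePath) {
  const response = await axios({ method: 'GET', url, responseType: 'stream' });
  return new Promise((resolve, reject) => {
    const out = fs.createWriteStream(filePath);
    response.data.pipe(out);
    out.on('finish', resolve);
    out.on('error', reject);
  });
}

// fallbackUrl looks like https://v.redd.it/<id>/DASH_720.mp4
async function downloadRedditVideo(fallbackUrl, outputPath) {
  const base = fallbackUrl.substring(0, fallbackUrl.lastIndexOf('/'));
  await fetchToFile(fallbackUrl, 'video.tmp.mp4');
  await fetchToFile(`${base}/DASH_audio.mp4`, 'audio.tmp.mp4'); // assumed audio track name
  return new Promise((resolve, reject) => {
    // -c copy muxes the two streams together without re-encoding.
    const ffmpeg = spawn('ffmpeg', [
      '-y',
      '-i', 'video.tmp.mp4',
      '-i', 'audio.tmp.mp4',
      '-c', 'copy',
      outputPath,
    ]);
    ffmpeg.on('close', (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}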
Sorry to revive a silent thread after a while, but @josephrcox, how is the video+audio download operation handled? I'm looking into messing around with the JavaScript myself. I did see that the program attempts the fallback-URL method for the video part, but nothing for the audio. Some people say that yt-dlp supports Reddit posts, and it did work in a simple test I ran. There is a wrapper for it on npm. I assume this project aims for all self-written code, but how does this sound as a quick fix? Perhaps there could be a switch in the
user_config.json file that toggles between using yt-dlp and the self-written approach to video downloading, along the lines of the snippet below.
(Source for the yt-dl(p) suggestion: https://old.reddit.com/r/redditdev/comments/6mr7oi/how_to_download_a_video_hosted_on_reddit/dnu47xw/ )
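For illustration, the toggle could be a boolean in user_config.json, something like this (the option name matches what the experiment below ends up using; where exactly it nests is an open question):

{
  "useYTDLforVideo": true
}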
So, uh, I tried to work on this idea a bit by myself. It partly works: some very useful features of this program don't work yet (files are not stored where they should be [just the videos], and progress reporting is a little broken) thanks to my spaghetti-tier code, but I believe it is a start. If this approach is acceptable, I can keep working on it, and you can take charge of everything else if that is okay. Here are the changes I made:
The dependencies are youtube-dl
and youtube-dl-wrap (sketched as a package.json entry below).
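For reference, a hypothetical package.json entry for those dependencies; the version ranges are assumptions, so pin whatever versions you actually test against:

"dependencies": {
  "youtube-dl": "^3.0.0",
  "youtube-dl-wrap": "^2.0.0"
}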
index.js
(Changes at lines 34 and 776)
const request = require('request');
const { version } = require('./package.json');
// NodeJS Dependencies
const fs = require('fs');
const prompts = require('prompts');
const chalk = require('chalk');
const axios = require('axios');
let config = require('./user_config_DEFAULT.json');
// Variables used for logging
let userLogs = '';
const logFormat = 'txt';
let date = new Date();
let date_string = `${date.getFullYear()} ${
date.getMonth() + 1
} ${date.getDate()} at ${date.getHours()}-${date.getMinutes()}-${date.getSeconds()}`;
let startTime = null;
let lastAPICallForSubreddit = false;
let currentAPICall = null;
let currentSubredditIndex = 0; // Used to track which subreddit the user is downloading from
let responseSize = -1; // Used to track the size of the response from the API call, aka how many posts are in the response
// User-defined variables, these can be preset with the help of testingMode
let timeBetweenRuns = 0; // in milliseconds, the time between runs. This is only used if repeatForever is true
let subredditList = []; // List of subreddits in this format: ['subreddit1', 'subreddit2', 'subreddit3']
let numberOfPosts = -1; // How many posts to go through, more posts = more downloads, but takes longer
let sorting = 'top'; // How to sort the posts (top, new, hot, rising, controversial)
let time = 'all'; // What time period to sort by (hour, day, week, month, year, all)
let repeatForever = false; // If true, the program will repeat every timeBetweenRuns milliseconds
let downloadDirectory = ''; // Where to download the files to; set per subreddit/user once downloading starts
let useYTDLforVideo = false; // Experimental: use youtube-dl for videos, quick fix for issue #65 (set from user_config below)
let currentUserAfter = ''; // Used to track the after value for the API call, this is used to get the next X posts
// Default object to track the downloaded posts by type,
// and the subreddit downloading from.
let downloadedPosts = {
subreddit: '',
self: 0,
media: 0,
link: 0,
failed: 0,
skipped_due_to_duplicate: 0,
skipped_due_to_fileType: 0,
};
// Read the user_config.json file for user configuration options
if (fs.existsSync('./user_config.json')) {
config = require('./user_config.json');
checkConfig();
} else {
// create ./user_config.json if it doesn't exist, by duplicating user_config_DEFAULT.json and renaming it
fs.copyFile('./user_config_DEFAULT.json', './user_config.json', (err) => {
if (err) throw err;
log('user_config.json was created. Edit it to manage user options.', true);
config = require('./user_config.json');
});
checkConfig();
}
// check if download_post_list.txt exists, if it doesn't, create it
if (!fs.existsSync('./download_post_list.txt')) {
fs.writeFile('./download_post_list.txt', '', (err) => {
if (err) throw err;
let fileDefaultContent = `# Below, please list any posts that you wish to download. # \n# They must follow this format below: # \n# https://www.reddit.com/r/gadgets/comments/ptt967/eu_proposes_mandatory_usbc_on_all_devices/ # \n# Lines with "#" at the start will be ignored (treated as comments). #`;
// write a few lines to the file
fs.appendFile('./download_post_list.txt', fileDefaultContent, (err) => {
if (err) throw err;
log('download_post_list.txt was created with default content.', true);
});
});
}
// Testing Mode for developer testing. This enables you to hardcode
// the variables above and skip the prompt.
// To edit, go into the user_config.json file.
const testingMode = config.testingMode;
if (testingMode) {
subredditList = config.testingModeOptions.subredditList;
numberOfPosts = config.testingModeOptions.numberOfPosts;
sorting = config.testingModeOptions.sorting;
time = config.testingModeOptions.time;
repeatForever = config.testingModeOptions.repeatForever;
timeBetweenRuns = config.testingModeOptions.timeBetweenRuns;
}
// The YTDL toggle lives under testingModeOptions in user_config, but it
// should apply regardless of testingMode, so read it unconditionally.
useYTDLforVideo = !!(config.testingModeOptions && config.testingModeOptions.useYTDLforVideo);
// Start actions
console.clear(); // Clear the console
log(
chalk.cyan(
'Welcome to the easiest & most customizable Reddit Post Downloader!'
),
false
);
log(
chalk.yellow(
'Contribute @ https://github.com/josephrcox/easy-reddit-downloader'
),
false
);
log(
chalk.blue(
'Confused? Check out the README @ https://github.com/josephrcox/easy-reddit-downloader#readme\n'
),
false
);
// For debugging logs
log('User config: ' + JSON.stringify(config), true);
if (config.testingMode) {
log('Testing mode options: ' + JSON.stringify(config.testingModeOptions), true);
}
function checkConfig() {
let warnTheUser = false;
let quitApplication = false;
let count =
(config.file_naming_scheme.showDate === true) +
(config.file_naming_scheme.showAuthor === true) +
(config.file_naming_scheme.showTitle === true);
if (count === 0) {
quitApplication = true;
} else if (count < 2) {
warnTheUser = true;
}
if (warnTheUser) {
log(
chalk.red(
'WARNING: Your file naming scheme (user_config.json) is poorly set, we recommend changing it.'
),
false
);
}
if (quitApplication) {
log(
chalk.red(
'ALERT: Your file naming scheme (user_config.json) does not have any options set. You can not download posts without filenames. Aborting. '
),
false
);
process.exit(1);
}
if (quitApplication || warnTheUser) {
log(
chalk.red(
'Read about recommended naming schemes here - https://github.com/josephrcox/easy-reddit-downloader/blob/main/README.md#File-naming-scheme'
),
false
);
}
}
// Make a GET request to the GitHub API to get the latest release
request.get(
'https://api.github.com/repos/josephrcox/easy-reddit-downloader/releases/latest',
{ headers: { 'User-Agent': 'Downloader' } },
(error, response, body) => {
if (error) {
log(error, true);
} else {
// Parse the response body to get the version number of the latest release
const latestRelease = JSON.parse(body);
const latestVersion = latestRelease.tag_name;
// Compare the current version to the latest release version
if (version !== latestVersion) {
log(
`Hey! A new version (${latestVersion}) is available. \nConsider updating to the latest version with 'git pull'.\n`,
false
);
startScript();
} else {
log('You are on the latest stable version (' + version + ')\n', true);
startScript();
}
}
}
);
function startScript() {
startTime = new Date();
if (!testingMode && !config.download_post_list_options.enabled) {
startPrompt();
} else {
if (config.download_post_list_options.enabled) {
downloadFromPostListFile();
} else {
downloadSubredditPosts(subredditList[0], ''); // skip the prompt and get right to the API calls
}
}
}
async function startPrompt() {
const questions = [
{
type: 'text',
name: 'subreddit',
message:
'Which subreddits or users would you like to download? You may submit multiple separated by commas (no spaces).',
validate: (value) =>
value.length < 1 ? `Please enter at least one subreddit or user` : true,
},
{
type: 'number',
name: 'numberOfPosts',
message:
'How many posts would you like to attempt to download? If you would like to download all posts, enter 0.',
validate: (value) =>
// check if value is a number
!isNaN(value) ? true : `Please enter a number`,
},
{
type: 'text',
name: 'sorting',
message:
'How would you like to sort? (top, new, hot, rising, controversial)',
validate: (value) =>
value.toLowerCase() === 'top' ||
value.toLowerCase() === 'new' ||
value.toLowerCase() === 'hot' ||
value.toLowerCase() === 'rising' ||
value.toLowerCase() === 'controversial'
? true
: `Please enter a valid sorting method`,
},
{
type: 'text',
name: 'time',
message: 'During what time period? (hour, day, week, month, year, all)',
validate: (value) =>
value.toLowerCase() === 'hour' ||
value.toLowerCase() === 'day' ||
value.toLowerCase() === 'week' ||
value.toLowerCase() === 'month' ||
value.toLowerCase() === 'year' ||
value.toLowerCase() === 'all'
? true
: `Please enter a valid time period`,
},
{
type: 'toggle',
name: 'repeatForever',
message: 'Would you like to run this on repeat?',
initial: false,
active: 'yes',
inactive: 'no',
},
{
type: (prev) => (prev == true ? 'number' : null),
name: 'timeBetweenRuns',
message: 'How often would you like to run this? (in ms)',
},
];
const result = await prompts(questions);
subredditList = result.subreddit.split(','); // the user enters subreddits separated by commas
repeatForever = result.repeatForever;
numberOfPosts = result.numberOfPosts;
sorting = result.sorting.replace(/\s/g, '');
time = result.time.replace(/\s/g, '');
// clean up the subreddit list in case the user puts in invalid chars
for (let i = 0; i < subredditList.length; i++) {
subredditList[i] = subredditList[i].replace(/\s/g, '');
}
if (numberOfPosts === 0) {
numberOfPosts = 9999999999999999999999;
}
if (repeatForever) {
if (result.timeBetweenRuns < 0) {
result.timeBetweenRuns = 0;
}
timeBetweenRuns = result.timeBetweenRuns; // the user enters the time between runs in ms
}
// With the data gathered, call the APIs and download the posts
downloadSubredditPosts(subredditList[0], '');
}
function makeDirectories() {
// Make needed directories for downloads;
// clean and nsfw are made no matter which subreddits are downloaded
if (!fs.existsSync('./downloads')) {
fs.mkdirSync('./downloads');
}
if (config.separate_clean_nsfw) {
if (!fs.existsSync('./downloads/clean')) {
fs.mkdirSync('./downloads/clean');
}
if (!fs.existsSync('./downloads/nsfw')) {
fs.mkdirSync('./downloads/nsfw');
}
}
}
async function downloadSubredditPosts(subreddit, lastPostId) {
let isUser = false;
if (
subreddit.includes('u/') ||
subreddit.includes('user/') ||
subreddit.includes('/u/')
) {
isUser = true;
subreddit = subreddit.split('u/').pop();
return downloadUser(subreddit, lastPostId);
}
let postsRemaining = numberOfPostsRemaining()[0];
if (postsRemaining <= 0) {
// If we have downloaded enough posts, move on to the next subreddit
if (subredditList.length > 1) {
return downloadNextSubreddit();
} else {
// If we have downloaded all the subreddits, end the program
return checkIfDone('', true);
}
} else if (postsRemaining > 100) {
// If we have more posts to download than the limit of 100, set it to 100
postsRemaining = 100;
}
// if lastPostId is undefined, set it to an empty string. Common on first run.
if (lastPostId == undefined) {
lastPostId = '';
}
makeDirectories();
try {
if (subreddit == undefined) {
if (subredditList.length > 1) {
return downloadNextSubreddit();
} else {
return checkIfDone();
}
}
// Use log function to log a string
// as well as a boolean if the log should be displayed to the user.
if (isUser) {
log(
`\n\nRequesting posts from
https://www.reddit.com/user/${subreddit.replace(
'u/',
''
)}/${sorting}/.json?sort=${sorting}&t=${time}&limit=${postsRemaining}&after=${lastPostId}\n`,
true
);
} else {
log(
`\n\nRequesting posts from
https://www.reddit.com/r/${subreddit}/${sorting}/.json?sort=${sorting}&t=${time}&limit=${postsRemaining}&after=${lastPostId}\n`,
true
);
}
// Get the top posts from the subreddit
let response = null;
let data = null;
try {
response = await axios.get(
`https://www.reddit.com/r/${subreddit}/${sorting}/.json?sort=${sorting}&t=${time}&limit=${postsRemaining}&after=${lastPostId}`
);
data = await response.data;
currentAPICall = data;
if (data.message == 'Not Found' || data.data.children.length == 0) {
throw new Error('No posts found for this subreddit.');
}
if (data.data.children.length < postsRemaining) {
lastAPICallForSubreddit = true;
postsRemaining = data.data.children.length;
} else {
lastAPICallForSubreddit = false;
}
} catch (err) {
log(
`\n\nERROR: There was a problem fetching posts for ${subreddit}. This is likely because the subreddit is private, banned, or doesn't exist.`,
true
);
if (subredditList.length > 1) {
if (currentSubredditIndex > subredditList.length - 1) {
currentSubredditIndex = -1;
}
currentSubredditIndex += 1;
return downloadSubredditPosts(subredditList[currentSubredditIndex], '');
} else {
return checkIfDone('', true);
}
}
// if the first post on the subreddit is NSFW, then there is a fair chance
// that the rest of the posts are NSFW.
let isOver18 = data.data.children[0].data.over_18 ? 'nsfw' : 'clean';
downloadedPosts.subreddit = data.data.children[0].data.subreddit;
if (!config.separate_clean_nsfw) {
downloadDirectory = `./downloads/${data.data.children[0].data.subreddit}`;
} else {
downloadDirectory = `./downloads/${isOver18}/${data.data.children[0].data.subreddit}`;
}
// Make sure the image directory exists
// If no directory is found, create one
if (!fs.existsSync(downloadDirectory)) {
fs.mkdirSync(downloadDirectory);
}
responseSize = data.data.children.length;
data.data.children.forEach((child) => {
try {
const post = child.data;
downloadPost(post);
} catch (e) {
log(e, true);
}
});
} catch (error) {
// throw the error
throw error;
}
}
async function downloadUser(user, currentUserAfter) {
let lastPostId = currentUserAfter;
let postsRemaining = numberOfPostsRemaining()[0];
if (postsRemaining <= 0) {
// If we have downloaded enough posts, move on to the next subreddit
if (subredditList.length > 1) {
return downloadNextSubreddit();
} else {
// If we have downloaded all the subreddits, end the program
return checkIfDone('', true);
}
} else if (postsRemaining > 100) {
// If we have more posts to download than the limit of 100, set it to 100
postsRemaining = 100;
}
// if lastPostId is undefined, set it to an empty string. Common on first run.
if (lastPostId == undefined) {
lastPostId = '';
}
makeDirectories();
try {
if (user == undefined) {
if (subredditList.length > 1) {
return downloadNextSubreddit();
} else {
return checkIfDone();
}
}
// Use log function to log a string
// as well as a boolean if the log should be displayed to the user.
let reqUrl = `https://www.reddit.com/user/${user.replace(
'u/',
''
)}/submitted/.json?limit=${postsRemaining}&after=${lastPostId}`;
log(
`\n\nRequesting posts from
${reqUrl}\n`,
false
);
// Get the top posts from the subreddit
let response = null;
let data = null;
try {
response = await axios.get(`${reqUrl}`);
data = await response.data;
currentUserAfter = data.data.after;
currentAPICall = data;
if (data.message == 'Not Found' || data.data.children.length == 0) {
throw new Error('No posts found for this user.');
}
if (data.data.children.length < postsRemaining) {
lastAPICallForSubreddit = true;
postsRemaining = data.data.children.length;
} else {
lastAPICallForSubreddit = false;
}
} catch (err) {
log(
`\n\nERROR: There was a problem fetching posts for ${user}. This is likely because the user's profile is private, suspended, or doesn't exist.`,
true
);
if (subredditList.length > 1) {
if (currentSubredditIndex > subredditList.length - 1) {
currentSubredditIndex = -1;
}
currentSubredditIndex += 1;
return downloadSubredditPosts(subredditList[currentSubredditIndex], '');
} else {
return checkIfDone('', true);
}
}
downloadDirectory = `./downloads/user_${user.replace('u/', '')}`;
// Make sure the image directory exists
// If no directory is found, create one
if (!fs.existsSync(downloadDirectory)) {
fs.mkdirSync(downloadDirectory);
}
responseSize = data.data.children.length;
data.data.children.forEach((child) => {
try {
const post = child.data;
downloadPost(post);
} catch (e) {
log(e, true);
}
});
} catch (error) {
// throw the error
throw error;
}
}
async function downloadFromPostListFile() {
// this is called when config.download_from_post_list_file is true
// this will read the download_post_list.txt file and download all the posts in it
// downloading skips any lines starting with "#" as they are used for documentation
// read the file
let file = fs.readFileSync('./download_post_list.txt', 'utf8');
// split the file into an array of lines
let lines = file.split('\n');
// remove any lines that start with "#"
lines = lines.filter((line) => !line.startsWith('#'));
// remove any empty lines
lines = lines.filter((line) => line != '');
// remove any lines that are just whitespace
lines = lines.filter((line) => line.trim() != '');
// remove any lines that don't start with "https://www.reddit.com"
lines = lines.filter((line) => line.startsWith('https://www.reddit.com'));
// remove any lines that don't have "/comments/" in them
lines = lines.filter((line) => line.includes('/comments/'));
numberOfPosts = lines.length;
repeatForever = config.download_post_list_options.repeatForever;
timeBetweenRuns = config.download_post_list_options.timeBetweenRuns;
// iterate over the lines and download the posts
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
const reqUrl = line + '.json';
axios.get(reqUrl).then(async (response) => {
const post = response.data[0].data.children[0].data;
let isOver18 = post.over_18 ? 'nsfw' : 'clean';
downloadedPosts.subreddit = post.subreddit;
makeDirectories();
if (!config.separate_clean_nsfw) {
downloadDirectory = `./downloads/${post.subreddit}`;
} else {
downloadDirectory = `./downloads/${isOver18}/${post.subreddit}`;
}
// Make sure the image directory exists
// If no directory is found, create one
if (!fs.existsSync(downloadDirectory)) {
fs.mkdirSync(downloadDirectory);
}
downloadPost(post);
});
}
}
function getPostType(post, postTypeOptions) {
log(`Analyzing post with title: ${post.title} and URL: ${post.url}`, true);
let postType; // declared locally; previously this leaked as an implicit global
if (post.post_hint === 'self' || post.is_self) {
postType = 0;
} else if (
post.post_hint === 'image' ||
(post.post_hint === 'rich:video' && !post.domain.includes('youtu')) ||
post.post_hint === 'hosted:video' ||
(post.post_hint === 'link' &&
post.domain.includes('imgur') &&
!post.url_overridden_by_dest.includes('gallery')) ||
post.domain.includes('i.redd.it')
) {
postType = 1;
} else if (post.poll_data != undefined) {
postType = 3; // UNSUPPORTED
} else {
postType = 2;
}
log(
`Post has type: ${postTypeOptions[postType]} due to its post_hint: ${post.post_hint} and domain: ${post.domain}`,
true
);
return postType;
}
async function downloadMediaFile(downloadURL, filePath, postName) {
try {
const response = await axios({
method: 'GET',
url: downloadURL,
responseType: 'stream',
});
response.data.pipe(fs.createWriteStream(filePath));
return new Promise((resolve, reject) => {
response.data.on('end', () => {
downloadedPosts.media += 1;
checkIfDone(postName);
resolve();
});
response.data.on('error', (error) => {
reject(error);
});
});
} catch (error) {
downloadedPosts.failed += 1;
checkIfDone(postName);
if (error.code === 'ENOTFOUND') {
log(
'ERROR: Hostname not found for: ' + downloadURL + '\n... skipping post',
true
);
} else {
log('ERROR: ' + error, true);
}
}
}
async function downloadPost(post) {
let postTypeOptions = ['self', 'media', 'link', 'poll'];
let postType = -1; // default to no postType until one is found
// Determine the type of post. If no type is found, default to link as a last resort.
// If it accidentally downloads a self or media post as a link, it will still
// save properly.
postType = getPostType(post, postTypeOptions);
// All posts should have URLs, so just make sure that it does.
// If the post doesn't have a URL, then it should be skipped.
if (postType != 3 && post.url !== undefined) {
// Array of possible (supported) image and video formats
const imageFormats = ['jpeg', 'jpg', 'gif', 'png', 'mp4', 'webm', 'gifv'];
let downloadURL = post.url;
// Get the file type of the post via the URL. If it ends in .jpg, then it's a jpg.
let fileType = downloadURL.split('.').pop();
// Post titles can be really long and have invalid characters, so we need to clean them up.
let postTitleScrubbed = getFileName(post); // getFileName sanitizes the title internally
if (postType === 0) {
let toDownload = await shouldWeDownload(
post.subreddit,
`${postTitleScrubbed}.txt`
);
if (!toDownload) {
downloadedPosts.skipped_due_to_duplicate += 1;
return checkIfDone(post.name);
} else {
if (!config.download_self_posts) {
log(`Skipping self post with title: ${post.title}`, true);
downloadedPosts.skipped_due_to_fileType += 1;
return checkIfDone(post.name);
} else {
// DOWNLOAD A SELF POST
let comments_string = '';
let postResponse = null;
let data = null;
try {
postResponse = await axios.get(`${post.url}.json`);
data = postResponse.data;
} catch (error) {
log(`Axios failure with ${post.url}`, true);
return checkIfDone(post.name);
}
// With text/self posts, we want to download the top comments as well.
// This is done by requesting the post's JSON data, and then iterating through each comment.
// We also iterate through the top nested comments (only one level deep).
// So we have a file output with the post title, the post text, the author, and the top comments.
comments_string += post.title + ' by ' + post.author + '\n\n';
comments_string += post.selftext + '\n';
comments_string +=
'------------------------------------------------\n\n';
if (config.download_comments) {
// If the user wants to download comments
comments_string += '--COMMENTS--\n\n';
data[1].data.children.forEach((child) => {
const comment = child.data;
comments_string += comment.author + ':\n';
comments_string += comment.body + '\n';
if (comment.replies) {
const top_reply = comment.replies.data.children[0].data;
comments_string += '\t>\t' + top_reply.author + ':\n';
comments_string += '\t>\t' + top_reply.body + '\n';
}
comments_string += '\n\n\n';
});
}
fs.writeFile(
`${downloadDirectory}/${postTitleScrubbed}.txt`,
comments_string,
function (err) {
if (err) {
log(err, true);
}
downloadedPosts.self += 1;
if (checkIfDone(post.name)) {
return;
}
}
);
}
}
} else if (postType === 1) {
// DOWNLOAD A MEDIA POST
if (post.preview != undefined) {
// Reddit stores fallback URL previews for some GIFs.
// Changing the URL to download to the fallback URL will download the GIF, in MP4 format.
if (post.preview.reddit_video_preview != undefined) {
log(
"Using fallback URL for Reddit's GIF preview." +
post.preview.reddit_video_preview,
true
);
downloadURL = post.preview.reddit_video_preview.fallback_url;
fileType = 'mp4';
} else if (post.url_overridden_by_dest && post.url_overridden_by_dest.includes('.gifv')) {
// Luckily, you can just swap URLs on imgur with .gifv
// with ".mp4" to get the MP4 version. Amazing!
log('Replacing gifv with mp4', true);
downloadURL = post.url_overridden_by_dest.replace('.gifv', '.mp4');
fileType = 'mp4';
} else {
let sourceURL = post.preview.images[0].source.url;
// set fileType to whatever imageFormat item is in the sourceURL
for (let i = 0; i < imageFormats.length; i++) {
if (
sourceURL.toLowerCase().includes(imageFormats[i].toLowerCase())
) {
fileType = imageFormats[i];
break;
}
}
}
}
if (post.media != undefined && post.post_hint == 'hosted:video') {
// If the post has a media object, then it's a video.
// We need to get the URL from the media object.
// This is because the URL in the post object is a fallback URL.
// The media object has the actual URL.
if (useYTDLforVideo === true) {
// Experimental path for issue #65: hand the post to youtube-dl,
// which downloads the video and audio tracks and muxes them itself.
const YoutubeDlWrap = require('youtube-dl-wrap');
// NOTE: this currently points at a local youtube-dl binary on my machine;
// see the discussion below about removing the hardcoded path.
const youtubeDlWrap = new YoutubeDlWrap('/home/noexplorer/Downloads/youtube-dl');
let youtubeDlEventEmitter = youtubeDlWrap
.exec([post.url])
// .on('progress', (progress) =>
//   console.log(progress.percent, progress.totalSize, progress.currentSpeed, progress.eta))
// .on('youtubeDlEvent', (eventType, eventData) => console.log(eventType, eventData))
// .on('error', (error) => console.error(error))
.on('close', () => console.log('Done downloading video post with link', post.url));
console.log(youtubeDlEventEmitter.youtubeDlProcess.pid);
} else {
downloadURL = post.media.reddit_video.fallback_url;
fileType = 'mp4';
}
} else if (
post.media != undefined &&
post.post_hint == 'rich:video' &&
post.media.oembed.thumbnail_url != undefined
) {
// Common for gfycat links
downloadURL = post.media.oembed.thumbnail_url;
fileType = 'gif';
}
if (!config.download_media_posts) {
log(`Skipping media post with title: ${post.title}`, true);
downloadedPosts.skipped_due_to_fileType += 1;
return checkIfDone(post.name);
} else {
let toDownload = await shouldWeDownload(
post.subreddit,
`${postTitleScrubbed}.${fileType}`
);
if (!toDownload) {
downloadedPosts.skipped_due_to_duplicate += 1;
if (checkIfDone(post.name)) {
return;
}
} else {
downloadMediaFile(
downloadURL,
`${downloadDirectory}/${postTitleScrubbed}.${fileType}`,
post.name
);
}
}
} else if (postType === 2) {
if (!config.download_link_posts) {
log(`Skipping link post with title: ${post.title}`, true);
downloadedPosts.skipped_due_to_fileType += 1;
return checkIfDone(post.name);
} else {
let toDownload = await shouldWeDownload(
post.subreddit,
`${postTitleScrubbed}.html`
);
if (!toDownload) {
downloadedPosts.skipped_due_to_duplicate += 1;
if (checkIfDone(post.name)) {
return;
}
} else {
// DOWNLOAD A LINK POST
// With link posts, we create a simple HTML file that redirects to the post's URL.
// This enables the user to still "open" the link file, and it will redirect to the post.
// No comments or other data is stored.
let htmlFile = `<html><body><script type='text/javascript'>window.location.href = "${post.url}";</script></body></html>`;
fs.writeFile(
`${downloadDirectory}/${postTitleScrubbed}.html`,
htmlFile,
function (err) {
if (err) throw err;
downloadedPosts.link += 1;
if (checkIfDone(post.name)) {
return;
}
}
);
}
}
} else {
log('Failed to download: ' + post.title + ' with URL: ' + post.url, true);
downloadedPosts.failed += 1;
if (checkIfDone(post.name)) {
return;
}
}
} else {
log('Failed to download: ' + post.title + ' with URL: ' + post.url, true);
downloadedPosts.failed += 1;
if (checkIfDone(post.name)) {
return;
}
}
}
function downloadNextSubreddit() {
if (currentSubredditIndex >= subredditList.length - 1) {
checkIfDone('', true);
} else {
currentSubredditIndex += 1;
downloadSubredditPosts(subredditList[currentSubredditIndex]);
}
}
function shouldWeDownload(subreddit, postTitleWithPrefixAndExtension) {
if (
config.redownload_posts === true ||
config.redownload_posts === undefined
) {
if (config.redownload_posts === undefined) {
log(
chalk.red(
"ALERT: Please note that the 'redownload_posts' option is now available in user_config. See the default JSON for example usage."
),
true
);
}
return true;
} else {
// Check if the post in the subreddit folder already exists.
// If it does, we don't need to download it again.
let postExists = fs.existsSync(
`${downloadDirectory}/${postTitleWithPrefixAndExtension}`
);
return !postExists;
}
}
function onErr(err) {
log(err, true);
return 1;
}
// checkIfDone is called frequently to see if we have downloaded the number of posts
// that the user requested to download.
// We could check this inline but it's easier to read if it's a separate function,
// and this ensures that we only check after the files are done being downloaded to the PC, not
// just when the request is sent.
function checkIfDone(lastPostId, override) {
// Add up all downloaded/failed posts that have been downloaded so far, and check if it matches the
// number requested.
if (
(lastAPICallForSubreddit &&
lastPostId ===
currentAPICall.data.children[responseSize - 1].data.name) ||
numberOfPostsRemaining()[0] === 0 ||
override ||
(numberOfPostsRemaining()[1] === responseSize && responseSize < 100)
) {
let endTime = new Date();
let timeDiff = endTime - startTime;
timeDiff /= 1000;
let msPerPost = (timeDiff / numberOfPostsRemaining()[1])
.toString()
.substring(0, 5);
if (numberOfPosts >= 99999999999999999999) {
log(
`Still downloading posts from ${chalk.cyan(
subredditList[currentSubredditIndex]
)}... (${numberOfPostsRemaining()[1]}/all)`,
false
);
} else if (config.download_post_list_options.enabled) {
log(
`Still downloading posts from ${chalk.cyan(
'download_post_list.txt'
)}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
false
);
} else {
log(
`Still downloading posts from ${chalk.cyan(
subredditList[currentSubredditIndex]
)}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
false
);
}
log('Validating that all posts were downloaded...', false);
setTimeout(() => {
if (config.download_post_list_options.enabled) {
log(
'All done downloading posts from download_post_list.txt!',
false
);
} else {
log(
'All done downloading posts from ' +
subredditList[currentSubredditIndex] +
'!',
false
);
}
log(JSON.stringify(downloadedPosts), true);
if (currentSubredditIndex === subredditList.length - 1) {
log(
`\nDownloading took ${timeDiff} seconds, at about ${msPerPost} seconds/post`,
false
);
}
// default values for next run (important if being run multiple times)
downloadedPosts = {
subreddit: '',
self: 0,
media: 0,
link: 0,
failed: 0,
skipped_due_to_duplicate: 0,
skipped_due_to_fileType: 0,
};
if (currentSubredditIndex < subredditList.length - 1) {
downloadNextSubreddit();
} else if (repeatForever) {
currentSubredditIndex = 0;
log(
`Waiting ${timeBetweenRuns / 1000} seconds before rerunning...`,
false
);
setTimeout(function () {
if (config.download_post_list_options.enabled) {
downloadFromPostListFile();
} else {
downloadSubredditPosts(subredditList[0], '');
}
startTime = new Date();
}, timeBetweenRuns);
} else {
startPrompt();
}
return true;
}, 1000);
} else {
if (numberOfPosts >= 99999999999999999999) {
log(
`Still downloading posts from ${chalk.cyan(
subredditList[currentSubredditIndex]
)}... (${numberOfPostsRemaining()[1]}/all)`,
false
);
} else if (config.download_post_list_options.enabled) {
log(
`Still downloading posts from ${chalk.cyan(
'download_post_list.txt'
)}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
false
);
} else {
log(
`Still downloading posts from ${chalk.cyan(
subredditList[currentSubredditIndex]
)}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
false
);
}
for (let i = 0; i < Object.keys(downloadedPosts).length; i++) {
log(
`\t- ${Object.keys(downloadedPosts)[i]}: ${
Object.values(downloadedPosts)[i]
}`,
true
);
}
log('\n', true);
if (numberOfPostsRemaining()[1] % 100 == 0) {
return downloadSubredditPosts(
subredditList[currentSubredditIndex],
lastPostId
);
}
return false;
}
}
function getFileName(post) {
let fileName = '';
if (
config.file_naming_scheme.showDate ||
config.file_naming_scheme.showDate === undefined
) {
let timestamp = post.created;
var date = new Date(timestamp * 1000);
var year = date.getFullYear();
var month = (date.getMonth() + 1).toString().padStart(2, '0');
var day = date.getDate().toString().padStart(2, '0');
fileName += `${year}-${month}-${day}`;
}
if (
config.file_naming_scheme.showAuthor ||
config.file_naming_scheme.showAuthor === undefined
) {
fileName += `_${post.author}`;
}
if (
config.file_naming_scheme.showTitle ||
config.file_naming_scheme.showTitle === undefined
) {
let title = sanitizeFileName(post.title);
fileName += `_${title}`;
}
// The max length for most systems is about 255. To give some wiggle room, I'm doing 240
if (fileName.length > 240) {
fileName = fileName.substring(0, 240);
}
return fileName;
}
function numberOfPostsRemaining() {
let total =
downloadedPosts.self +
downloadedPosts.media +
downloadedPosts.link +
downloadedPosts.failed +
downloadedPosts.skipped_due_to_duplicate +
downloadedPosts.skipped_due_to_fileType;
return [numberOfPosts - total, total];
}
function log(message, detailed) {
// This function takes a message string and a boolean.
// If the boolean is true, the message will be logged to the console, otherwise it
// will only be logged to the log file.
userLogs += message + '\r\n\n';
let visibleToUser = true;
if (detailed) {
visibleToUser = config.detailed_logs;
}
if (visibleToUser) {
console.log(message);
}
if (config.local_logs && subredditList.length > 0) {
if (!fs.existsSync('./logs')) {
fs.mkdirSync('./logs');
}
let logFileName = '';
if (config.local_logs_naming_scheme.showDateAndTime) {
logFileName += `${date_string} - `;
}
if (config.local_logs_naming_scheme.showSubreddits) {
let subredditListString = JSON.stringify(subredditList).replace(
/[^a-zA-Z0-9,]/g,
''
);
logFileName += `${subredditListString} - `;
}
if (config.local_logs_naming_scheme.showNumberOfPosts) {
if (numberOfPosts >= 999999999999999999) {
logFileName += `ALL - `;
} else {
logFileName += `${numberOfPosts} - `;
}
}
if (logFileName.endsWith(' - ')) {
logFileName = logFileName.substring(0, logFileName.length - 3);
}
fs.writeFile(
`./logs/${logFileName}.${logFormat}`,
userLogs,
function (err) {
if (err) throw err;
}
);
}
}
// sanitize function for file names so that they work on Mac, Windows, and Linux
function sanitizeFileName(fileName) {
return fileName
.replace(/[/\\?%*:|"<>]/g, '-')
.replace(/([^/])\/([^/])/g, '$1_$2');
}
user_config.json
(Changes at the "Testing Mode" part)
{
"testingMode": false,
"testingModeOptions": {
"numberOfPosts": 25,
"sorting": "new",
"time": "month",
"repeatForever": true,
"timeBetweenRuns": 30000,
"useYTDLforVideo": true
},
"download_post_list_options": {
"enabled": false,
"repeatForever": false,
"timeBetweenRuns": 3000
},
"local_logs": true,
"local_logs_naming_scheme": {
"showDateAndTime": true,
"showSubreddits": true,
"showNumberOfPosts": true
},
"file_naming_scheme": {
"showDateAndTime": true,
"showAuthor": true,
"showTitle": true
},
"download_self_posts": true,
"download_media_posts": true,
"download_link_posts": true,
"download_comments": true,
"separate_clean_nsfw": false,
"redownload_posts": false,
"detailed_logs": false
}
Hey @NoExplorer, thanks!
Sorry for the delay here. I am going to open up a PR and see if I can test it out.
I'm trying to get the above to work locally and I'm having issues. Feel free to submit a PR with the above code changes, and I can try to run your exact PR.
Don't worry about the delay! I'll see if I copied something wrong and I'll submit a PR sometime today.
Opened a PR: https://github.com/josephrcox/easy-reddit-downloader/pull/67. I will await further actions.
The biggest chance of it not working is the path to the youtube-dl executable. For now I am trying to think of a way to get it from the user (like an input passed through to line 780, const youtubeDlWrap = new YoutubeDlWrap(path-to-executable);) @josephrcox. Ideally we could fetch it automatically without the user's input, but I don't know whether that is possible or how it could be achieved. (Update: using youtube-dl-exec
solved the path-to-the-binary issue, since all it needs is const youtubedl = require('youtube-dl-exec')
. So hopefully, adding youtube-dl-exec
to the package.json
file makes this much easier to solve and removes the need for a binary path entirely; a rough sketch is below. youtube-dl-exec
can be found here (NPM) and here (GitHub).)
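For what it's worth, a minimal sketch of what the youtube-dl-exec call could look like, reusing the downloadDirectory and postTitleScrubbed values from the index.js listing above (treat the exact integration point and output template as assumptions):

const youtubedl = require('youtube-dl-exec');

// youtube-dl-exec ships with its own binary, so no path is needed.
// Options are camelCased youtube-dl flags; `output` maps to -o/--output,
// and %(ext)s is youtube-dl's output template for the file extension.
youtubedl(post.url, {
  output: `${downloadDirectory}/${postTitleScrubbed}.%(ext)s`,
})
  .then(() => console.log('Done downloading video post with link', post.url))
  .catch((err) => console.error(err));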
I might be overcomplicating things here. Please tell me if this is a non-ideal way of fixing the videos-lacking-audio issue.
Edit: ~~[something funny is going on with the user_config generation script on my end, investigating...]~~ My code editor (VSCodium) might be doing something weird. I'll make the changes in Notepad++ instead.
Edit 2: ~~The binary/executable that needs to be passed on that line is youtube-dl, which, if I recall correctly, is stored with all the other modules at ./node_modules/youtube-dl
(once I figure out what is going on with the dependencies in package.json
, it should be stored in the application's folder). It shouldn't be difficult to add a line that points there at line 780. I'll tinker with that idea sometime today and report back (and update the PR).~~ (Again, youtube-dl-exec
doesn't need a path, but that's on my machine; please test it on your end as well if possible.) I also figured out what was going on with my index.js
: the code editor was doing something weird, and switching to Notepad++ solved everything.
Edit 3: ~~Tried my idea; it doesn't seem to work, for some absurd reason. I'm not a genius in JavaScript, but this is what I tried: const youtubeDlWrap = new YoutubeDlWrap("node_modules\youtube-dl\bin");
. I don't quite understand how file paths work in NodeJS, so I'm open to corrections.~~ I opted to use youtube-dl-exec
, which doesn't need a wrapper. It uses the MIT License, so everything should be fine. Attached is a quick demo (2023-05-09 17.15 EEST).
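Side note on the struck-out attempt, in case anyone hits the same wall: in a JavaScript string literal, backslashes are escape characters, so "node_modules\youtube-dl\bin" does not contain the path it appears to. The usual fix is path.join; the binary location below is only my recollection of where the package keeps it:

const path = require('path');
const YoutubeDlWrap = require('youtube-dl-wrap');

// path.join inserts the platform-correct separator, and __dirname anchors
// the path to this file's folder rather than the current working directory.
const binaryPath = path.join(__dirname, 'node_modules', 'youtube-dl', 'bin', 'youtube-dl');
const youtubeDlWrap = new YoutubeDlWrap(binaryPath);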