josephrcox / easy-reddit-downloader

Simple headless Reddit post downloader
MIT License

MP4 videos are downloading without audio #65

Closed: josephrcox closed this issue 1 year ago

josephrcox commented 1 year ago

See the comments on #63 for details on the bug.

NoExplorer commented 1 year ago

Back with some input: I ran the program with the detailed_logs option turned on but couldn't get any useful info. However, Reddit seems to have a possible solution for this. I don't know if it is what you are already working on. The solution depends on FFmpeg, which I believe will be hard to implement in Node. (The linked comment has a solution for Python, but the basic idea is there: download the audio and video parts separately, then join them with FFmpeg. I don't know if that is possible in Node; there's an NPM module available, but I have no experience with Node. This also assumes Reddit hasn't changed the way videos are handled.) https://old.reddit.com/r/redditdev/comments/9a16fv/videos_downloading_without_sound/ed8us5e/
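The merge approach described above could be sketched in Node roughly as follows. This is a minimal sketch, not code from this project: guessAudioUrl, buildFfmpegMergeArgs and mergeWithFfmpeg are hypothetical helpers, and the audio file name (DASH_audio.mp4) is an assumption, since Reddit has used different names for the audio track over time and some videos have no audio track at all. It also assumes an ffmpeg binary is available on the PATH.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper (not part of easy-reddit-downloader): derive a likely
// audio-track URL from a v.redd.it fallback_url such as
// https://v.redd.it/<id>/DASH_720.mp4?source=fallback
// ASSUMPTION: the audio track sits next to the video track under a fixed name.
// Reddit has changed this name before, so a 404 here is possible.
function guessAudioUrl(fallbackUrl, audioFileName = 'DASH_audio.mp4') {
  const url = new URL(fallbackUrl);
  const segments = url.pathname.split('/');
  segments[segments.length - 1] = audioFileName; // swap "DASH_720.mp4" for the audio name
  url.pathname = segments.join('/');
  url.search = ''; // drop "?source=fallback"
  return url.toString();
}

// ffmpeg arguments: take the video stream from the first input and the audio
// stream from the second, and copy both without re-encoding.
function buildFfmpegMergeArgs(videoPath, audioPath, outputPath) {
  return [
    '-y', // overwrite outputPath if it already exists
    '-i', videoPath,
    '-i', audioPath,
    '-map', '0:v:0',
    '-map', '1:a:0',
    '-c', 'copy',
    outputPath,
  ];
}

// Run ffmpeg and resolve with outputPath when it exits with code 0.
function mergeWithFfmpeg(videoPath, audioPath, outputPath) {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn('ffmpeg', buildFfmpegMergeArgs(videoPath, audioPath, outputPath), {
      stdio: 'ignore',
    });
    ffmpeg.on('error', reject); // e.g. ffmpeg is not installed
    ffmpeg.on('close', (code) => {
      if (code === 0) resolve(outputPath);
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}
```

In practice the two DASH files would be downloaded first (for example with the same axios streaming that downloadMediaFile uses) and deleted after the merge. If the audio request fails, keeping the video-only file would match the current behaviour.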

NoExplorer commented 1 year ago

Sorry to revive a silent thread after a while, but @josephrcox, how is the video+audio download handled? I'm looking into messing around with JavaScript. I saw that the program uses the fallback URL method for the video part but does nothing for the audio. Some people say that yt-dlp supports Reddit posts, and it did work in a simple test I ran. There's a wrapper for it on NPM. I assume this project aims for all self-written code, but how does this sound as a quick fix? Perhaps there could be a switch in the user_config.json file that toggles between yt-dlp and a self-written approach to video downloading.

(Source for the youtube-dl/yt-dlp suggestion: https://old.reddit.com/r/redditdev/comments/6mr7oi/how_to_download_a_video_hosted_on_reddit/dnu47xw/ )
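A config switch like the one suggested above could look roughly like this. It is a sketch under assumptions: the use_ytdlp_for_video key does not exist in user_config_DEFAULT.json, and pickVideoDownloader, buildYtDlpArgs and waitForDownload are hypothetical helpers. Passing an explicit -o path to yt-dlp keeps the file in the same download directory the rest of the program uses, and awaiting the process's close event gives the caller a point at which to update the download counters.

```javascript
const { once } = require('events');
const path = require('path');

// Build yt-dlp arguments that write the merged video+audio file to an explicit path.
// "-o" sets the output template; ASSUMPTION: fileBaseName is already sanitized and
// contains no "%" characters, which the template syntax would interpret.
// "--merge-output-format mp4" asks yt-dlp to mux the separate streams into MP4.
function buildYtDlpArgs(postUrl, outputPath) {
  return ['-o', outputPath, '--merge-output-format', 'mp4', postUrl];
}

// Hypothetical switch: config.use_ytdlp_for_video is NOT an existing option.
function pickVideoDownloader(config, post, downloadDirectory, fileBaseName) {
  const redditVideo = post.media && post.media.reddit_video;
  if (post.post_hint !== 'hosted:video' || !redditVideo) {
    return { method: 'not-a-reddit-video' };
  }
  if (config.use_ytdlp_for_video === true) {
    const outputPath = path.join(downloadDirectory, `${fileBaseName}.mp4`);
    return { method: 'yt-dlp', args: buildYtDlpArgs(post.url, outputPath) };
  }
  // Current behaviour: the fallback URL points at a video-only stream (no audio).
  return { method: 'fallback', url: redditVideo.fallback_url };
}

// Resolve when an emitter fires "close"; events.once() rejects instead if the
// emitter fires "error" first.
async function waitForDownload(emitter) {
  await once(emitter, 'close');
}
```

With youtube-dl-wrap, the returned args array could replace [post.url] in the exec(...) call, and the emitter that exec returns could be passed to waitForDownload before calling checkIfDone.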

NoExplorer commented 1 year ago

So, uh, I tried to work on this idea a bit by myself. It partly works: some very useful features of this program don't work yet (downloaded videos are not stored where they should be, and progress reporting is a little broken) due to my spaghetti-tier code, but I believe it is a start. If this is something you would accept, I can keep working on it while you take charge of everything else, if that is okay. Here are the changes I made:

Dependencies: youtube-dl and youtube-dl-wrap

index.js (changes at line 34, the new useYTDLforVideo variable, and line 776, the hosted:video branch in downloadPost)

const request = require('request');
const { version } = require('./package.json');

// NodeJS Dependencies
const fs = require('fs');
const prompts = require('prompts');
const chalk = require('chalk');
const axios = require('axios');

let config = require('./user_config_DEFAULT.json');

// Variables used for logging
let userLogs = '';
const logFormat = 'txt';
let date = new Date();
let date_string = `${date.getFullYear()} ${
    date.getMonth() + 1
} ${date.getDate()} at ${date.getHours()}-${date.getMinutes()}-${date.getSeconds()}`;
let startTime = null;
let lastAPICallForSubreddit = false;
let currentAPICall = null;

let currentSubredditIndex = 0; // Used to track which subreddit the user is downloading from
let responseSize = -1; // Used to track the size of the response from the API call, aka how many posts are in the response

// User-defined variables, these can be preset with the help of testingMode
let timeBetweenRuns = 0; // in milliseconds, the time between runs. This is only used if repeatForever is true
let subredditList = []; // List of subreddits in this format: ['subreddit1', 'subreddit2', 'subreddit3']
let numberOfPosts = -1; // How many posts to go through, more posts = more downloads, but takes longer
let sorting = 'top'; // How to sort the posts (top, new, hot, rising, controversial)
let time = 'all'; // What time period to sort by (hour, day, week, month, year, all)
let repeatForever = false; // If true, the program will repeat every timeBetweenRuns milliseconds
let downloadDirectory = ''; // Where to download the files to, defined when
let useYTDLforVideo = ''; // Experimental: use youtube-dl for Reddit-hosted videos, a quick fix for issue #65

let currentUserAfter = ''; // Used to track the after value for the API call, this is used to get the next X posts

// Default object to track the downloaded posts by type,
// and the subreddit downloading from.
let downloadedPosts = {
    subreddit: '',
    self: 0,
    media: 0,
    link: 0,
    failed: 0,
    skipped_due_to_duplicate: 0,
    skipped_due_to_fileType: 0,
};

// Read the user_config.json file for user configuration options
if (fs.existsSync('./user_config.json')) {
    config = require('./user_config.json');
    checkConfig();
} else {
    // create ./user_config.json if it doesn't exist, by duplicating user_config_DEFAULT.json and renaming it
    fs.copyFile('./user_config_DEFAULT.json', './user_config.json', (err) => {
        if (err) throw err;
        log('user_config.json was created. Edit it to manage user options.', true);
        config = require('./user_config.json');
    });
    checkConfig();
}

// check if download_post_list.txt exists, if it doesn't, create it
if (!fs.existsSync('./download_post_list.txt')) {
    fs.writeFile('./download_post_list.txt', '', (err) => {
        if (err) throw err;

        let fileDefaultContent = `# Below, please list any posts that you wish to download. # \n# They must follow this format below: # \n# https://www.reddit.com/r/gadgets/comments/ptt967/eu_proposes_mandatory_usbc_on_all_devices/ # \n# Lines with "#" at the start will be ignored (treated as comments). #`;

        // write a few lines to the file
        fs.appendFile('./download_post_list.txt', fileDefaultContent, (err) => {
            if (err) throw err;
            log('download_post_list.txt was created with default content.', true);
        });
    });
}

// Testing Mode for developer testing. This enables you to hardcode
// the variables above and skip the prompt.
// To edit, go into the user_config.json file.
const testingMode = config.testingMode;
if (testingMode) {
    subredditList = config.testingModeOptions.subredditList;
    numberOfPosts = config.testingModeOptions.numberOfPosts;
    sorting = config.testingModeOptions.sorting;
    time = config.testingModeOptions.time;
    repeatForever = config.testingModeOptions.repeatForever;
    timeBetweenRuns = config.testingModeOptions.timeBetweenRuns; 
}

// Start actions
console.clear(); // Clear the console
log(
    chalk.cyan(
        '👋 Welcome to the easiest & most customizable Reddit Post Downloader!'
    ),
    false
);
log(
    chalk.yellow(
        '😎 Contribute @ https://github.com/josephrcox/easy-reddit-downloader'
    ),
    false
);
log(
    chalk.blue(
        '🤔 Confused? Check out the README @ https://github.com/josephrcox/easy-reddit-downloader#readme\n'
    ),
    false
);
// For debugging logs
log('User config: ' + JSON.stringify(config), true);
if (config.testingMode) {
    log('Testing mode options: ' + JSON.stringify(config.testingModeOptions), true);
}

function checkConfig() {
    let warnTheUser = false;
    let quitApplicaton = false;

    let count =
        (config.file_naming_scheme.showDate === true) +
        (config.file_naming_scheme.showAuthor === true) +
        (config.file_naming_scheme.showTitle === true);
    if (count === 0) {
        quitApplicaton = true;
    } else if (count < 2) {
        warnTheUser = true;
    }

    if (warnTheUser) {
        log(
            chalk.red(
                'WARNING: Your file naming scheme (user_config.json) is poorly set, we recommend changing it.'
            ),
            false
        );
    }

    if (quitApplicaton) {
        log(
            chalk.red(
                'ALERT: Your file naming scheme (user_config.json) does not have any options set. You can not download posts without filenames. Aborting. '
            ),
            false
        );
        process.exit(1);
    }

    if (quitApplicaton || warnTheUser) {
        log(
            chalk.red(
                'Read about recommended naming schemes here - https://github.com/josephrcox/easy-reddit-downloader/blob/main/README.md#File-naming-scheme'
            ),
            false
        );
    }
}

// Make a GET request to the GitHub API to get the latest release
request.get(
    'https://api.github.com/repos/josephrcox/easy-reddit-downloader/releases/latest',
    { headers: { 'User-Agent': 'Downloader' } },
    (error, response, body) => {
        if (error) {
            log(error, true);
        } else {
            // Parse the response body to get the version number of the latest release
            const latestRelease = JSON.parse(body);
            const latestVersion = latestRelease.tag_name;

            // Compare the current version to the latest release version
            if (version !== latestVersion) {
                log(
                    `Hey! A new version (${latestVersion}) is available. \nConsider updating to the latest version with 'git pull'.\n`,
                    false
                );
                startScript();
            } else {
                log('You are on the latest stable version (' + version + ')\n', true);
                startScript();
            }
        }
    }
);

function startScript() {
    startTime = new Date();
    if (!testingMode && !config.download_post_list_options.enabled) {
        startPrompt();
    } else {
        if (config.download_post_list_options.enabled) {
            downloadFromPostListFile();
        } else {
            downloadSubredditPosts(subredditList[0], ''); // skip the prompt and get right to the API calls
        }
    }
}

async function startPrompt() {
    const questions = [
        {
            type: 'text',
            name: 'subreddit',
            message:
                'Which subreddits or users would you like to download? You may submit multiple separated by commas (no spaces).',
            validate: (value) =>
                value.length < 1 ? `Please enter at least one subreddit or user` : true,
        },
        {
            type: 'number',
            name: 'numberOfPosts',
            message:
                'How many posts would you like to attempt to download? If you would like to download all posts, enter 0.',
            validate: (value) =>
                // check if value is a number
                !isNaN(value) ? true : `Please enter a number`,
        },
        {
            type: 'text',
            name: 'sorting',
            message:
                'How would you like to sort? (top, new, hot, rising, controversial)',
            validate: (value) =>
                value.toLowerCase() === 'top' ||
                value.toLowerCase() === 'new' ||
                value.toLowerCase() === 'hot' ||
                value.toLowerCase() === 'rising' ||
                value.toLowerCase() === 'controversial'
                    ? true
                    : `Please enter a valid sorting method`,
        },
        {
            type: 'text',
            name: 'time',
            message: 'During what time period? (hour, day, week, month, year, all)',
            validate: (value) =>
                value.toLowerCase() === 'hour' ||
                value.toLowerCase() === 'day' ||
                value.toLowerCase() === 'week' ||
                value.toLowerCase() === 'month' ||
                value.toLowerCase() === 'year' ||
                value.toLowerCase() === 'all'
                    ? true
                    : `Please enter a valid time period`,
        },
        {
            type: 'toggle',
            name: 'repeatForever',
            message: 'Would you like to run this on repeat?',
            initial: false,
            active: 'yes',
            inactive: 'no',
        },
        {
            type: (prev) => (prev == true ? 'number' : null),
            name: 'timeBetweenRuns',
            message: 'How often would you like to run this? (in ms)',
        },
    ];

    const result = await prompts(questions);
    subredditList = result.subreddit.split(','); // the user enters subreddits separated by commas
    repeatForever = result.repeatForever;
    numberOfPosts = result.numberOfPosts;
    sorting = result.sorting.replace(/\s/g, '');
    time = result.time.replace(/\s/g, '');

    // clean up the subreddit list in case the user puts in invalid chars
    for (let i = 0; i < subredditList.length; i++) {
        subredditList[i] = subredditList[i].replace(/\s/g, '');
    }

    if (numberOfPosts === 0) {
        numberOfPosts = 9999999999999999999999;
    }

    if (repeatForever) {
        // the user enters the time between runs in ms; clamp negative values to 0
        if (result.timeBetweenRuns < 0) {
            result.timeBetweenRuns = 0;
        }
        timeBetweenRuns = result.timeBetweenRuns;
    }

    // With the data gathered, call the APIs and download the posts
    downloadSubredditPosts(subredditList[0], '');
}

function makeDirectories() {
    // Make needed directories for downloads,
    // clean and nsfw are made nomatter the subreddits downloaded
    if (!fs.existsSync('./downloads')) {
        fs.mkdirSync('./downloads');
    }
    if (config.separate_clean_nsfw) {
        if (!fs.existsSync('./downloads/clean')) {
            fs.mkdirSync('./downloads/clean');
        }
        if (!fs.existsSync('./downloads/nsfw')) {
            fs.mkdirSync('./downloads/nsfw');
        }
    }
}

async function downloadSubredditPosts(subreddit, lastPostId) {
    let isUser = false;
    if (
        subreddit.includes('u/') ||
        subreddit.includes('user/') ||
        subreddit.includes('/u/')
    ) {
        isUser = true;
        subreddit = subreddit.split('u/').pop();
        return downloadUser(subreddit, lastPostId);
    }
    let postsRemaining = numberOfPostsRemaining()[0];
    if (postsRemaining <= 0) {
        // If we have downloaded enough posts, move on to the next subreddit
        if (subredditList.length > 1) {
            return downloadNextSubreddit();
        } else {
            // If we have downloaded all the subreddits, end the program
            return checkIfDone('', true);
        }
        return;
    } else if (postsRemaining > 100) {
        // If we have more posts to download than the limit of 100, set it to 100
        postsRemaining = 100;
    }

    // if lastPostId is undefined, set it to an empty string. Common on first run.
    if (lastPostId == undefined) {
        lastPostId = '';
    }
    makeDirectories();

    try {
        if (subreddit == undefined) {
            if (subredditList.length > 1) {
                return downloadNextSubreddit();
            } else {
                return checkIfDone();
            }
        }

        // Use log function to log a string
        // as well as a boolean if the log should be displayed to the user.
        if (isUser) {
            log(
                `\n\n👀 Requesting posts from
                https://www.reddit.com/user/${subreddit.replace(
                    'u/',
                    ''
                )}/${sorting}/.json?sort=${sorting}&t=${time}&limit=${postsRemaining}&after=${lastPostId}\n`,
                true
            );
        } else {
            log(
                `\n\n👀 Requesting posts from
            https://www.reddit.com/r/${subreddit}/${sorting}/.json?sort=${sorting}&t=${time}&limit=${postsRemaining}&after=${lastPostId}\n`,
                true
            );
        }

        // Get the top posts from the subreddit
        let response = null;
        let data = null;

        try {
            response = await axios.get(
                `https://www.reddit.com/r/${subreddit}/${sorting}/.json?sort=${sorting}&t=${time}&limit=${postsRemaining}&after=${lastPostId}`
            );

            data = await response.data;

            currentAPICall = data;
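            if (data.message == 'Not Found' || data.data.children.length == 0) {
                // Jump to the catch block below, which logs the problem and moves on
                throw new Error(`No posts found for ${subreddit}`);
            }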
            if (data.data.children.length < postsRemaining) {
                lastAPICallForSubreddit = true;
                postsRemaining = data.data.children.length;
            } else {
                lastAPICallForSubreddit = false;
            }
        } catch (err) {
            log(
                `\n\nERROR: There was a problem fetching posts for ${subreddit}. This is likely because the subreddit is private, banned, or doesn't exist.`,
                true
            );
            if (subredditList.length > 1) {
                if (currentSubredditIndex > subredditList.length - 1) {
                    currentSubredditIndex = -1;
                }
                currentSubredditIndex += 1;
                return downloadSubredditPosts(subredditList[currentSubredditIndex], '');
            } else {
                return checkIfDone('', true);
            }
        }

        // if the first post on the subreddit is NSFW, then there is a fair chance
        // that the rest of the posts are NSFW.
        let isOver18 = data.data.children[0].data.over_18 ? 'nsfw' : 'clean';
        downloadedPosts.subreddit = data.data.children[0].data.subreddit;

        if (!config.separate_clean_nsfw) {
            downloadDirectory = `./downloads/${data.data.children[0].data.subreddit}`;
        } else {
            downloadDirectory = `./downloads/${isOver18}/${data.data.children[0].data.subreddit}`;
        }

        // Make sure the image directory exists
        // If no directory is found, create one
        if (!fs.existsSync(downloadDirectory)) {
            fs.mkdirSync(downloadDirectory);
        }

        responseSize = data.data.children.length;

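        // downloadPost is async and is not awaited here, so posts download
        // concurrently; catch rejections so a failed post is logged instead
        // of becoming an unhandled promise rejection.
        data.data.children.forEach((child) => {
            downloadPost(child.data).catch((e) => log(e, true));
        });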
    } catch (error) {
        // throw the error
        throw error;
    }
}

async function downloadUser(user, currentUserAfter) {
    let lastPostId = currentUserAfter;
    let postsRemaining = numberOfPostsRemaining()[0];
    if (postsRemaining <= 0) {
        // If we have downloaded enough posts, move on to the next subreddit
        if (subredditList.length > 1) {
            return downloadNextSubreddit();
        } else {
            // If we have downloaded all the subreddits, end the program
            return checkIfDone('', true);
        }
        return;
    } else if (postsRemaining > 100) {
        // If we have more posts to download than the limit of 100, set it to 100
        postsRemaining = 100;
    }

    // if lastPostId is undefined, set it to an empty string. Common on first run.
    if (lastPostId == undefined) {
        lastPostId = '';
    }
    makeDirectories();

    try {
        if (user == undefined) {
            if (subredditList.length > 1) {
                return downloadNextSubreddit();
            } else {
                return checkIfDone();
            }
        }

        // Use log function to log a string
        // as well as a boolean if the log should be displayed to the user.
        let reqUrl = `https://www.reddit.com/user/${user.replace(
            'u/',
            ''
        )}/submitted/.json?limit=${postsRemaining}&after=${lastPostId}`;
        log(
            `\n\n👀 Requesting posts from
            ${reqUrl}\n`,
            false
        );

        // Get the top posts from the subreddit
        let response = null;
        let data = null;

        try {
            response = await axios.get(`${reqUrl}`);

            data = await response.data;
            currentUserAfter = data.data.after;

            currentAPICall = data;
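            if (data.message == 'Not Found' || data.data.children.length == 0) {
                // Jump to the catch block below, which logs the problem and moves on
                throw new Error(`No posts found for ${user}`);
            }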
            if (data.data.children.length < postsRemaining) {
                lastAPICallForSubreddit = true;
                postsRemaining = data.data.children.length;
            } else {
                lastAPICallForSubreddit = false;
            }
        } catch (err) {
            log(
                `\n\nERROR: There was a problem fetching posts for ${user}. This is likely because the subreddit is private, banned, or doesn't exist.`,
                true
            );
            if (subredditList.length > 1) {
                if (currentSubredditIndex > subredditList.length - 1) {
                    currentSubredditIndex = -1;
                }
                currentSubredditIndex += 1;
                return downloadSubredditPosts(subredditList[currentSubredditIndex], '');
            } else {
                return checkIfDone('', true);
            }
        }

        downloadDirectory = `./downloads/user_${user.replace('u/', '')}`;

        // Make sure the image directory exists
        // If no directory is found, create one
        if (!fs.existsSync(downloadDirectory)) {
            fs.mkdirSync(downloadDirectory);
        }

        responseSize = data.data.children.length;

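        // downloadPost is async and is not awaited here, so posts download
        // concurrently; catch rejections so a failed post is logged instead
        // of becoming an unhandled promise rejection.
        data.data.children.forEach((child) => {
            downloadPost(child.data).catch((e) => log(e, true));
        });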
    } catch (error) {
        // throw the error
        throw error;
    }
}

async function downloadFromPostListFile() {
    // this is called when config.download_from_post_list_file is true
    // this will read the download_post_list.txt file and download all the posts in it
    // downloading skips any lines starting with "#" as they are used for documentation

    // read the file
    let file = fs.readFileSync('./download_post_list.txt', 'utf8');
    // split the file into an array of lines
    let lines = file.split('\n');
    // remove any lines that start with "#"
    lines = lines.filter((line) => !line.startsWith('#'));
    // remove any empty lines
    lines = lines.filter((line) => line != '');
    // remove any lines that are just whitespace
    lines = lines.filter((line) => line.trim() != '');
    // remove any lines that don't start with "https://www.reddit.com"
    lines = lines.filter((line) => line.startsWith('https://www.reddit.com'));
    // remove any lines that don't have "/comments/" in them
    lines = lines.filter((line) => line.includes('/comments/'));
    numberOfPosts = lines.length;

    repeatForever = config.download_post_list_options.repeatForever;
    timeBetweenRuns = config.download_post_list_options.timeBetweenRuns;

    // iterate over the lines and download the posts
    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        const reqUrl = line + '.json';
        axios.get(reqUrl).then(async (response) => {
            const post = response.data[0].data.children[0].data;
            let isOver18 = post.over_18 ? 'nsfw' : 'clean';
            downloadedPosts.subreddit = post.subreddit;
            makeDirectories();

            if (!config.separate_clean_nsfw) {
                downloadDirectory = `./downloads/${post.subreddit}`;
            } else {
                downloadDirectory = `./downloads/${isOver18}/${post.subreddit}`;
            }

            // Make sure the image directory exists
            // If no directory is found, create one
            if (!fs.existsSync(downloadDirectory)) {
                fs.mkdirSync(downloadDirectory);
            }
            downloadPost(post);
        });
    }
}

function getPostType(post, postTypeOptions) {
    log(`Analyzing post with title: ${post.title} and URL: ${post.url}`, true);
    let postType; // 0 = self, 1 = media, 2 = link, 3 = poll (unsupported)
    if (post.post_hint === 'self' || post.is_self) {
        postType = 0;
    } else if (
        post.post_hint === 'image' ||
        (post.post_hint === 'rich:video' && !post.domain.includes('youtu')) ||
        post.post_hint === 'hosted:video' ||
        (post.post_hint === 'link' &&
            post.domain.includes('imgur') &&
            !post.url_overridden_by_dest.includes('gallery')) ||
        post.domain.includes('i.redd.it')
    ) {
        postType = 1;
    } else if (post.poll_data != undefined) {
        postType = 3; // UNSUPPORTED
    } else {
        postType = 2;
    }
    log(
        `Post has type: ${postTypeOptions[postType]} due to their post hint: ${post.post_hint} and domain: ${post.domain}`,
        true
    );
    return postType;
}

async function downloadMediaFile(downloadURL, filePath, postName) {
    try {
        const response = await axios({
            method: 'GET',
            url: downloadURL,
            responseType: 'stream',
        });

        response.data.pipe(fs.createWriteStream(filePath));

        return new Promise((resolve, reject) => {
            response.data.on('end', () => {
                downloadedPosts.media += 1;
                checkIfDone(postName);
                resolve();
            });

            response.data.on('error', (error) => {
                reject(error);
            });
        });
    } catch (error) {
        downloadedPosts.failed += 1;
        checkIfDone(postName);
        if (error.code === 'ENOTFOUND') {
            log(
                'ERROR: Hostname not found for: ' + downloadURL + '\n... skipping post',
                true
            );
        } else {
            log('ERROR: ' + error, true);
        }
    }
}

async function downloadPost(post) {
    let postTypeOptions = ['self', 'media', 'link', 'poll'];
    let postType = -1; // default to no postType until one is found

    // Determine the type of post. If no type is found, default to link as a last resort.
    // If it accidentally downloads a self or media post as a link, it will still
    // save properly.
    postType = getPostType(post, postTypeOptions);

    // All posts should have URLs, so just make sure that it does.
    // If the post doesn't have a URL, then it should be skipped.
    if (postType != 3 && post.url !== undefined) {
        // Array of possible (supported) image and video formats
        const imageFormats = ['jpeg', 'jpg', 'gif', 'png', 'mp4', 'webm', 'gifv'];

        let downloadURL = post.url;
        // Get the file type of the post via the URL. If it ends in .jpg, then it's a jpg.
        let fileType = downloadURL.split('.').pop();
        // Post titles can be really long and have invalid characters, so we need to clean them up.
        let postTitleScrubbed = getFileName(post);

        if (postType === 0) {
            let toDownload = await shouldWeDownload(
                post.subreddit,
                `${postTitleScrubbed}.txt`
            );
            if (!toDownload) {
                downloadedPosts.skipped_due_to_duplicate += 1;
                return checkIfDone(post.name);
            } else {
                if (!config.download_self_posts) {
                    log(`Skipping self post with title: ${post.title}`, true);
                    downloadedPosts.skipped_due_to_fileType += 1;
                    return checkIfDone(post.name);
                } else {
                    // DOWNLOAD A SELF POST
                    let comments_string = '';
                    let postResponse = null;
                    let data = null;
                    try {
                        postResponse = await axios.get(`${post.url}.json`);
                        data = postResponse.data;
                    } catch (error) {
                        log(`Axios failure with ${post.url}`, true);
                        return checkIfDone(post.name);
                    }

                    // With text/self posts, we want to download the top comments as well.
                    // This is done by requesting the post's JSON data, and then iterating through each comment.
                    // We also iterate through the top nested comments (only one level deep).
                    // So we have a file output with the post title, the post text, the author, and the top comments.

                    comments_string += post.title + ' by ' + post.author + '\n\n';
                    comments_string += post.selftext + '\n';
                    comments_string +=
                        '------------------------------------------------\n\n';
                    if (config.download_comments) {
                        // If the user wants to download comments
                        comments_string += '--COMMENTS--\n\n';
                        data[1].data.children.forEach((child) => {
                            const comment = child.data;
                            comments_string += comment.author + ':\n';
                            comments_string += comment.body + '\n';
                            if (comment.replies) {
                                const top_reply = comment.replies.data.children[0].data;
                                comments_string += '\t>\t' + top_reply.author + ':\n';
                                comments_string += '\t>\t' + top_reply.body + '\n';
                            }
                            comments_string += '\n\n\n';
                        });
                    }

                    fs.writeFile(
                        `${downloadDirectory}/${postTitleScrubbed}.txt`,
                        comments_string,
                        function (err) {
                            if (err) {
                                log(err, true);
                            }
                            downloadedPosts.self += 1;
                            if (checkIfDone(post.name)) {
                                return;
                            }
                        }
                    );
                }
            }
        } else if (postType === 1) {
            // DOWNLOAD A MEDIA POST
            if (post.preview != undefined) {
                // Reddit stores fallback URL previews for some GIFs.
                // Changing the URL to download to the fallback URL will download the GIF, in MP4 format.
                if (post.preview.reddit_video_preview != undefined) {
                    log(
                        "Using fallback URL for Reddit's GIF preview: " +
                            post.preview.reddit_video_preview.fallback_url,
                        true
                    );
                    downloadURL = post.preview.reddit_video_preview.fallback_url;
                    fileType = 'mp4';
                } else if (post.url_overridden_by_dest.includes('.gifv')) {
                    // Luckily, you can just swap URLs on imgur with .gifv
                    // with ".mp4" to get the MP4 version. Amazing!
                    log('Replacing gifv with mp4', true);
                    downloadURL = post.url_overridden_by_dest.replace('.gifv', '.mp4');
                    fileType = 'mp4';
                } else {
                    let sourceURL = post.preview.images[0].source.url;
                    // set fileType to whatever imageFormat item is in the sourceURL
                    for (let i = 0; i < imageFormats.length; i++) {
                        if (
                            sourceURL.toLowerCase().includes(imageFormats[i].toLowerCase())
                        ) {
                            fileType = imageFormats[i];
                            break;
                        }
                    }
                }
            }
            if (post.media != undefined && post.post_hint == 'hosted:video') {
                // If the post has a media object, then it's a video.
                // We need to get the URL from the media object.
                // This is because the URL in the post object is a fallback URL.
                // The media object has the actual URL.
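                // Experimental youtube-dl path for issue #65. useYTDLforVideo is declared
                // near the top of the file with the value '', and '' != true, so this
                // branch currently runs for every Reddit-hosted video.
                if (useYTDLforVideo != true) {
                    const YoutubeDlWrap = require('youtube-dl-wrap');
                    // Path to a local youtube-dl binary; adjust for your machine.
                    const youtubeDlWrap = new YoutubeDlWrap('/home/noexplorer/Downloads/youtube-dl');

                    // No output path is passed, so youtube-dl saves the file into the
                    // current working directory rather than downloadDirectory.
                    // Execution also continues to downloadMediaFile below with
                    // downloadURL left unchanged.
                    const youtubeDlEventEmitter = youtubeDlWrap
                        .exec([post.url])
                        // .on('progress', (progress) =>
                        //     console.log(progress.percent, progress.totalSize, progress.currentSpeed, progress.eta))
                        // .on('youtubeDlEvent', (eventType, eventData) => console.log(eventType, eventData))
                        // .on('error', (error) => console.error(error))
                        .on('close', () => console.log('Done downloading video post with link', post.url));
                    console.log(youtubeDlEventEmitter.youtubeDlProcess.pid);
                } else {
                    downloadURL = post.media.reddit_video.fallback_url;
                    fileType = 'mp4';
                }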
            } else if (
                post.media != undefined &&
                post.post_hint == 'rich:video' &&
                post.media.oembed.thumbnail_url != undefined
            ) {
                // Common for gfycat links
                downloadURL = post.media.oembed.thumbnail_url;
                fileType = 'gif';
            }
            if (!config.download_media_posts) {
                log(`Skipping media post with title: ${post.title}`, true);
                downloadedPosts.skipped_due_to_fileType += 1;
                return checkIfDone(post.name);
            } else {
                let toDownload = await shouldWeDownload(
                    post.subreddit,
                    `${postTitleScrubbed}.${fileType}`
                );
                if (!toDownload) {
                    downloadedPosts.skipped_due_to_duplicate += 1;
                    if (checkIfDone(post.name)) {
                        return;
                    }
                } else {
                    downloadMediaFile(
                        downloadURL,
                        `${downloadDirectory}/${postTitleScrubbed}.${fileType}`,
                        post.name
                    );
                }
            }
        } else if (postType === 2) {
            if (!config.download_link_posts) {
                log(`Skipping link post with title: ${post.title}`, true);
                downloadedPosts.skipped_due_to_fileType += 1;
                return checkIfDone(post.name);
            } else {
                let toDownload = await shouldWeDownload(
                    post.subreddit,
                    `${postTitleScrubbed}.html`
                );
                if (!toDownload) {
                    downloadedPosts.skipped_due_to_duplicate += 1;
                    if (checkIfDone(post.name)) {
                        return;
                    }
                } else {
                    // DOWNLOAD A LINK POST
                    // With link posts, we create a simple HTML file that redirects to the post's URL.
                    // This enables the user to still "open" the link file, and it will redirect to the post.
                    // No comments or other data is stored.
                    let htmlFile = `<html><body><script type='text/javascript'>window.location.href = "${post.url}";</script></body></html>`;

                    fs.writeFile(
                        `${downloadDirectory}/${postTitleScrubbed}.html`,
                        htmlFile,
                        function (err) {
                            if (err) throw err;
                            downloadedPosts.link += 1;
                            if (checkIfDone(post.name)) {
                                return;
                            }
                        }
                    );
                }
            }
        } else {
            log('Failed to download: ' + post.title + ' with URL: ' + post.url, true);
            downloadedPosts.failed += 1;
            if (checkIfDone(post.name)) {
                return;
            }
        }
    } else {
        log('Failed to download: ' + post.title + ' with URL: ' + post.url, true);
        downloadedPosts.failed += 1;
        if (checkIfDone(post.name)) {
            return;
        }
    }
}

function downloadNextSubreddit() {
    if (currentSubredditIndex >= subredditList.length - 1) {
        checkIfDone('', true);
    } else {
        currentSubredditIndex += 1;
        downloadSubredditPosts(subredditList[currentSubredditIndex]);
    }
}

function shouldWeDownload(subreddit, postTitleWithPrefixAndExtension) {
    if (
        config.redownload_posts === true ||
        config.redownload_posts === undefined
    ) {
        if (config.redownload_posts === undefined) {
            log(
                chalk.red(
                    "ALERT: Please note that the 'redownload_posts' option is now available in user_config. See the default JSON for example usage."
                ),
                true
            );
        }
        return true;
    } else {
        // Check if the post in the subreddit folder already exists.
        // If it does, we don't need to download it again.
        let postExists = fs.existsSync(
            `${downloadDirectory}/${postTitleWithPrefixAndExtension}`
        );
        return !postExists;
    }
}

function onErr(err) {
    log(err, true);
    return 1;
}

// checkIfDone is called frequently to see if we have downloaded the number of posts
// that the user requested to download.
// We could check this inline but it's easier to read if it's a separate function,
// and this ensures that we only check after the files are done being downloaded to the PC, not
// just when the request is sent.
function checkIfDone(lastPostId, override) {
    // Add up all downloaded/failed posts that have been downloaded so far, and check if it matches the
    // number requested.
    if (
        (lastAPICallForSubreddit &&
            lastPostId ===
                currentAPICall.data.children[responseSize - 1].data.name) ||
        numberOfPostsRemaining()[0] === 0 ||
        override ||
        (numberOfPostsRemaining()[1] === responseSize && responseSize < 100)
    ) {
        let endTime = new Date();
        let timeDiff = endTime - startTime;
        timeDiff /= 1000;
        let msPerPost = (timeDiff / numberOfPostsRemaining()[1])
            .toString()
            .substring(0, 5);
        if (numberOfPosts >= 99999999999999999999) {
            log(
                `Still downloading posts from ${chalk.cyan(
                    subredditList[currentSubredditIndex]
                )}... (${numberOfPostsRemaining()[1]}/all)`,
                false
            );
        } else if (config.download_post_list_options.enabled) {
            log(
                `Still downloading posts from ${chalk.cyan(
                    'download_post_list.txt'
                )}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
                false
            );
        } else {
            log(
                `Still downloading posts from ${chalk.cyan(
                    subredditList[currentSubredditIndex]
                )}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
                false
            );
        }
        log('Validating that all posts were downloaded...', false);
        setTimeout(() => {
            if (config.download_post_list_options.enabled) {
                log(
                    '🎉 All done downloading posts from download_post_list.txt!',
                    false
                );
            } else {
                log(
                    '🎉 All done downloading posts from ' +
                        subredditList[currentSubredditIndex] +
                        '!',
                    false
                );
            }

            log(JSON.stringify(downloadedPosts), true);
            if (currentSubredditIndex === subredditList.length - 1) {
                log(
                    `\n📈 Downloading took ${timeDiff} seconds, at about ${msPerPost} seconds/post`,
                    false
                );
            }

            // default values for next run (important if being run multiple times)
            downloadedPosts = {
                subreddit: '',
                self: 0,
                media: 0,
                link: 0,
                failed: 0,
                skipped_due_to_duplicate: 0,
                skipped_due_to_fileType: 0,
            };

            if (currentSubredditIndex < subredditList.length - 1) {
                downloadNextSubreddit();
            } else if (repeatForever) {
                currentSubredditIndex = 0;
                log(
                    `⏲️ Waiting ${timeBetweenRuns / 1000} seconds before rerunning...`,
                    false
                );
                setTimeout(function () {
                    if (config.download_post_list_options.enabled) {
                        downloadFromPostListFile();
                    } else {
                        downloadSubredditPosts(subredditList[0], '');
                    }
                    startTime = new Date();
                }, timeBetweenRuns);
            } else {
                startPrompt();
            }
            return true;
        }, 1000);
    } else {
        if (numberOfPosts >= 99999999999999999999) {
            log(
                `Still downloading posts from ${chalk.cyan(
                    subredditList[currentSubredditIndex]
                )}... (${numberOfPostsRemaining()[1]}/all)`,
                false
            );
        } else if (config.download_post_list_options.enabled) {
            log(
                `Still downloading posts from ${chalk.cyan(
                    'download_post_list.txt'
                )}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
                false
            );
        } else {
            log(
                `Still downloading posts from ${chalk.cyan(
                    subredditList[currentSubredditIndex]
                )}... (${numberOfPostsRemaining()[1]}/${numberOfPosts})`,
                false
            );
        }

        for (let i = 0; i < Object.keys(downloadedPosts).length; i++) {
            log(
                `\t- ${Object.keys(downloadedPosts)[i]}: ${
                    Object.values(downloadedPosts)[i]
                }`,
                true
            );
        }
        log('\n', true);

        if (numberOfPostsRemaining()[1] % 100 == 0) {
            return downloadSubredditPosts(
                subredditList[currentSubredditIndex],
                lastPostId
            );
        }
        return false;
    }
}

function getFileName(post) {
    let fileName = '';
    if (
        config.file_naming_scheme.showDate ||
        config.file_naming_scheme.showDate === undefined
    ) {
        let timestamp = post.created;
        var date = new Date(timestamp * 1000);
        var year = date.getFullYear();
        var month = (date.getMonth() + 1).toString().padStart(2, '0');
        var day = date.getDate().toString().padStart(2, '0');
        fileName += `${year}-${month}-${day}`;
    }
    if (
        config.file_naming_scheme.showAuthor ||
        config.file_naming_scheme.showAuthor === undefined
    ) {
        fileName += `_${post.author}`;
    }
    if (
        config.file_naming_scheme.showTitle ||
        config.file_naming_scheme.showTitle === undefined
    ) {
        let title = sanitizeFileName(post.title);
        fileName += `_${title}`;
    }
    // The max length for most systems is about 255. To give some wiggle room, I'm doing 240
    if (fileName.length > 240) {
        fileName = fileName.substring(0, 240);
    }

    return fileName;
}

function numberOfPostsRemaining() {
    let total =
        downloadedPosts.self +
        downloadedPosts.media +
        downloadedPosts.link +
        downloadedPosts.failed +
        downloadedPosts.skipped_due_to_duplicate +
        downloadedPosts.skipped_due_to_fileType;
    return [numberOfPosts - total, total];
}

function log(message, detailed) {
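    // This function takes a message string and a boolean.
    // Every message is appended to userLogs (written to the log file when local_logs
    // is enabled). If detailed is true, the message is only printed to the console
    // when config.detailed_logs is enabled; otherwise it is always printed.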
    userLogs += message + '\r\n\n';
    let visibleToUser = true;
    if (detailed) {
        visibleToUser = config.detailed_logs;
    }

    if (visibleToUser) {
        console.log(message);
    }
    if (config.local_logs && subredditList.length > 0) {
        if (!fs.existsSync('./logs')) {
            fs.mkdirSync('./logs');
        }

        let logFileName = '';
        if (config.local_logs_naming_scheme.showDateAndTime) {
            logFileName += `${date_string} - `;
        }
        if (config.local_logs_naming_scheme.showSubreddits) {
            let subredditListString = JSON.stringify(subredditList).replace(
                /[^a-zA-Z0-9,]/g,
                ''
            );
            logFileName += `${subredditListString} - `;
        }
        if (config.local_logs_naming_scheme.showNumberOfPosts) {
            if (numberOfPosts >= 999999999999999999) {
                logFileName += `ALL - `;
            } else {
                logFileName += `${numberOfPosts} - `;
            }
        }

        if (logFileName.endsWith(' - ')) {
            logFileName = logFileName.substring(0, logFileName.length - 3);
        }

        fs.writeFile(
            `./logs/${logFileName}.${logFormat}`,
            userLogs,
            function (err) {
                if (err) throw err;
            }
        );
    }
}

// sanitize function for file names so that they work on Mac, Windows, and Linux
function sanitizeFileName(fileName) {
    return fileName
        .replace(/[/\\?%*:|"<>]/g, '-')
        .replace(/([^/])\/([^/])/g, '$1_$2');
}

user_config.json (changes are in the "testingModeOptions" section):

{
    "testingMode": false,
    "testingModeOptions": {
        "numberOfPosts": 25,
        "sorting": "new",
        "time": "month",
        "repeatForever": true,
        "timeBetweenRuns": 30000,
        "useYTDLforVideo": true
    },
    "download_post_list_options": {
        "enabled": false,
        "repeatForever": false,
        "timeBetweenRuns": 3000
    },
    "local_logs": true,
    "local_logs_naming_scheme": {
        "showDateAndTime": true,
        "showSubreddits": true,
        "showNumberOfPosts": true
    },
    "file_naming_scheme": {
        "showDateAndTime": true,
        "showAuthor": true,
        "showTitle": true
    },
    "download_self_posts": true,
    "download_media_posts": true,
    "download_link_posts": true,
    "download_comments": true,
    "separate_clean_nsfw": false,
    "redownload_posts": false,
    "detailed_logs": false
}
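The pasted index.js reads a useYTDLforVideo variable, but the part that sets it is not shown. A minimal sketch of deriving it from the config above, assuming the flag is only read from testingModeOptions (shouldUseYtdlForVideo is a hypothetical helper name) and that only an explicit true should route video posts through youtube-dl:

```javascript
// Hypothetical helper: decide whether hosted:video posts should go through
// youtube-dl instead of the fallback_url download.
function shouldUseYtdlForVideo(config) {
    // Assumption: the flag lives under testingModeOptions, as in the JSON above.
    const options = (config && config.testingModeOptions) || {};
    // Only a real boolean true enables youtube-dl; a missing key, false, or a
    // string such as "true" keeps the existing fallback_url behaviour.
    return options.useYTDLforVideo === true;
}
```

With this, a config that omits the key behaves exactly like the current code, so existing user_config.json files keep working.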
josephrcox commented 1 year ago

Hey @NoExplorer, thanks!

Sorry for the delay here. I am going to open up a PR and see if I can test it out.

josephrcox commented 1 year ago

I'm trying to get the above working locally and am running into issues. Feel free to submit a PR with the code changes above, and I can try running your exact PR.

NoExplorer commented 1 year ago

Don't worry about the delay! I'll check whether I copied something wrong and submit a PR sometime today.

NoExplorer commented 1 year ago

Opened a PR: https://github.com/josephrcox/easy-reddit-downloader/pull/67. I'll wait for your feedback.

NoExplorer commented 1 year ago

The most likely reason it won't work is the path to the youtube-dl executable that youtube-dl-wrap needs. For now, I'm trying to think of a way to get it from the user (e.g. an input that is somehow passed to line 780, const youtubeDlWrap = new YoutubeDlWrap(path-to-executable);) @josephrcox. Ideally, we could fetch it automatically without the user's input, but I don't know whether that is possible or how it could be achieved. (Switching to youtube-dl-exec solved the binary path issue, since all it needs is const youtubedl = require('youtube-dl-exec'). Hopefully, adding youtube-dl-exec to package.json makes this much easier to solve and removes the need for a binary path. youtube-dl-exec can be found here (NPM) and here (GitHub).)

I might be overcomplicating things here. Please tell me if this is not an ideal way to fix the missing-audio issue.
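For comparison, the FFmpeg route mentioned earlier in the thread (download Reddit's video and audio streams separately, then join them) avoids youtube-dl but needs an ffmpeg binary instead. A minimal sketch of the joining step using Node's built-in child_process, assuming ffmpeg is installed and on PATH and that both streams are already saved as local files; locating Reddit's audio stream is not covered, and buildMergeArgs and mergeWithFfmpeg are hypothetical names:

```javascript
const { spawn } = require("child_process");

// Build the ffmpeg arguments that mux a video-only file and an audio-only
// file into one output file without re-encoding.
function buildMergeArgs(videoPath, audioPath, outputPath) {
    return [
        "-y",                // overwrite outputPath if it already exists
        "-i", videoPath,     // input 0: video stream
        "-i", audioPath,     // input 1: audio stream
        "-map", "0:v:0",     // take the first video stream from input 0
        "-map", "1:a:0",     // take the first audio stream from input 1
        "-c", "copy",        // stream copy: no re-encoding
        outputPath,
    ];
}

// Run ffmpeg and settle the promise when the process finishes.
function mergeWithFfmpeg(videoPath, audioPath, outputPath) {
    return new Promise((resolve, reject) => {
        const proc = spawn("ffmpeg", buildMergeArgs(videoPath, audioPath, outputPath), {
            stdio: "ignore",
        });
        // Emitted if ffmpeg cannot be started at all (e.g. not on PATH).
        proc.on("error", reject);
        proc.on("close", (code) => {
            if (code === 0) resolve(outputPath);
            else reject(new Error(`ffmpeg exited with code ${code}`));
        });
    });
}
```

Stream copy keeps the merge fast and lossless; the trade-off versus youtube-dl is that this project would then have to find the audio URL and download both parts itself.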

Edit: ~[something funny is going on with the user_config generation script on my end, investigating...]~ My code editor (VSCodium) might be doing something weird. I'll make the changes in Notepad++.

Edit 2: ~The binary/executable that needs to be passed at that line is youtube-dl, which, if I recall correctly, is stored with all the other modules at ./node_modules/youtube-dl (if I figure out what is going on with the dependencies in package.json, it should be stored in the application's folder). It shouldn't be difficult to add a line that points there at line 780. I'll tinker with that idea sometime today and report back (and update the PR).~ (Again, youtube-dl-exec doesn't need a path, but that is on my machine, so please test it on your end as well if possible.) I also figured out what was going on with my index.js: the code editor was doing something weird, and switching to Notepad++ solved everything.
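On the open question of getting the executable path from the user: one option is to read an optional path from the config or an environment variable and otherwise fall back to a bare command name, which Node's child_process looks up on PATH. A minimal sketch; the ytdl_binary_path key, the YTDL_BINARY_PATH variable, and resolveYtdlBinary are all hypothetical names, and the fallback assumes youtube-dl is installed on PATH:

```javascript
const path = require("path");

// Hypothetical helper: pick the youtube-dl binary to hand to the wrapper.
function resolveYtdlBinary(config, env = process.env) {
    // Prefer an explicit path from user_config, then an environment variable.
    const chosen = (config && config.ytdl_binary_path) || env.YTDL_BINARY_PATH;
    if (chosen) {
        // Turn relative paths into absolute ones (relative to the current
        // working directory) so the spawned process finds the same file.
        return path.resolve(chosen);
    }
    // No path given: use the bare command name and let PATH lookup find it.
    return "youtube-dl";
}
```

The result could then be passed to the constructor used in the pasted code, e.g. new YoutubeDlWrap(resolveYtdlBinary(config)), instead of a hardcoded home-directory path.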

Edit 3: ~Tried my idea, but it doesn't seem to work for some absurd reason. I'm no JavaScript expert, but this is what I tried: const youtubeDlWrap = new YoutubeDlWrap("node_modules\youtube-dl\bin");. I don't quite understand how file paths work in Node.js, so I'm open to corrections.~ I opted to use youtube-dl-exec instead, which doesn't need a wrapper. It uses the MIT License, so licensing should be fine. Attached is a quick demo (2023-05-09 17.15 EEST).

https://github.com/josephrcox/easy-reddit-downloader/assets/37076999/7896d51d-515a-4042-8e29-3af5f083ad0c