Unifying Vulnerability Detection with a Single Automated Script for Multiple Security Checks

prayas7102 commented 1 month ago

You can consolidate the four files (DetectBruteForceAttack.ts, DetectInputValidation.ts, InsecureAuthentication.ts, and AnalyzeSecurityHeaders.ts) into a single script since they share common libraries and calling functions. The only variation between these files is the dataset, which can be loaded based on the specific vulnerability check being performed.

By creating a single script, you can automate the detection of brute force attacks, input validation, insecure authentication, and security header analysis. This script can selectively load the appropriate dataset according to the type of vulnerability being checked, making the process more efficient and reducing code duplication.

Key steps:

1. Unify common logic: Merge the shared libraries, calling functions, and processes.
2. Parameterize dataset selection: Add functionality to the script that chooses the dataset based on the type of vulnerability (brute force, input validation, insecure authentication, security headers).
3. Call specific checks: Based on the selected vulnerability type, load the respective dataset and apply the detection logic.

This approach simplifies the process and ensures scalability when adding new vulnerability checks in the future.
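
As a rough illustration of the three steps, here is a minimal sketch of a lookup table keyed by vulnerability type. Every name in it (`VulnerabilityKind`, `DatasetSample`, `DATASETS`, `runCheck`, `exactMatchDetector`) is hypothetical, the datasets are tiny inline stand-ins for the real dataset files, and the exact-match detector is only a placeholder for whatever shared detection logic the package actually uses.

```typescript
// Hypothetical names throughout; this is not the package's actual API.
type VulnerabilityKind =
    | "bruteForce"
    | "inputValidation"
    | "insecureAuthentication"
    | "securityHeaders";

interface DatasetSample {
    code: string;
    label: 0 | 1; // assumed convention: 1 = vulnerable, 0 = not vulnerable
}

// Step 2: one table maps each vulnerability type to its dataset.
// The entries are made-up stand-ins for the real dataset modules.
const DATASETS: Record<VulnerabilityKind, readonly DatasetSample[]> = {
    bruteForce: [{ code: "app.post('/login', handler)", label: 1 }],
    inputValidation: [{ code: "db.query(req.body.q)", label: 1 }],
    insecureAuthentication: [{ code: "if (password == stored)", label: 1 }],
    securityHeaders: [{ code: "app.use(helmet())", label: 0 }],
};

// Step 1: the shared detection logic is written once, as a function that
// receives the dataset instead of importing a specific one.
type Detector = (dataset: readonly DatasetSample[], snippet: string) => boolean;

// Placeholder detector: flags a snippet only if it exactly matches a sample
// labelled as vulnerable. A trained classifier would replace this.
const exactMatchDetector: Detector = (dataset, snippet) =>
    dataset.some((sample) => sample.label === 1 && sample.code === snippet);

// Step 3: pick the dataset for the requested type and apply the detector.
function runCheck(
    kind: VulnerabilityKind,
    snippet: string,
    detector: Detector,
): boolean {
    return detector(DATASETS[kind], snippet);
}

const snippet = "app.post('/login', handler)";
console.log(runCheck("bruteForce", snippet, exactMatchDetector));
console.log(runCheck("securityHeaders", snippet, exactMatchDetector));
```

With a table like this, adding a new check would mean adding one union member and one table entry; the compiler flags a missing entry because `Record<VulnerabilityKind, ...>` requires every key.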

Make sure the end user/developer (who downloads the NPM package) is able to smoothly run the NPM package after these changes.

lloydlobo commented 1 month ago

Hey there, I put together a quick script to check whether I understood how this works. Before I look into opening a PR, I'd like to know whether it satisfies the 3 steps mentioned in the issue, since I am not particularly familiar with NPM packaging and haven't been up to date with JS/TS recently. Cheers :)

```typescript
import natural from "natural";

// prettier-ignore
// Import specific datasets with defined interfaces
import { dataset as bruteForceData, DatasetSample, } from "./dataset_brute_force_attack";
import { dataset as inputValidationData } from "./dataset_input_validation";
import { dataset as insecureAuthData } from "./dataset_insecure_authentication";
import { dataset as securityHeadersData } from "./dataset_security_headers";

// Environment checks for debugging and development modes
const IS_DEBUG = // @ts-ignore: 2304
    (typeof Deno !== "undefined" && Deno.env.get("DEBUG_MODE") === "true") ||
    (typeof process !== "undefined" && process.env.DEBUG_MODE === "true");

// prettier-ignore
const IS_DEVELOPMENT = // @ts-ignore: 2304
    (typeof Deno !== "undefined" && Deno.env.get("NODE_ENV") === "development") ||
    (typeof process !== "undefined" && process.env.NODE_ENV === "development");

/** $ DEBUG_MODE=true deno run --allow-env scratch.ts */
/** $ NODE_ENV=development deno run --allow-env scratch.ts */

function debugDetectionStats(
    kind: Vulnerability,
    tokenizedSnippet: readonly string[] | null,
    labels: readonly number[],
    prediction: string,
) {
    console.debug({
        labels,
        prediction,
        result: parseInt(prediction, 10),
        tokenizedSnippet,
        vulnerability: vulnerabilityToString(kind),
    });
}

class ThreadSafeLogger {
    private buffer: string[] = [];
    private readonly BUFFER_FLUSH_LIMIT = 10; // NOTE: Adjust buffer size limit...

    report(message: string): void {
        this.buffer.push(message);
        if (this.buffer.length >= this.BUFFER_FLUSH_LIMIT) {
            this.flush();
        }
    }

    flush(): void {
        if (this.buffer.length > 0) {
            console.log(this.buffer.join("\n"));
            this.buffer = []; // clear the buffer
        }
    }
}

/**
 * Enumerates types of vulnerability.
 * Note: Defined as a read-only frozen object to prevent modifications.
 */
const Vulnerability = Object.freeze({
    BruteForceAttack: 0,
    InputValidation: 1,
    InsecureAuthentication: 2,
    SecurityHeaders: 3,
} as const);

type Vulnerability = (typeof Vulnerability)[keyof typeof Vulnerability];

// prettier-ignore
function vulnerabilityToString(kind: Vulnerability): string {
    switch (kind) {
        case Vulnerability.BruteForceAttack: return "Brute Force Attack";
        case Vulnerability.InputValidation: return "Input Validation";
        case Vulnerability.InsecureAuthentication: return "Insecure Authentication";
        case Vulnerability.SecurityHeaders: return "Security Headers";
        default: throw new Error("Exhausted all 'switch' cases.");
    }
}

// prettier-ignore
function getVulnerabilityData(kind: Vulnerability): DatasetSample[] {
    switch (kind) {
        case Vulnerability.BruteForceAttack: return bruteForceData;
        case Vulnerability.InputValidation: return inputValidationData;
        case Vulnerability.InsecureAuthentication: return insecureAuthData;
        case Vulnerability.SecurityHeaders: return securityHeadersData;
        default: throw new Error("Exhausted all 'switch' cases.");
    }
}

// Removing redundant data function
function removeRedundantData(
    dataset: readonly DatasetSample[],
): DatasetSample[] {
    const uniqueEntries: DatasetSample[] = [];
    const seenEntries: Set<string> = new Set();

    let count0: number = 0;
    let count1: number = 0;
    for (const entry of dataset) {
        const entryString = JSON.stringify(entry);
        if (!seenEntries.has(entryString)) {
            if (IS_DEBUG && IS_DEVELOPMENT) {
                if (entry.label === 0) count0++;
                else count1++;
            }
            uniqueEntries.push({ ...entry }); // copy ensures immutability
            seenEntries.add(entryString);
        }
    }
    if (IS_DEBUG && IS_DEVELOPMENT) {
        console.debug({ count0, count1, entriesLen: uniqueEntries.length });
    }

    return uniqueEntries;
}

// prettier-ignore
function detect(kind: Vulnerability, codeSnippet: string, logger: ThreadSafeLogger): boolean {
    let isDetected: boolean = false;
    const kindStr: string = vulnerabilityToString(kind);

    // Create a tokenizer.
    const tokenizer = new natural.WordTokenizer();
    const tokenizedSnippet: readonly string[] | null = tokenizer.tokenize(codeSnippet);

    // Make prediction using the trained classifier.
    if (tokenizedSnippet !== null) {
        // Prepare the data for training.
        const data: readonly DatasetSample[] = getVulnerabilityData(kind);
        const cleanedDataset: readonly DatasetSample[] = removeRedundantData(data);
        const codeSamples: readonly string[] = cleanedDataset.map((sample) => sample.code);
        const labels: readonly number[] = cleanedDataset.map((sample) => sample.label);

        // Vectorize the code samples using the tokenizer.
        const tokenizerSamples: (readonly string[])[] = codeSamples
            .map((code) => tokenizer.tokenize(code))
            .filter((tokens): tokens is string[] => tokens !== null);

        // Train a Naive Bayes classifier
        const classifier = new natural.BayesClassifier();
        for (let i = 0; i < tokenizerSamples.length; i++) {
            classifier.addDocument([...tokenizerSamples[i]], labels[i].toString()); // copy ensures immutability
        }
        classifier.train();
        const prediction: string = classifier.classify([...tokenizedSnippet]);
        const result: number = parseInt(prediction, 10);
        if (result === 1) {
            isDetected = true;
            logger.report("==> Code vulnerable to " + kindStr + " in this file!!! ");
        }
        if (IS_DEBUG) {
            debugDetectionStats(kind, tokenizedSnippet, labels, prediction);
        }
    }
    if (!isDetected) {
        logger.report("==> Code NOT vulnerable to " + kindStr);
    }

    return !isDetected;
}

function main(): number {
    const vulnerabilities: readonly Vulnerability[] = Object.values(
        Vulnerability,
    ).filter((value): value is Vulnerability => typeof value === "number");

    const codeSnippetSample =
        "const loginLimiter = rateLimit({\n" +
        "            store: new MongoStore({\n" +
        "              uri: 'mongodb://localhost:27017/ratelimits',\n" +
        "              expireTimeMs: 60 * 1000, // 1 minute\n" +
        "            }),\n" +
        "            max: 5,\n" +
        "            message: 'Too many login attempts from this IP, please try again later.'\n" +
        "          });";

    let exitStatus: number = 0;

    for (const vulnerability of vulnerabilities) {
        const logger = new ThreadSafeLogger();
        if (!detect(vulnerability, codeSnippetSample, logger)) {
            exitStatus = 1;
        }

        logger.flush();
    }

    return exitStatus;
}

main();
```

prayas7102 commented 1 month ago

Thanks for the proposed changes, I'll review them shortly.

prayas7102 commented 1 month ago

@lloydlobo I've reviewed your changes and have a few suggestions:

1. Remove the code related to the thread buffer, the main function, and the environment mode for the time being.
2. Remove the `// prettier-ignore` comments, if possible.

I've assigned this issue to you. Please proceed with your changes and ensure the previous and current terminal outputs are in sync. I'll handle the NPM packaging, don't worry about that.
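
One way to check that the previous and current terminal outputs are in sync is to capture what each version prints and compare the lines. The sketch below is only an assumption about how that could be done: `captureLogs`, `legacyBruteForceCheck`, and `unifiedCheck` are hypothetical names, and the two check functions are stand-ins that print a message in the format used by the script above rather than the package's real checks.

```typescript
// Hypothetical helper: temporarily replaces console.log, runs `fn`,
// and returns every line that `fn` tried to print.
function captureLogs(fn: () => void): string[] {
    const lines: string[] = [];
    const original = console.log;
    console.log = (...args: unknown[]): void => {
        lines.push(args.map(String).join(" "));
    };
    try {
        fn();
    } finally {
        console.log = original; // always restore the real console.log
    }
    return lines;
}

// Stand-in for one of the four original per-vulnerability scripts.
function legacyBruteForceCheck(): void {
    console.log("==> Code NOT vulnerable to Brute Force Attack");
}

// Stand-in for the unified script, parameterized by vulnerability label.
function unifiedCheck(kindLabel: string): void {
    console.log("==> Code NOT vulnerable to " + kindLabel);
}

const before = captureLogs(legacyBruteForceCheck);
const after = captureLogs(() => unifiedCheck("Brute Force Attack"));

// The outputs are in sync when both runs print the same lines in order.
const inSync =
    before.length === after.length &&
    before.every((line, index) => line === after[index]);

console.log(inSync ? "outputs match" : "outputs differ");
```

In practice the stand-ins would be replaced by calls into the old and new detection code, run against the same input file.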

lloydlobo commented 1 month ago

Hey @prayas7102, I will probably look into this about 24 hours from now; I'm a bit busy with a prior obligation.

Sure thing, also a relief to know that you can handle the NPM packaging.

lloydlobo commented 1 month ago

Hey, there is a draft PR #16 in the works.

I encountered a potential issue mentioned in the PR regarding multiple logs:

When Log.detectIfVulnerability function is declared in a scope outside of class Log, the logger logs twice.
- Need to look into how the overrides on console.log and console.error influence the above.
- For the time being, the detectIfVulnerability function is declared as a static method of class Log.
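
For context, here is one generic way that overriding `console.log` can lead to duplicated logging. It is purely an illustration and not a diagnosis of the PR: `recorded` and `installOverride` are hypothetical names, and the assumption is that an override which forwards to the previous `console.log` gets installed more than once.

```typescript
// Illustrative only: `recorded` and `installOverride` are hypothetical.
// `recorded` stands in for any side effect the override performs,
// such as appending to a log file.
const recorded: string[] = [];

// Wraps the current console.log: records the message, then forwards it
// to whatever console.log was before. Returns a function that undoes it.
function installOverride(): () => void {
    const previous = console.log;
    console.log = (...args: unknown[]): void => {
        recorded.push(args.map(String).join(" "));
        previous(...args);
    };
    return (): void => {
        console.log = previous;
    };
}

// Installing the override twice builds a chain of two wrappers.
const undoFirst = installOverride();
const undoSecond = installOverride();

// One call now passes through both wrappers, so the side effect runs twice.
console.log("==> check finished");

// Undo in reverse order to restore the original console.log.
undoSecond();
undoFirst();

console.log("recorded entries:", recorded.length);
```

If something like this were the cause, a guard that installs the override only once would avoid the duplication.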

Besides minor formatting tweaks, the PR seems ready.

lloydlobo commented 1 month ago

Hey there @prayas7102, PR #16 is ready for review :)

prayas7102 / NodejsSecurify

Unifying Vulnerability Detection with a Single Automated Script for Multiple Security Checks #7
