mechatroner / RBQL

🦜RBQL - Rainbow Query Language: SQL-like query engine for (not only) CSV file processing. Supports SQL queries with Python and JavaScript expressions.
https://rbql.org
MIT License

Add a rbql.d.ts file for TypeScript #46

Open agladysh opened 12 months ago

agladysh commented 12 months ago

It would be nice to have a type declaration for TypeScript.

Here is a very rough draft I quickly authored with the help of the Claude 2 AI. It is almost certainly subtly wrong, since I forgot to lower the AI temperature 🤦. Still, I used it to write actual code today, so it works at least for my very limited use case.

Sorry for not sending a PR, but it is way too raw for my taste. If you're interested, I can improve it and submit it sometime in the future.

// Based on code, generated by Claude 2 for tag v0.20.0. Likely wrong.
// DO NOT USE in the client code, unless you know what you are doing.

declare module 'rbql' {
  /**
   * Item type for a record. Can be string, number, null or any other type.
   */
  type RecordItem = any;

  /**
   * Interface for iterating over input records.
   */
  export interface RBQLInputIterator {

    /**
     * Stop iterating over records.
     */
    stop(): void;

    /**
     * Get a map of variables referenced in the query text.
     * Each variable has an `initialize` property indicating whether it needs to be initialized.
     */
    getVariablesMap(query: string): Promise<{[key: string]: {initialize: boolean, index: number}}>;

    /**
     * Get next record from the input. Returns null if no more records.
     */
    getRecord(): Promise<RecordItem[] | null>;

    /**
     * Handle modifier in query like WITH (headers).
     * Can be optionally implemented if query syntax supports modifiers.
     */
    handleQueryModifier?(modifier: string): void;

    /**
     * Get any warnings produced while iterating over records.
     */
    getWarnings(): string[];

    /**
     * Get header row for the input table if available.
     */
    getHeader?(): Promise<string[] | null>;
  }

  /**
   * Interface for writing output records.
   * Classes implementing this interface can write records to different destinations like arrays, files etc.
   */
  export interface RBQLOutputWriter {
    /**
     * Write a record (array of items) to output.
     */
    write(record: RecordItem[]): Promise<boolean>;

    /**
     * Finish writing output - can be used for resource cleanup.
     */
    finish(): Promise<void>;

    /**
     * Get any warnings produced while writing output records.
     */
    getWarnings(): string[];

    /**
     * Set header for output. Can be optionally implemented if output format supports header.
     */
    setHeader?(header: string[]): void;
  }

  /**
   * Interface for getting input iterators for join tables.
   * Classes implementing this can provide join table data from various sources.
   */
  export interface RBQLJoinTableRegistry {

    /**
     * Get RBQLInputIterator for provided table ID.
     * Throws error if table not found.
     */
    get_iterator_by_table_id(tableId: string): RBQLInputIterator;

    /**
     * Get any warnings produced while handling join tables.
     */
    get_warnings(): string[];

  }

  /**
   * Run query against input/output classes and registries.
   */
  export function query(
    query: string,
    inputIterator: RBQLInputIterator,
    outputWriter: RBQLOutputWriter,
    outputWarnings: string[],
    joinTablesRegistry?: RBQLJoinTableRegistry,
    userInitCode?: string
  ): Promise<void>;

  /**
   * Run query against input/output tables.
   */
  export function query_table(
    query: string,
    inputTable: RecordItem[][],
    outputTable: RecordItem[][],
    outputWarnings: string[],
    joinTable?: RecordItem[][],
    inputColumnNames?: string[],
    joinColumnNames?: string[],
    outputColumnNames?: string[],
    normalizeColumnNames?: boolean,
    userInitCode?: string
  ): Promise<void>;

  /**
   * Run query against CSV files.
   */
  export function query_csv(
    query: string,
    inputPath: string,
    inputDelim: string,
    inputPolicy: 'simple' | 'quoted' | 'quoted_rfc',
    outputPath: string,
    outputDelim: string,
    outputPolicy: 'simple' | 'quoted' | 'quoted_rfc',
    csvEncoding: 'binary' | 'utf-8',
    outputWarnings: string[],
    withHeaders?: boolean,
    commentPrefix?: string
  ): Promise<void>;

  /**
   * Convert exception to error info array.
   */
  export function exceptionToErrorInfo(
    exception: any
  ): [string, string];

  /**
   * Implements RBQLInputIterator to iterate over a table of records.
   */
  export class TableIterator implements RBQLInputIterator {

    /**
     * Constructor
     *
     * @param table - Input table of records
     * @param columnNames - Optional list of column names
     * @param normalizeColumnNames - Whether to normalize column names
     * @param variablePrefix - Prefix for variables e.g. 'a'
     */
    constructor(
      table: RecordItem[][],
      columnNames?: string[],
      normalizeColumnNames?: boolean,
      variablePrefix?: string
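    )

    // Members required by RBQLInputIterator, declared so the implements clause type-checks.
    stop(): void;
    getVariablesMap(query: string): Promise<{[key: string]: {initialize: boolean, index: number}}>;
    getRecord(): Promise<RecordItem[] | null>;
    getWarnings(): string[];
  }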

  /**
   * Implements RBQLOutputWriter to write records to a table.
   */
  export class TableWriter implements RBQLOutputWriter {

    /**
     * Constructor
     *
     * @param table - External table to write output records to
     */
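    constructor(table: RecordItem[][])

    // Members required by RBQLOutputWriter, declared so the implements clause type-checks.
    write(record: RecordItem[]): Promise<boolean>;
    finish(): Promise<void>;
    getWarnings(): string[];
  }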

  /**
   * Implements RBQLJoinTableRegistry to provide join table iterator.
   */
  export class SingleTableRegistry implements RBQLJoinTableRegistry {

    /**
     * Constructor
     *
     * @param table - Join table
     * @param columnNames - Optional list of join table column names
     * @param normalizeColumnNames - Whether to normalize column names
     * @param tableId - Id/Name for the join table
     */
    constructor(
      table: RecordItem[][],
      columnNames?: string[],
      normalizeColumnNames?: boolean,
      tableId?: string
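    )

    // Members required by RBQLJoinTableRegistry, declared so the implements clause type-checks.
    get_iterator_by_table_id(tableId: string): RBQLInputIterator;
    get_warnings(): string[];
  }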

  /**
   * RBQL library version.
   */
  export const version: string;

  /**
   * Implements RBQLInputIterator to iterate over CSV records.
   */
  export class CSVRecordIterator implements RBQLInputIterator {

    /**
     * Constructor
     *
     * @param stream - Input stream
     * @param csvPath - Path to CSV file
     * @param encoding - Encoding of the CSV file
     * @param delim - Field delimiter character
     * @param policy - CSV dialect policy
     * @param hasHeader - Whether CSV has header row
     * @param commentPrefix - Prefix for comment lines to skip
     * @param tableName - Name of the input table
     * @param variablePrefix - Prefix for variables e.g. 'a'
     */
    constructor(
      stream: NodeJS.ReadableStream,
      csvPath: string | null,
      encoding: string,
      delim: string,
      policy: 'simple' | 'quoted' | 'quoted_rfc',
      hasHeader?: boolean,
      commentPrefix?: string | null,
      tableName?: string,
      variablePrefix?: string
    )

    /**
     * Stop iterating over records.
     */
    stop(): void;

    /**
     * Get a map of variables referenced in the query text.
     * Each variable has an `initialize` property indicating whether it needs to be initialized.
     */
    getVariablesMap(query: string): Promise<{[key: string]: {initialize: boolean, index: number}}>;

    /**
     * Get next record from the input. Returns null if no more records.
     */
    getRecord(): Promise<RecordItem[] | null>;

    /**
     * Handle modifier in query like WITH (headers).
     * Can be optionally implemented if query syntax supports modifiers.
     */
    handleQueryModifier?(modifier: string): void;

    /**
     * Get any warnings produced while iterating over records.
     */
    getWarnings(): string[];

    /**
     * Get header row for the input table if available.
     */
    getHeader?(): Promise<string[] | null>;
  }

  /**
   * Implements RBQLOutputWriter to write records to a CSV.
   */
  export class CSVWriter implements RBQLOutputWriter {

    /**
     * Constructor
     *
     * @param stream - Output stream
     * @param closeStreamOnFinish - Whether to close stream on finish
     * @param encoding - Encoding for the output CSV
     * @param delim - Field delimiter character
     * @param policy - CSV dialect policy
     */
    constructor(
      stream: NodeJS.WritableStream,
      closeStreamOnFinish: boolean,
      encoding?: string,
      delim?: string,
      policy?: 'simple' | 'quoted' | 'quoted_rfc'
    )

    /**
     * Write a record (array of items) to output.
     */
    write(record: RecordItem[]): Promise<boolean>;

    /**
     * Finish writing output - can be used for resource cleanup.
     */
    finish(): Promise<void>;

    /**
     * Get any warnings produced while writing output records.
     */
    getWarnings(): string[];

    /**
     * Set header for output. Can be optionally implemented if output format supports header.
     */
    setHeader?(header: string[]): void;
  }

  interface FileSystemCSVOptions {
    /**
     * Whether to read the entire input CSV file into memory for faster querying.
     * Default is false.
     */
    bulkRead?: boolean;

    /**
     * Maximum number of records to load into memory if bulkRead is true.
     */
    bulkReadRecordLimit?: number;

    /**
     * Whether the input CSV has a header row. Default is false.
     */
    hasHeader?: boolean;

    /**
     * Comment prefix to skip lines. Lines starting with this will be ignored.
     */
    commentPrefix?: string;

    /**
     * Encoding of the input CSV file. Default is 'utf-8'.
     */
    encoding?: 'utf-8' | 'latin-1';

    /**
     * Field delimiter for input CSV. Default is ','.
     */
    delimiter?: string;

    /**
     * Dialect policy for parsing input CSV.
     * Default is 'quoted'.
     */
    policy?: 'simple' | 'quoted' | 'quoted_rfc';

  }

  /**
   * Implements RBQLJoinTableRegistry to provide CSV join table iterators.
   */
  export class FileSystemCSVRegistry implements RBQLJoinTableRegistry {

    /**
     * Constructor
     *
     * @param inputFileDir - Base directory for input CSV files
     * @param delim - Field delimiter character
     * @param policy - CSV dialect policy
     * @param encoding - Encoding of the CSV files
     * @param hasHeader - Whether CSVs have header row
     * @param commentPrefix - Prefix for comment lines to skip
     * @param options - Additional options
     */
    constructor(
      inputFileDir: string,
      delim: string,
      policy: 'simple' | 'quoted' | 'quoted_rfc',
      encoding: string,
      hasHeader?: boolean,
      commentPrefix?: string | null,
      options?: FileSystemCSVOptions
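    )

    // Members required by RBQLJoinTableRegistry, declared so the implements clause type-checks.
    get_iterator_by_table_id(tableId: string): RBQLInputIterator;
    get_warnings(): string[];
  }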

  export class RbqlIOHandlingError extends Error { }

  export class AssertionError extends Error { }
}

declare module 'rbql/rbql_csv' {
  export function is_ascii(str: string): boolean
}
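
To show what the draft's writer interface asks of an implementer, here is a minimal sketch of an in-memory writer. The interface is copied locally so the snippet compiles without the rbql package. `InMemoryWriter` is a hypothetical name, and both the camelCase method names and the meaning of the boolean returned by `write` come from the draft or are my assumptions; none of this is verified against rbql-js.

```typescript
// Local copy of the draft's RBQLOutputWriter so this snippet stands on its own.
// Method names follow the draft above and are NOT verified against rbql-js.
type RecordItem = any;

interface RBQLOutputWriter {
  write(record: RecordItem[]): Promise<boolean>;
  finish(): Promise<void>;
  getWarnings(): string[];
  setHeader?(header: string[]): void;
}

// Hypothetical writer that collects output records in memory.
class InMemoryWriter implements RBQLOutputWriter {
  readonly records: RecordItem[][] = [];
  header: string[] | null = null;
  private finished = false;

  async write(record: RecordItem[]): Promise<boolean> {
    if (this.finished) {
      throw new Error('write() called after finish()');
    }
    this.records.push(record);
    // Assumption: true means "keep sending records".
    return true;
  }

  async finish(): Promise<void> {
    this.finished = true;
  }

  getWarnings(): string[] {
    // This sketch never produces warnings.
    return [];
  }

  setHeader(header: string[]): void {
    this.header = header;
  }
}
```

If the draft is roughly right, an instance of such a class could be passed wherever the declarations expect an `RBQLOutputWriter`, and the compiler would flag any missing or mistyped method.
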
mechatroner commented 12 months ago

Thank you! I agree that it would be really nice to have a TypeScript interface available. I don't have a lot of experience with TS, so I would need to consider how this might affect the overall structure of the project. Perhaps we would need a new rbql-ts folder with TypeScript code? This would also need some docs and probably at least a minimal set of unit tests, so it could end up being a big undertaking. If you have the time and willingness to do this, I would really appreciate your help!

agladysh commented 12 months ago

@mechatroner I believe no separate implementation would be needed, just a type declarations file similar to the one I provided above. See https://github.com/DefinitelyTyped/DefinitelyTyped for the full process, with all the bells and whistles.

That is, I'm already using rbql-js as an npm module from my TypeScript code. A proper declarations file would make it much more convenient.
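
As a rough illustration of that convenience, here is a sketch of a custom input iterator written against the draft's `RBQLInputIterator` shape, so the compiler checks the method signatures. The interface is copied locally so the snippet compiles without the package. `InMemoryIterator` is a hypothetical name, the `a1`, `a2`, ... variable parsing is deliberately simplified, and the zero-based `index` is an assumption rather than confirmed rbql-js behavior.

```typescript
// Local copy of the required part of the draft's RBQLInputIterator.
// Method names follow the draft and are NOT verified against rbql-js.
type RecordItem = any;
type VariablesMap = { [key: string]: { initialize: boolean; index: number } };

interface RBQLInputIterator {
  stop(): void;
  getVariablesMap(query: string): Promise<VariablesMap>;
  getRecord(): Promise<RecordItem[] | null>;
  getWarnings(): string[];
}

// Hypothetical iterator over an in-memory table.
class InMemoryIterator implements RBQLInputIterator {
  private rows: RecordItem[][];
  private position = 0;
  private stopped = false;

  constructor(rows: RecordItem[][]) {
    this.rows = rows;
  }

  stop(): void {
    this.stopped = true;
  }

  // Simplified: maps a1, a2, ... found in the query text to column indices.
  // Assumption: indices are zero-based.
  async getVariablesMap(query: string): Promise<VariablesMap> {
    const result: VariablesMap = {};
    const pattern = /\ba([1-9][0-9]*)\b/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(query)) !== null) {
      result[match[0]] = { initialize: true, index: Number(match[1]) - 1 };
    }
    return result;
  }

  // Returns the next row, or null once the table is exhausted or stop() was called.
  async getRecord(): Promise<RecordItem[] | null> {
    if (this.stopped) {
      return null;
    }
    const row = this.rows[this.position];
    if (row === undefined) {
      return null;
    }
    this.position += 1;
    return row;
  }

  getWarnings(): string[] {
    return [];
  }
}
```

Without declarations, a typo in a method name here would only surface at runtime; with them, it becomes a compile-time error.
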