Implement Subdirectory Output File Organization & Source-specific Names

tribixbite commented 3 months ago

Since I do a lot of extraction, I added custom naming for the output files that includes the token count, so you can use the files later without having to remember how big they are. GPT-aided summary:

Subdirectory Output Organization: Modified the script to create output files (_full_output.txt, _min_output.txt, and _processed_urls.txt) within a dynamically named subdirectory under output/, based on the input source name. This change aids in maintaining a cleaner working directory and better organizes outputs, especially when processing multiple sources.
Dynamic Filename Convention: Updated the filename convention to {base_name}_{token_count}_{type}.txt for both uncompressed (full) and compressed (min) output files, where {type} reflects the file's content. This update makes it easier to identify files by their source and content status, providing quick insights into the token count directly from the filename.
README Updates: Revised the README file to reflect these changes, ensuring users are fully informed about the tool's functionality, usage, and the new file naming and organization scheme. The documentation now includes updated instructions and clarifies the output file structure, enhancing the tool's accessibility to new users.
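
As a rough illustration of the layout described above, here is a minimal sketch of how the paths could be built. It is not the code from this pull request: the helper names (`base_name_for`, `output_paths`), the sanitizing rule, the `full`/`min` type strings, and the name of the processed-URLs file are all assumptions made for the example.

```python
import re
from pathlib import Path


def base_name_for(source: str) -> str:
    """Derive a filesystem-safe base name from a path or URL (hypothetical rule)."""
    # Use the last path segment, e.g. "1filellm" for a repository URL.
    tail = source.rstrip("/").split("/")[-1]
    # Replace anything outside a conservative character set with underscores.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", tail).strip("_")
    return safe or "source"


def output_paths(source: str, full_tokens: int, min_tokens: int,
                 root: Path = Path("output")) -> dict:
    """Build paths following {base_name}_{token_count}_{type}.txt under output/<base_name>/."""
    base = base_name_for(source)
    out_dir = root / base
    return {
        "full": out_dir / f"{base}_{full_tokens}_full.txt",
        "min": out_dir / f"{base}_{min_tokens}_min.txt",
        # Assumed name: the description does not say how this file is named.
        "urls": out_dir / f"{base}_processed_urls.txt",
    }


paths = output_paths("https://github.com/jimmc414/1filellm", 1200, 800)
print(paths["full"])
```

The function only computes paths. A caller would still need to create the subdirectory, for example with `paths["full"].parent.mkdir(parents=True, exist_ok=True)`, before writing any file.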

Reason for Changes:

Improved File Management: By organizing output files into subdirectories, users can more easily manage their workspace, especially when dealing with multiple data sources. This organization prevents clutter and makes it straightforward to locate and distinguish outputs from different inputs.
Enhanced File Naming: Incorporating the token count and content type into filenames provides immediate context about each file's contents and processing state without needing to open the file. This naming convention is particularly useful for users working with large datasets and needing quick file identification.

jimmc414 commented 3 months ago

Thank you for contributing!

jimmc414 commented 3 months ago

I'm afraid I will need to roll back this merged pull request. It no longer copies the output to the clipboard, and it adds additional prompts for the user. I would prefer that only one URL or path be passed, with the program intuitively handling it properly.

tribixbite commented 3 months ago

I'll make a new one that keeps your preferences by default and lets the user set config flags to turn on/off features. I opened this before making the third commit
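
One possible shape for such flags, sketched with `argparse`. The flag names `--subdir-output` and `--no-clipboard` are hypothetical and do not come from this thread or from the tool itself. The defaults keep the original behavior: a single positional URL or path, clipboard copy on, subdirectory output off.

```python
import argparse


def parse_args(argv=None):
    """Parse a single source plus opt-in feature flags (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="Sketch of opt-in output features")
    # One positional argument, so the default invocation needs no extra prompts.
    parser.add_argument("source", help="a single URL or local path")
    # Off by default: subdirectory organization and token-count filenames.
    parser.add_argument("--subdir-output", action="store_true",
                        help="write outputs under output/<source name>/")
    # On by default: copying to the clipboard can be disabled explicitly.
    parser.add_argument("--no-clipboard", dest="clipboard", action="store_false",
                        help="do not copy the output to the clipboard")
    return parser.parse_args(argv)


args = parse_args(["https://example.com/docs"])
print(args.source, args.subdir_output, args.clipboard)
```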

jimmc414 / 1filellm

Implement Subdirectory Output File Organization & Source-specific Names #9
