iksnae / actual-intelligence

A practical, non-technical guide to using AI tools like ChatGPT in everyday life
MIT License

Build Process Issue: Fix to Restore English Book Generation #45

Status: Closed (khaos-codi closed this 3 months ago)

khaos-codi commented 3 months ago

Issue Description

After refactoring the build scripts into modular components, the book build completes successfully but produces no output files. At commit f5b41647482d6446ca40a9db9d7c71bb423c2b02, the English version of the book was building correctly, but the Spanish version was not.

Root Cause

The recently refactored build system (which moved from a monolithic build.sh to modular scripts in tools/scripts/) appears to have issues with the handling of multiple languages. When building all languages with the --all-languages flag, no output files are produced.

Fix Implemented

PR #45 implements a temporary fix to restore the English build functionality:

  1. Modified the GitHub Actions workflow (.github/workflows/build-book.yml) to focus only on the English version
  2. Changed the build command from ./build.sh --all-languages to ./build.sh --lang=en
  3. Removed the creation of Spanish build directories and the uploading of Spanish artifacts
  4. Updated release notes to only reference the English version
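The only functional difference between the two build commands is which languages the build covers. As a rough illustration, here is how a wrapper could turn these flags into a list of language codes. This is a hypothetical sketch, not the actual build.sh logic: the function name resolve_languages is invented, and it assumes each language has its own directory under book/ (as book/es/ does).

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn build.sh-style flags into a list of language codes.
# Assumes every subdirectory of the book directory is one language (book/en, book/es).
resolve_languages() {
  local book_dir="$1"; shift
  local arg d
  local -a langs=()
  for arg in "$@"; do
    case "$arg" in
      --lang=*) langs+=("${arg#--lang=}") ;;
      --all-languages)
        for d in "$book_dir"/*/; do
          [ -d "$d" ] && langs+=("$(basename "$d")")
        done
        ;;
      *) echo "unknown option: $arg" >&2; return 2 ;;
    esac
  done
  # Default to English when no language flag is given (an assumption).
  [ "${#langs[@]}" -eq 0 ] && langs=(en)
  printf '%s\n' "${langs[@]}"
}
```

For example, `resolve_languages book --all-languages` would print `en` and `es` on separate lines if those are the only subdirectories of book/.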

Future Work

Once the English build is confirmed working, we should:

  1. Investigate why the Spanish build is failing
  2. Check the combine-markdown.sh script to ensure proper handling of language-specific paths
  3. Verify the directory structure in book/es/ matches what scripts expect
  4. Test the build with --lang=es flag specifically to isolate Spanish build issues
  5. Once fixed, restore multi-language support in the workflow
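For items 3 and 4, a quick pre-flight check can confirm that a language directory exists and contains Markdown before running ./build.sh --lang=es. A minimal sketch: check_language_dir is a hypothetical helper, and what the scripts actually expect inside book/es/ may be stricter than "at least one .md file".

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for one language directory.
# Assumes chapters are Markdown files somewhere under <book_dir>/<lang>/.
check_language_dir() {
  local book_dir="$1" lang="$2"
  local dir="$book_dir/$lang" count
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # tr strips the padding some wc implementations add.
  count=$(find "$dir" -type f -name '*.md' | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "no Markdown files under $dir" >&2
    return 1
  fi
  echo "$lang: $count Markdown file(s) in $dir"
}
```

Running `check_language_dir book es` before the Spanish build separates a missing or empty directory from a failure inside the build scripts themselves.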


khaos-codi commented 3 months ago

Update on Build Problem

After examining the build logs, we identified the root cause of the build failure. The issue was a permission problem with the script files in the Docker container environment:

./build.sh: line 7: /__w/actual-intelligence/actual-intelligence/tools/scripts/build.sh: Permission denied
./build.sh: line 7: exec: /__w/actual-intelligence/actual-intelligence/tools/scripts/build.sh: cannot execute: Permission denied

Implemented Fix

PR #46 addresses this by making all .sh files executable before running the build:

# Fix permissions on script files
chmod +x build.sh
find tools/scripts -name "*.sh" -exec chmod +x {} \;

Technical Explanation

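When the repository is checked out in the GitHub Actions Docker container, the script files do not have executable permissions. This typically happens when the executable bit was never recorded in Git for those files, since a checkout only restores the file modes Git has stored. It is a common issue when working with Docker containers and shell scripts.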

The fix ensures that all shell scripts in the project (especially in the modular tools/scripts/ directory) have the proper executable permissions (+x) before they're run.
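To see which scripts are affected before applying the fix, the same find pattern can list every .sh file that lacks the executable bit. A small sketch; both function names are hypothetical:

```bash
#!/usr/bin/env bash
# List *.sh files under the given paths that are not executable.
list_non_executable() {
  local f
  find "$@" -type f -name '*.sh' -print | while IFS= read -r f; do
    [ -x "$f" ] || printf '%s\n' "$f"
  done
}

# Same effect as the workflow step: chmod +x on every *.sh file found.
make_scripts_executable() {
  find "$@" -type f -name '*.sh' -exec chmod +x {} +
}
```

Because Git stores each file's executable bit, running `git update-index --chmod=+x tools/scripts/build.sh` and committing records the mode in the repository itself, so fresh checkouts would not need the chmod step.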

The workflow will now:

  1. Make all shell scripts executable
  2. Build the English version of the book
  3. Upload generated files as artifacts
  4. Create a release if run on the main branch

We're waiting for the workflow to complete to confirm this fix resolves the issue.

khaos-codi commented 3 months ago

Update: Re-enabled All Languages

Great news! We've successfully restored the English book build and now we're re-enabling support for all languages.

Changes Made:

  1. Fixed the script permission issue that was preventing the build from running
  2. Confirmed English book generation is working correctly
  3. Updated the workflow to build all languages again with the following changes:
    • Restored the Spanish directories (build/es, build/es/images) creation
    • Changed the build command from ./build.sh --lang=en to ./build.sh --all-languages
    • Re-added Spanish book files to the release artifacts and release notes

The build should now produce both English and Spanish versions of the book. We'll keep monitoring to ensure both languages build successfully.

Next Steps:

  1. Confirm Spanish book generation works correctly
  2. Check that all book formats (PDF, EPUB, MOBI, HTML) are generated for all languages
  3. Verify the GitHub Pages deployment includes both language versions
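Item 2 can be automated as a post-build check. A sketch, assuming each language's outputs land directly in build/<lang>/ with the format as the file extension (the exact file names and the location of the HTML output are assumptions); check_outputs is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical check: does build/<lang>/ hold a non-empty file for every format?
# Assumes outputs sit directly in build/<lang>/ and use the format as extension.
check_outputs() {
  local build_dir="$1"; shift
  local lang ext missing=0
  for lang in "$@"; do
    for ext in pdf epub mobi html; do
      # -size +0c skips empty files; errors for a missing dir are discarded.
      if [ -z "$(find "$build_dir/$lang" -maxdepth 1 -type f -name "*.$ext" -size +0c 2>/dev/null)" ]; then
        echo "MISSING: $lang/*.$ext"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

`check_outputs build en es` prints one MISSING line per absent or empty format and returns non-zero, which would fail the workflow step instead of silently publishing an incomplete release.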

We'll close this issue once we confirm all languages are building properly.

khaos-codi commented 3 months ago

Additional Fix: EPUB Image Inclusion

We've identified and fixed another issue related to images not appearing in the EPUB files. The problem was in the generate-epub.sh script.

What Was Fixed:

  1. The script built the Pandoc command as a string and ran it with eval to add the cover image option, which made quoting and command execution unreliable
  2. The image path handling in the current modular version wasn't matching what worked in the monolithic version at commit 6c0dad49acff74ee49d5904c9e6a05eadf6b75c1
  3. The fallback strategy for image inclusion needed enhancement

Implemented Solution:

I've updated the generate-epub.sh script with the following improvements:

  1. Removed the eval usage and split the commands into two clear conditional blocks
  2. Added a more robust fallback approach for image paths, including a step to make image references more resilient
  3. Improved handling of the --extract-media directory for different languages
  4. Enhanced error messages and flow control
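With eval, the command is assembled as one string and re-parsed, so a path containing spaces or quotes can split into the wrong arguments. A common alternative to both eval and duplicated conditional blocks is a bash array, where each element is passed as exactly one argument. A minimal sketch; make_epub and the PANDOC override are hypothetical, while --epub-cover-image is a real Pandoc option:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Pandoc argument list in an array instead of
# a string for eval, so every path stays a single argument.
make_epub() {
  local input="$1" output="$2" cover="${3:-}"
  local -a args=("$input" -o "$output")
  # Add the cover option only when the image actually exists.
  if [ -n "$cover" ] && [ -f "$cover" ]; then
    args+=("--epub-cover-image=$cover")
  fi
  # PANDOC can be overridden (e.g. for a dry run); defaults to the real binary.
  "${PANDOC:-pandoc}" "${args[@]}"
}
```

Setting PANDOC to a function that prints its arguments gives a dry run, which makes it easy to confirm in the build log exactly what Pandoc will receive.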

This fix should restore image inclusion in the EPUB files while maintaining the modular build system. The changes match the approach that was working in the original monolithic script.

The next build run should produce EPUB files with all images properly included.

khaos-codi commented 3 months ago

EPUB Image Inclusion: Comprehensive Fix

After closer examination of the working build (commit 6c0dad49), I found several critical differences in how images are handled. I've implemented a comprehensive fix for EPUB image inclusion:

Key Issues Identified

  1. The set -e exit-on-error flag in the EPUB generation script was stopping further processing when one command failed
  2. The image path handling wasn't consistent across the modular scripts
  3. The EPUB generation script wasn't properly handling the extract-media option
  4. The cover image setup and propagation had issues
  5. Missing debug information made it hard to troubleshoot problems

Comprehensive Solution

I've implemented several changes to ensure images are correctly included:

  1. Updated generate-epub.sh script:

    • Changed to set +e to handle errors manually rather than exiting
    • Improved resource path handling for different languages
    • Added better fallback strategies for image inclusion
    • Added file size verification to detect missing images
    • Added comprehensive debug information
  2. Improved setup.sh script:

    • Enhanced cover image handling
    • Added explicit copying of images to build directories
    • Improved directory validation and creation
    • Added debug summary of the environment

These changes should restore the EPUB file size to the expected ~12MB range that was seen with commit 6c0dad49, indicating proper image inclusion.
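The file-size verification can be as simple as comparing the EPUB's size against a threshold. A sketch; check_epub_size is hypothetical, and the threshold is a tunable guess (a value somewhat below the ~12MB seen at commit 6c0dad49), not a precise test for images:

```bash
#!/usr/bin/env bash
# Size of a file in whole kilobytes (rounded down); wc -c reports bytes.
epub_size_kb() {
  local bytes
  bytes=$(wc -c < "$1" | tr -d ' ')
  echo $(( bytes / 1024 ))
}

# Heuristic only: an EPUB far below the expected size probably lacks images.
check_epub_size() {
  local epub="$1" min_kb="$2" size
  if [ ! -f "$epub" ]; then
    echo "not found: $epub" >&2
    return 1
  fi
  size=$(epub_size_kb "$epub")
  if [ "$size" -lt "$min_kb" ]; then
    echo "WARNING: $epub is ${size}KB (< ${min_kb}KB); images may be missing" >&2
    return 1
  fi
  echo "$epub: ${size}KB"
}
```

For example, `check_epub_size build/en/book.epub 10240` (hypothetical path) warns whenever the EPUB comes out under 10MB.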

Technical Details

The most critical issue appears to be in how pandoc handles image paths. The previous implementation was using eval with string concatenation for options, which was problematic. I've replaced this with clean conditional blocks and explicit path handling.

I've also added image path fallback strategies to handle different image reference formats in the markdown files.

Verification

The updated script now checks the file size of the generated EPUB to verify that images are likely included. We'll confirm the fix when the workflow runs.

khaos-codi commented 3 months ago

New Approach: Complete Rewrite of EPUB Generation

After extensive research on how Pandoc handles image paths in EPUB files, I've completely rewritten the EPUB generation script with a fundamentally different approach:

Key Insights from Research

  1. Working Directory Matters: Pandoc resolves image paths relative to the current working directory, not necessarily relative to the Markdown file being processed
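  2. Self-Contained Flag: The --self-contained option seemed critical for bundling images, although the Pandoc manual describes it (a deprecated synonym for --embed-resources --standalone) as applying to HTML output, while EPUB output already packages any images Pandoc can locate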
  3. Path Resolution: Pandoc's path resolution is tricky, especially with nested directories
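Pandoc's --resource-path option controls where it searches for images; entries are colon-separated on Linux and macOS, and relative entries are interpreted from the working directory. A sketch that joins only the candidate directories that exist, converted to absolute paths so the value stays valid after a cd; build_resource_path and the example directories are hypothetical:

```bash
#!/usr/bin/env bash
# Join existing directories into a colon-separated --resource-path value.
build_resource_path() {
  local dir abs joined=""
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    # Resolve to an absolute path so a later cd does not change its meaning.
    abs=$(cd "$dir" >/dev/null && pwd) || continue
    joined="${joined:+$joined:}$abs"
  done
  printf '%s\n' "$joined"
}
```

Usage would look like `pandoc book.md -o book.epub --resource-path="$(build_resource_path . book/en book/en/images build/images)"`.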

New Approach: Isolated Working Directory

My new solution creates a self-contained working environment for EPUB generation:

  1. Creates a unique working directory for each EPUB build
  2. Copies the input Markdown file to this directory
  3. Creates an images subdirectory and copies all images from every possible source location
  4. Modifies image paths in the Markdown to use the local images/ directory
  5. Changes to the working directory before running Pandoc
  6. Uses the --self-contained flag to ensure all assets are bundled
  7. Adds a fallback method using --extract-media if the first attempt fails
  8. Implements thorough debugging to show exactly what files are available

This approach eliminates path resolution issues by ensuring Pandoc runs with all resources directly accessible from its working directory. The script also includes detailed debugging output showing all files available, which should make it easier to diagnose any issues.
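The steps above can be sketched in shell roughly as follows. This is an illustration under stated assumptions, not the script from this change: the function names are hypothetical, the link rewrite only handles plain ![alt](path) references to .png, .jpg, .jpeg, .gif, and .svg files, and images with the same file name in different source directories overwrite each other.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an isolated EPUB working directory.

# Point every image link at images/<file name>, dropping its original directory.
rewrite_image_links() {
  sed -E 's#]\(([^)]*/)?([^/)]+\.(png|jpe?g|gif|svg))\)#](images/\2)#g'
}

# Create a fresh work directory containing book.md and images/; prints its path.
prepare_epub_workdir() {
  local markdown="$1"; shift
  local work src
  work=$(mktemp -d) || return 1
  mkdir -p "$work/images"
  # Copy images from every candidate source directory that exists.
  for src in "$@"; do
    [ -d "$src" ] || continue
    find "$src" -type f \( -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \
      -o -name '*.gif' -o -name '*.svg' \) -exec cp {} "$work/images/" \;
  done
  rewrite_image_links < "$markdown" > "$work/book.md"
  printf '%s\n' "$work"
}

# Run Pandoc from inside the work directory. The cd happens in a subshell, so
# the caller's working directory is unchanged; pass an absolute output path.
build_epub_in_workdir() {
  local work="$1" output="$2"
  ( cd "$work" && pandoc book.md -o "$output" )
}
```

A call such as `work=$(prepare_epub_workdir build/book.md book/en/images build/images)` followed by `build_epub_in_workdir "$work" "$PWD/build/en/book.epub"` (hypothetical paths) keeps the caller's working directory untouched.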

We'll know this approach works when we see EPUB files with sizes around 12MB, indicating successful image inclusion.

khaos-codi commented 3 months ago

Docker-Compatible Approach for EPUB Image Inclusion

After analyzing the build logs, I've identified that the workflow runs in a Docker container (iksnae/book-builder). This is critical information that affects how our solution needs to be implemented.

I've created another revision of the EPUB generation script with a Docker-compatible approach:

Key Changes for Docker Environment:

  1. Path Simplification: Avoided complex directory hierarchy manipulations that might not work well in Docker containers

  2. Fixed Image Paths: Modified the Markdown file to explicitly reference images from a predictable location (build/images/)

  3. Multi-Attempt Strategy: Implemented a three-step fallback process:

    • First attempt: Uses --self-contained flag with verbose output
    • Second attempt: Uses --embed-resources flag if the first fails
    • Final attempt: Creates a temporary directory with local references and uses a subshell to change directory
  4. Enhanced Logging: Added detailed logging to help debug any issues, capturing pandoc output to build/pandoc-output.log

  5. Size Verification: Added file size checks after each attempt to verify image inclusion
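The multi-attempt strategy generalizes to a small loop: run each attempt, log its output, and accept the first one that exits successfully and leaves a non-empty file. A sketch; run_attempts and attempt_with_resource_path are hypothetical, and each attempt is a function that receives the output path:

```bash
#!/usr/bin/env bash
# Hypothetical fallback chain: keep the first attempt that exits 0 AND leaves a
# non-empty output file. Each attempt is a shell function taking the output path
# as $1; all attempt output is appended to one log file.
run_attempts() {
  local out="$1" log="$2"; shift 2
  local attempt
  mkdir -p "$(dirname "$log")"
  : > "$log"
  for attempt in "$@"; do
    rm -f "$out"
    echo "=== $attempt ===" >> "$log"
    # Running the attempt as an `if` condition means a failure does not abort
    # the script, even under `set -e`, so the next attempt still gets a turn.
    if "$attempt" "$out" >> "$log" 2>&1 && [ -s "$out" ]; then
      echo "$attempt"
      return 0
    fi
  done
  echo "all attempts failed; see $log" >&2
  return 1
}

# Example attempt (assumes $MARKDOWN is set and pandoc is installed).
attempt_with_resource_path() {
  pandoc "$MARKDOWN" -o "$1" --resource-path=build/images --verbose
}
```

For example, `run_attempts build/en/book.epub build/pandoc-output.log attempt_with_resource_path` prints the name of the attempt that worked. Testing each attempt in an if condition also sidesteps the earlier set -e problem, because failures there do not abort the script.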

The script is now much more robust for running in a containerized environment and provides multiple fallback mechanisms. This approach follows best practices for dealing with path resolution issues in Docker containers.

We'll know if this is successful when the EPUB file sizes are significantly larger (~12MB), indicating that images are properly included.

khaos-codi commented 3 months ago

Final Refinement: Absolute Path Support for Docker

I've made one final refinement to the EPUB generation script to ensure it works in the Docker environment:

New Features Added:

  1. Absolute Path Handling: The script now tries using absolute paths for image references, which is often necessary in Docker containers

  2. Individual File Copy: Using find with exec to ensure all image files are properly copied to the build directory

  3. Full Working Directory Capture: Gets and uses the absolute path to the working directory

  4. Triple-Fallback Strategy: Now tries three different approaches in sequence:

    • First: Absolute paths with --self-contained flag
    • Second: Relative paths with --embed-resources flag
    • Third: Directory-changing approach with local paths
  5. Diagnostic Archive: Saves a timestamped copy of any successfully generated EPUB for debugging purposes
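The diagnostic archive step amounts to copying the EPUB under a timestamped name. A sketch; archive_epub is hypothetical, and the archive directory is whatever the caller passes:

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep a timestamped copy of a non-empty EPUB for later
# comparison, and print the path of the copy.
archive_epub() {
  local epub="$1" archive_dir="$2" stamp name
  [ -s "$epub" ] || return 1
  mkdir -p "$archive_dir"
  stamp=$(date +%Y%m%d-%H%M%S)
  name=$(basename "$epub" .epub)
  cp "$epub" "$archive_dir/$name-$stamp.epub"
  printf '%s\n' "$archive_dir/$name-$stamp.epub"
}
```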

The script now implements a comprehensive "try everything" approach that should work regardless of the specific Docker container configuration. One of these methods should successfully include images in the EPUB files.

This is the most robust solution for ensuring image inclusion in a containerized environment where the exact path resolution rules might vary.

khaos-codi commented 3 months ago

Simplified Approach Based on Docker Container Analysis

After examining the Docker container configuration in the iksnae/book-builder repository, I've implemented a much simpler approach that follows the conventions used in your container's existing build scripts.

Key Findings from Container Analysis:

  1. The Docker image is based on Ubuntu 22.04 with a standard Pandoc installation
  2. The Python build script in the container uses a straightforward approach for EPUB generation
  3. The container doesn't use any special flags or complex processing for EPUB generation

Simplified Solution:

I've rewritten the EPUB generation script to be simpler and more aligned with your container's conventions:

  1. Direct Approach: Uses a simpler Pandoc command similar to what's used in the container's Python script
  2. Metadata Handling: Uses the --metadata-file option to set metadata from a YAML file
  3. Standardized Resource Handling: Centralizes all images in the build/images directory
  4. Enhanced Logging: Saves detailed logs for debugging in build/logs directory
  5. Fallback with Self-Contained: Only tries one fallback approach using the --self-contained flag if needed
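A metadata file for --metadata-file can be generated from the same values the build already uses. Quoting each value as a single-quoted YAML string keeps titles containing colons or apostrophes valid. A sketch; write_metadata and yaml_quote are hypothetical, and the field names (title, author, publisher, lang, cover-image) are ones Pandoc reads for EPUB output:

```bash
#!/usr/bin/env bash
# Quote a value as a single-quoted YAML scalar; a literal ' becomes ''.
yaml_quote() {
  local q="'"
  printf "'%s'" "${1//$q/$q$q}"
}

# Hypothetical helper: write a plain YAML mapping usable with --metadata-file.
write_metadata() {
  local file="$1" title="$2" author="$3" publisher="$4" lang="$5" cover="${6:-}"
  {
    echo "title: $(yaml_quote "$title")"
    echo "author: $(yaml_quote "$author")"
    echo "publisher: $(yaml_quote "$publisher")"
    echo "lang: $(yaml_quote "$lang")"
    # The cover line is written only when a cover path is given.
    [ -n "$cover" ] && echo "cover-image: $(yaml_quote "$cover")"
  } > "$file"
}
```

Usage might be `write_metadata build/metadata.yaml "$BOOK_TITLE" "Open Source Community" "Khaos Studios" "$LANGUAGE" "$COVER_IMAGE"` followed by `pandoc book.md -o book.epub --metadata-file=build/metadata.yaml` (hypothetical paths).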

This simplified approach is more likely to work in your Docker environment because it follows the same patterns that are already being used in the container's existing scripts.

The script still implements file size checks to verify image inclusion, and includes appropriate debugging information.

khaos-codi commented 3 months ago

Diagnostic Toolkit for Docker Environment

Since we're still facing issues with image inclusion in the Docker environment, I've created a comprehensive diagnostic toolkit to help isolate and resolve the problem:

1. Diagnostic Test Script

I've added tools/debug/test-epub-images.sh - a standalone script that tests 9 different Pandoc approaches for EPUB image inclusion using a simple test document. This script will help identify exactly which combination of options works in your specific Docker container.

2. Original Working Approach

I've recreated the approach from the working commit (f5b41647) in tools/debug/old-working-approach.sh to allow direct comparison with the current environment.

3. Detailed Troubleshooting Guide

I've created tools/debug/TROUBLESHOOTING.md with comprehensive steps to diagnose and fix the issue, including:

How to Use This Toolkit

When you run your next workflow, you can add these steps to help diagnose the issue:

- name: Run EPUB image diagnostic tests
  run: |
    chmod +x tools/debug/test-epub-images.sh
    ./tools/debug/test-epub-images.sh

    echo "Test results:"
    du -k test-epub/*.epub

Alternatively, you can run these scripts directly in your Docker container:

docker run --rm -v "$(pwd)":/workspace iksnae/book-builder /bin/bash -c "cd /workspace && chmod +x tools/debug/test-epub-images.sh && ./tools/debug/test-epub-images.sh"

These diagnostic tools will help identify the specific Pandoc command that works in your Docker environment, which we can then incorporate into the main build process.
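Since image inclusion shows up mainly as file size, a quick way to read the diagnostic results is to rank the generated EPUBs by size. A sketch; rank_epubs is hypothetical and defaults to the test-epub directory used in the workflow step above:

```bash
#!/usr/bin/env bash
# Print "<size in KB><TAB><file name>" for each EPUB, largest first; the
# largest files are the most likely to contain the images.
rank_epubs() {
  local dir="${1:-test-epub}" f
  for f in "$dir"/*.epub; do
    [ -f "$f" ] || continue
    printf '%s\t%s\n' "$(( $(wc -c < "$f") / 1024 ))" "$(basename "$f")"
  done | sort -rn
}
```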

khaos-codi commented 3 months ago

Docker-Centric Solution: Encapsulate Image Inclusion Complexity in the Container

I completely agree that Docker should be encapsulating this complexity rather than requiring constant script adjustments. To that end, I've implemented a more sustainable solution by updating the Docker container itself:

Improvements to the iksnae/book-builder Container:

  1. Added dedicated generate-epub utility - A robust script that consistently handles image inclusion in EPUBs
  2. Included ImageMagick - Additional image processing capabilities
  3. Updated documentation - Detailed README with usage examples

The generate-epub utility specifically addresses the image inclusion issues by:

  1. Creating isolated working directories for each build
  2. Collecting all images from resource paths automatically
  3. Setting proper metadata with consistent formatting
  4. Using a combination of flags known to work reliably
  5. Providing detailed verbose output for troubleshooting

How to Use It:

Once the updated container is built and pushed, you can update your build-language.sh script to use this utility:

# Instead of direct pandoc call
generate-epub \
  --title "$BOOK_TITLE" \
  --author "Open Source Community" \
  --publisher "Khaos Studios" \
  --language "$LANGUAGE" \
  --cover "$COVER_IMAGE" \
  --resource-path "$RESOURCE_PATHS" \
  --verbose \
  "$MARKDOWN_PATH" "$EPUB_PATH"

Benefits of This Approach:

  1. True Encapsulation - The complexity is handled inside the container
  2. Consistent Results - Works the same way across all book projects
  3. Self-Documenting - Clear interface with sensible defaults
  4. Easier Maintenance - Updates to the utility benefit all projects

This approach better fulfills Docker's promise of encapsulating complexity while maintaining consistent functionality across environments.
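During the transition it may help to fall back to plain Pandoc when the generate-epub utility is not yet available in the container image. A sketch; make_book_epub is hypothetical, it assumes the same BOOK_TITLE, LANGUAGE, COVER_IMAGE, and RESOURCE_PATHS variables as the build-language.sh call above are set, and the Pandoc options in the fallback only approximate what generate-epub does:

```bash
#!/usr/bin/env bash
# Prefer the container's generate-epub utility when it is on PATH; otherwise
# fall back to plain pandoc. Assumes BOOK_TITLE, LANGUAGE, COVER_IMAGE and
# RESOURCE_PATHS are already set by the caller.
make_book_epub() {
  local markdown="$1" epub="$2"
  if command -v generate-epub >/dev/null 2>&1; then
    generate-epub \
      --title "$BOOK_TITLE" \
      --author "Open Source Community" \
      --publisher "Khaos Studios" \
      --language "$LANGUAGE" \
      --cover "$COVER_IMAGE" \
      --resource-path "$RESOURCE_PATHS" \
      --verbose \
      "$markdown" "$epub"
  else
    echo "generate-epub not found; falling back to pandoc" >&2
    pandoc "$markdown" -o "$epub" \
      --metadata title="$BOOK_TITLE" \
      --metadata author="Open Source Community" \
      --metadata lang="$LANGUAGE" \
      --resource-path="$RESOURCE_PATHS" \
      --epub-cover-image="$COVER_IMAGE"
  fi
}
```

Checking with command -v lets the same build-language.sh keep working with both the current and the updated container.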