jupyterlab / jupyter-ai

A generative AI extension for JupyterLab
https://jupyter-ai.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Issue with /generate Command Producing Invalid Filenames #990

Open Tina-Sterite opened 1 month ago

Tina-Sterite commented 1 month ago

Description

I've been using the /generate command to create Jupyter notebooks from text prompts, but it sometimes generates filenames that contain colons (:). This causes problems, especially on Windows, where colons are not allowed in filenames. Could you please provide a fix or a workaround to prevent colons from being included in the generated filenames?

Reproduce

The issue occurs intermittently; not every generated notebook is affected.

There are two examples in this screenshot. In each, only a portion of the file name (probably the part before the colon) is displayed. (image)

Here are some examples of prompts and the file names that were produced:

prompt: /generate how to format markdown cells in jupyter notebooks
result: I have created your notebook and saved it to the location C:\Users\steri\Desktop\GitHub\JupyterLabAI\Mastering Markdown in Jupyter Notebooks: A Comprehensive Guide.ipynb

prompt: /generate formatting markdown cells in jupyter notebooks
result: I have created your notebook and saved it to the location C:\Users\steri\Desktop\GitHub\JupyterLabAI\Markdown Formatting in Jupyter Notebooks: A Comprehensive Guide.ipynb

prompt: /generate formatting markdown cells in jupyter notebooks. name this notebook markdown.ipynb
result: SUCCESS! Third time was a charm! I have created your notebook and saved it to the location C:\Users\steri\Desktop\GitHub\JupyterLabAI\Mastering Markdown in Jupyter Notebooks.ipynb

prompt: /generate using Langchain Extraction via the Langchain REST API with with detailed code examples and detailed explanations
result: I have created your notebook and saved it to the location C:\Users\steri\Desktop\GitHub\JupyterLabAI\Langchain Text Extraction: API Integration & Advanced Techniques.ipynb

prompt: /generate Langchain REST API Langchain Extraction
result: SUCCESS! I have created your notebook and saved it to the location C:\Users\steri\Desktop\GitHub\JupyterLabAI\Extracting Data from REST APIs with Langchain in Python.ipynb
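To make the pattern in these results easier to see, here is a small check of the reported titles against the characters Windows rejects in file names. It is only a sketch: the reserved set `<>:"/\|?*` is my assumption about what Windows disallows, and `reserved_chars` is a hypothetical helper, not part of jupyter-ai.

```python
# Assumption: the characters Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')

# Titles copied from the results reported above (without the .ipynb suffix).
titles = [
    "Mastering Markdown in Jupyter Notebooks: A Comprehensive Guide",
    "Markdown Formatting in Jupyter Notebooks: A Comprehensive Guide",
    "Mastering Markdown in Jupyter Notebooks",
    "Langchain Text Extraction: API Integration & Advanced Techniques",
    "Extracting Data from REST APIs with Langchain in Python",
]


def reserved_chars(title: str) -> list[str]:
    """Return the reserved characters that occur in a title, sorted."""
    return sorted(set(title) & WINDOWS_RESERVED)


for title in titles:
    bad = reserved_chars(title)
    status = "OK" if not bad else f"invalid on Windows: {bad}"
    print(f"{title}.ipynb -> {status}")
```

Under that assumption, only the titles containing a colon are flagged, which matches the two notebooks reported as "SUCCESS!" above.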

I'll paste the details from the server below.

Expected behavior

I'm expecting files to be generated without a colon in the name. Every Jupyter notebook file generated without a colon in the name is usable.
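As a rough illustration of the expected behavior, the generated title could be cleaned before it is used as a file name. This is a minimal sketch and not how jupyter-ai builds the path: `sanitize_filename`, the replacement character, and the exact set of rejected characters are all assumptions of mine.

```python
import re

# Assumption: characters Windows rejects in file names, plus ASCII control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(title: str, replacement: str = "-") -> str:
    """Return a version of `title` that is safe to use as a file name stem."""
    cleaned = INVALID_CHARS.sub(replacement, title)
    # Collapse runs of whitespace into a single space.
    cleaned = re.sub(r"\s+", " ", cleaned)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.strip().rstrip(". ")
    # Fall back to a generic name if nothing usable remains.
    return cleaned or "untitled"


title = "Langchain Text Extraction: API Integration & Advanced Techniques"
print(sanitize_filename(title) + ".ipynb")
# prints: Langchain Text Extraction- API Integration & Advanced Techniques.ipynb
```

Replacing the character rather than dropping it keeps the title readable; titles that are already valid pass through unchanged.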

Context

Command Line Output
Prompt after formatting:
Create a markdown summary for a Jupyter notebook with the following content. The summary should consist of a single paragraph.
Content:
{'description': 'A Jupyter notebook focused on generating text using Langchain Extraction via the Langchain REST API with detailed code examples and explanations.', 'sections': [{'title': 'Setting Up the Environment', 'content': 'In this section, we will install the necessary libraries and set up the environment for using Langchain Extraction via the Langchain REST API.\n\n```python\n# Install necessary libraries\n!pip install requests langchain\n\n# Import libraries\nimport requests\nfrom langchain import LangchainClient\n```'}, {'title': 'Introduction to Langchain Extraction', 'content': "This section provides a brief introduction to Langchain and its text extraction capabilities. We will also cover the basics of interacting with the Langchain REST API.\n\n```python\n# Initialize the Langchain client\nclient = LangchainClient(api_key='your_api_key')\n\n# Basic API endpoint information\napi_endpoint = 'https://api.langchain.com/extract'\n```"}, {'title': 'Authenticating with Langchain REST API', 'content': "Learn how to authenticate with the Langchain REST API using your API key. This is a crucial step before making any API calls.\n\n```python\n# Set up headers for authentication\nheaders = {\n    'Authorization': f'Bearer {client.api_key}',\n    'Content-Type': 'application/json'\n}\n\n# Verify authentication\nresponse = requests.get(api_endpoint, headers=headers)\nif response.status_code == 200:\n    print('Authentication successful')\nelse:\n    print('Authentication failed')\n```"}, {'title': 'Extracting Text from Documents', 'content': "In this section, we will provide a detailed example of extracting text from a document using the Langchain REST API. 
We will cover the necessary API call and how to handle the response.\n\n```python\n# Define the document URL or path\ndocument_url = 'https://example.com/sample.pdf'\n\n# API request payload\npayload = {\n    'url': document_url,\n    'extract_options': {\n        'format': 'text'\n    }\n}\n\n# Make the API request\nresponse = requests.post(api_endpoint, headers=headers, json=payload)\n\n# Handle the response\nif response.status_code == 200:\n    extracted_text = response.json().get('text', '')\n    print('Extracted Text:', extracted_text)\nelse:\n    print('Failed to extract text')\n```"}, {'title': 'Handling API Errors', 'content': "Learn how to gracefully handle errors when interacting with the Langchain REST API. This section includes examples of common error scenarios and how to manage them in your code.\n\n```python\n# Example function to handle API errors\ndef handle_api_errors(response):\n    if response.status_code == 400:\n        print('Bad Request: ', response.json().get('message', 'Unknown error'))\n    elif response.status_code == 401:\n        print('Unauthorized: Invalid API key')\n    elif response.status_code == 403:\n        print('Forbidden: Access denied')\n    elif response.status_code == 404:\n        print('Not Found: Invalid endpoint or resource')\n    elif response.status_code == 500:\n        print('Server Error: Try again later')\n    else:\n        print('Unexpected Error: ', response.status_code)\n\n# Example usage\nresponse = requests.post(api_endpoint, headers=headers, json=payload)\nhandle_api_errors(response)\n```"}, {'title': 'Advanced Extraction Techniques', 'content': "Explore advanced text extraction techniques available through the Langchain REST API. 
This includes extracting structured data and working with different document formats.\n\n```python\n# Advanced extraction options\nextract_options = {\n    'format': 'structured',\n    'elements': ['tables', 'figures', 'references']\n}\n\n# Update payload with advanced options\npayload['extract_options'] = extract_options\n\n# Make the advanced extraction request\nresponse = requests.post(api_endpoint, headers=headers, json=payload)\n\n# Handle the response\nif response.status_code == 200:\n    extracted_data = response.json()\n    print('Extracted Structured Data:', extracted_data)\nelse:\n    handle_api_errors(response)\n```"}, {'title': 'Integrating with Other Tools', 'content': "Demonstrate how to integrate the extracted text and data with other tools or workflows. This section includes examples of exporting the data to different formats and using it in further processing.\n\n```python\n# Example: Exporting extracted text to a file\nwith open('extracted_text.txt', 'w') as file:\n    file.write(extracted_text)\n\n# Example: Using extracted data in further processing\nimport pandas as pd\n\n# Convert structured data to DataFrame\nif 'tables' in extracted_data:\n    tables = extracted_data['tables']\n    for table in tables:\n        df = pd.DataFrame(table['data'], columns=table['headers'])\n        print(df)\n```"}], 'prompt': '/generate using Langchain Extraction via the Langchain REST API with with detailed code examples and detailed explanations'}

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: A Jupyter notebook focused on generating text using Langchain Extraction via the Langchain REST API with detailed code examples and explanations.
Title of the notebook section: Integrating with Other Tools
Description of the notebok section: Demonstrate how to integrate the extracted text and data with other tools or workflows. This section includes examples of exporting the data to different formats and using it in further processing.

```python
# Example: Exporting extracted text to a file
with open('extracted_text.txt', 'w') as file:
    file.write(extracted_text)

# Example: Using extracted data in further processing
import pandas as pd

# Convert structured data to DataFrame
if 'tables' in extracted_data:
    tables = extracted_data['tables']
    for table in tables:
        df = pd.DataFrame(table['data'], columns=table['headers'])
        print(df)
```
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: A Jupyter notebook focused on generating text using Langchain Extraction via the Langchain REST API with detailed code examples and explanations.
Title of the notebook section: Authenticating with Langchain REST API
Description of the notebok section: Learn how to authenticate with the Langchain REST API using your API key. This is a crucial step before making any API calls.

```python
# Set up headers for authentication
headers = {
    'Authorization': f'Bearer {client.api_key}',
    'Content-Type': 'application/json'
}

# Verify authentication
response = requests.get(api_endpoint, headers=headers)
if response.status_code == 200:
    print('Authentication successful')
else:
    print('Authentication failed')
```
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: A Jupyter notebook focused on generating text using Langchain Extraction via the Langchain REST API with detailed code examples and explanations.
Title of the notebook section: Handling API Errors
Description of the notebok section: Learn how to gracefully handle errors when interacting with the Langchain REST API. This section includes examples of common error scenarios and how to manage them in your code.

```python
# Example function to handle API errors
def handle_api_errors(response):
    if response.status_code == 400:
        print('Bad Request: ', response.json().get('message', 'Unknown error'))
    elif response.status_code == 401:
        print('Unauthorized: Invalid API key')
    elif response.status_code == 403:
        print('Forbidden: Access denied')
    elif response.status_code == 404:
        print('Not Found: Invalid endpoint or resource')
    elif response.status_code == 500:
        print('Server Error: Try again later')
    else:
        print('Unexpected Error: ', response.status_code)

# Example usage
response = requests.post(api_endpoint, headers=headers, json=payload)
handle_api_errors(response)
```
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: A Jupyter notebook focused on generating text using Langchain Extraction via the Langchain REST API with detailed code examples and explanations.
Title of the notebook section: Extracting Text from Documents
Description of the notebok section: In this section, we will provide a detailed example of extracting text from a document using the Langchain REST API. We will cover the necessary API call and how to handle the response.

```python
# Define the document URL or path
document_url = 'https://example.com/sample.pdf'

# API request payload
payload = {
    'url': document_url,
    'extract_options': {
        'format': 'text'
    }
}

# Make the API request
response = requests.post(api_endpoint, headers=headers, json=payload)

# Handle the response
if response.status_code == 200:
    extracted_text = response.json().get('text', '')
    print('Extracted Text:', extracted_text)
else:
    print('Failed to extract text')
```
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.
Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: A Jupyter notebook focused on generating text using Langchain Extraction via the Langchain REST API with detailed code examples and explanations.
Title of the notebook section: Advanced Extraction Techniques
Description of the notebok section: Explore advanced text extraction techniques available through the Langchain REST API. This includes extracting structured data and working with different document formats.   

```python
# Advanced extraction options
extract_options = {
    'format': 'structured',
    'elements': ['tables', 'figures', 'references']
}

# Update payload with advanced options
payload['extract_options'] = extract_options

# Make the advanced extraction request
response = requests.post(api_endpoint, headers=headers, json=payload)

# Handle the response
if response.status_code == 200:
    extracted_data = response.json()
    print('Extracted Structured Data:', extracted_data)
else:
    handle_api_errors(response)
```
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: A Jupyter notebook focused on generating text using Langchain Extraction via the Langchain REST API with detailed code examples and explanations.
Title of the notebook section: Introduction to Langchain Extraction
Description of the notebok section: This section provides a brief introduction to Langchain and its text extraction capabilities. We will also cover the basics of interacting with the Langchain REST API.       

```python
# Initialize the Langchain client
client = LangchainClient(api_key='your_api_key')

# Basic API endpoint information
api_endpoint = 'https://api.langchain.com/extract'
```
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

> Finished chain.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Install necessary libraries
!pip install requests langchain

# Import libraries
import requests
from langchain import LangchainClient

# Set up the Langchain API endpoint
API_URL = "https://api.langchain.com/extract"

# Instantiate the Langchain client (assuming you have an API key)
# Note: Replace 'your_api_key_here' with your actual API key
api_key = "your_api_key_here"
langchain_client = LangchainClient(api_key, base_url=API_URL)

# Verify the client setup by printing a success message
if langchain_client:
    print("Langchain client successfully set up!")
else:
    print("Failed to set up Langchain client.")
```

> Finished chain.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Section: Authenticating with Langchain REST API

# Import necessary libraries
import requests

# Define your API key
# Note: Replace 'your_api_key' with your actual Langchain API key
api_key = 'your_api_key'

# Define the API endpoint for authentication verification
# Note: Replace 'api_endpoint' with the actual authentication endpoint provided by Langchain
api_endpoint = 'https://api.langchain.com/auth/verify'

# Set up headers for authentication
headers = {
    'Authorization': f'Bearer {api_key}',  # Insert the API key into the headers
    'Content-Type': 'application/json'     # Specify the content type
}

# Verify authentication by making a GET request to the API endpoint
response = requests.get(api_endpoint, headers=headers)

# Check the status code to determine if authentication was successful
if response.status_code == 200:
    print('Authentication successful')
else:
    print('Authentication failed')

# Output the response for additional debugging (optional)
print('Response Status Code:', response.status_code)
print('Response Body:', response.text)
```

This code snippet demonstrates how to authenticate with the Langchain REST API using your API key. Replace the placeholder values with your actual API key and the correct API endpoint to verify the authentication. The code sends a GET request to the specified endpoint and checks the status code to confirm whether the authentication was successful.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Import necessary libraries
import requests

# Define the document URL or path
document_url = 'https://example.com/sample.pdf'

# Define the API endpoint and headers (replace with actual values)
api_endpoint = 'https://api.langchain.com/extract'
headers = {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
}

# API request payload
payload = {
    'url': document_url,
    'extract_options': {
        'format': 'text'  # Specify the format of the extracted content
    }
}

# Make the API request
response = requests.post(api_endpoint, headers=headers, json=payload)

# Handle the response
if response.status_code == 200:
    # Parse the JSON response to get the extracted text
    extracted_text = response.json().get('text', '')
    print('Extracted Text:', extracted_text)
else:
    # Print an error message if the request failed
    print('Failed to extract text:', response.status_code, response.text)
```

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Advanced Extraction Techniques

# In this section, we will explore advanced text extraction techniques using the Langchain REST API.
# These techniques include extracting structured data and working with various document formats.

import requests

# Define the API endpoint and necessary headers
api_endpoint = 'https://api.langchain.com/extract'
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'  # Replace with your actual API key
}

# Sample payload with the document to be processed
payload = {
    'document': 'Your document content here'
}

# Advanced extraction options
# Here, we specify that we want the data in a structured format and are interested in extracting tables, figures, and references.
extract_options = {
    'format': 'structured',  # Specify the format as 'structured'
    'elements': ['tables', 'figures', 'references']  # Define the elements to extract
}

# Update the payload with advanced extraction options
payload['extract_options'] = extract_options

# Make the advanced extraction request to the Langchain API
response = requests.post(api_endpoint, headers=headers, json=payload)

# Handle the response from the API
if response.status_code == 200:
    extracted_data = response.json()  # Parse the JSON response
    print('Extracted Structured Data:', extracted_data)  # Display the extracted structured data
else:
    # Define a function to handle potential API errors
    def handle_api_errors(response):
        # Print the status code and error message
        print(f"Error {response.status_code}: {response.text}")

    # Call the error handling function
    handle_api_errors(response)
```

> Finished chain.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Introduction to Langchain Extraction

# In this section, we will provide an overview of Langchain, a powerful tool for text extraction.
# Langchain offers a REST API that allows users to extract meaningful information from text data.
# We will cover the basics of how to interact with the Langchain API.

# Import necessary libraries
import requests
import json

# Define the Langchain client class
class LangchainClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.headers = {
            'Authorization': f'Bearer {self.api_key}',
            'Content-Type': 'application/json'
        }

    def extract(self, text):
        # Basic API endpoint information
        api_endpoint = 'https://api.langchain.com/extract'

        # Prepare the payload with the text data
        payload = json.dumps({
            'text': text
        })

        # Make a POST request to the Langchain API
        response = requests.post(api_endpoint, headers=self.headers, data=payload)

        # Handle the response
        if response.status_code == 200:
            return response.json()
        else:
            return {"error": response.status_code, "message": response.text}

# Initialize the Langchain client with your API key
client = LangchainClient(api_key='your_api_key')

# Example text to extract information from
example_text = "Langchain is a powerful tool for extracting meaningful information from text data."

# Use the client to extract information from the example text
extraction_result = client.extract(example_text)

# Print the extraction result
print("Extraction Result:", extraction_result)
```

In this code, we have defined a `LangchainClient` class that helps to interact with the Langchain REST API. The `extract` method sends a POST request to the API endpoint with the given text and returns the extracted information. We also provide an example demonstrating how to use the client to extract information from a sample text.

> Finished chain.

> Finished chain.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Handling API Errors
# This section demonstrates how to handle errors gracefully when interacting with the Langchain REST API.
# We will cover common error scenarios and show you how to manage them effectively in your code.

import requests

# Example function to handle API errors
def handle_api_errors(response):
    """
    Handle errors based on the HTTP status code of the API response.

    Parameters:
    response (requests.Response): The response object returned by the requests library.

    Returns:
    None
    """
    if response.status_code == 400:
        print('Bad Request: ', response.json().get('message', 'Unknown error'))
    elif response.status_code == 401:
        print('Unauthorized: Invalid API key')
    elif response.status_code == 403:
        print('Forbidden: Access denied')
    elif response.status_code == 404:
        print('Not Found: Invalid endpoint or resource')
    elif response.status_code == 500:
        print('Server Error: Try again later')
    else:
        print('Unexpected Error: ', response.status_code, response.text)

# Example usage of the error handling function
api_endpoint = "https://api.langchain.com/generate-text"  # Placeholder API endpoint
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # Replace with your actual API key
    "Content-Type": "application/json"
}
payload = {
    "input_text": "Hello, world!"  # Example payload for text generation
}

# Make a POST request to the Langchain REST API
response = requests.post(api_endpoint, headers=headers, json=payload)

# Handle any errors that occur during the API request
handle_api_errors(response)

# Check if the request was successful and print the generated text
if response.status_code == 200:
    generated_text = response.json().get('generated_text', 'No text generated')
    print('Generated Text: ', generated_text)
else:
    print('Failed to generate text due to the above error.')
```

This code defines a function `handle_api_errors` that takes the response object from an API request and prints appropriate error messages based on the HTTP status code. The example usage demonstrates how to make a POST request to the Langchain REST API, handle potential errors, and print the generated text if the request is successful.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Integrating with Other Tools

# After extracting text and data using the Langchain REST API, we can integrate this data with other tools and workflows.
# This section demonstrates how to export the extracted text to different formats and use it in further processing.

# Example: Exporting extracted text to a file
def export_text_to_file(text, filename='extracted_text.txt'):
    """Exports the extracted text to a specified file."""
    with open(filename, 'w') as file:
        file.write(text)
    print(f'Text successfully exported to {filename}')

# Assuming `extracted_text` contains the text obtained from Langchain API
extracted_text = "This is a sample extracted text from Langchain API."
export_text_to_file(extracted_text)

# Example: Exporting extracted data to a JSON file
import json

def export_data_to_json(data, filename='extracted_data.json'):
    """Exports the extracted data to a JSON file."""
    with open(filename, 'w') as file:
        json.dump(data, file, indent=4)
    print(f'Data successfully exported to {filename}')

# Assuming `extracted_data` contains the structured data obtained from Langchain API
extracted_data = {
    "tables": [
        {
            "headers": ["Column1", "Column2"],
            "data": [
                ["Row1Value1", "Row1Value2"],
                ["Row2Value1", "Row2Value2"]
            ]
        }
    ]
}
export_data_to_json(extracted_data)

# Example: Using extracted data in further processing
import pandas as pd

def convert_tables_to_dataframes(data):
    """Converts structured data tables to pandas DataFrames and prints them."""
    if 'tables' in data:
        tables = data['tables']
        for i, table in enumerate(tables):
            df = pd.DataFrame(table['data'], columns=table['headers'])
            print(f"Table {i+1}:")
            print(df)
    else:
        print("No tables found in the data.")

# Convert and display the structured data as DataFrames
convert_tables_to_dataframes(extracted_data)
```

This section provides functions and examples for exporting the extracted text and data to files and converting structured data tables into pandas DataFrames for further processing. The functions are modular and reusable, facilitating easy integration into broader workflows.

> Finished chain.

> Finished chain.

> Finished chain.

> Finished chain.
[I 2024-09-10 12:15:40.701 ServerApp] /generate chat handler resolved in 25376 ms.
usage: jupyter-lab [-h]
jupyter-lab: error: unrecognized arguments: Langchain REST API Langchain Extraction

> Entering new NotebookOutlineChain chain...
Prompt after formatting:
You are an AI that creates a detailed content outline for a Jupyter notebook on a given topic.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"description": {"title": "Description", "type": "string"}, "sections": {"title": "Sections", "type": "array", "items": {"$ref": "#/definitions/OutlineSection"}}}, "required": ["sections"], "definitions": {"OutlineSection": {"title": "OutlineSection", "type": "object", "properties": {"title": {"title": "Title", "type": "string"}, "content": {"title": "Content", "type": "string"}}, "required": ["title", "content"]}}}
```
Here is a description of the notebook you will create an outline for: /generate Langchain REST API Langchain Extraction
Don't include an introduction or conclusion section in the outline, focus only on description and sections that will need code.

> Finished chain.

> Entering new NotebookTitleChain chain...

> Entering new NotebookSummaryChain chain...

> Entering new NotebookSectionCodeChain chain...

> Entering new NotebookSectionCodeChain chain...

> Entering new NotebookSectionCodeChain chain...

> Entering new NotebookSectionCodeChain chain...

> Entering new NotebookSectionCodeChain chain...

Prompt after formatting:
Create a short, few word, descriptive title for a Jupyter notebook with the following content.
Content:
{'description': 'This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.', 'sections': [{'title': 'Environment Setup', 'content': 'In this section, we will install and import the necessary libraries for making REST API requests and using Langchain.'}, {'title': 'Making REST API Requests', 'content': "This section covers how to construct and send a REST API request using Python. We will use the 'requests' library to interact with the API."}, {'title': 'Processing API Response', 'content': 'Here, we will process the JSON response from the API. This includes parsing the JSON data and handling potential errors.'}, {'title': 'Introduction to Langchain', 'content': 'This section provides an overview of Langchain, its features, and how it can be used for data extraction from text.'}, {'title': 'Using Langchain for Data Extraction', 'content': 'In this section, we will demonstrate how to use Langchain to extract relevant information from the API response. This includes setting up Langchain and writing extraction rules.'}, {'title': 'Example Use Case', 'content': 'Here, we will provide a real-world example of using Langchain to extract specific data points from an API response. This will include a step-by-step walkthrough.'}], 'prompt': '/generate Langchain REST API Langchain Extraction'}
Don't return anything other than the title.

> Entering new NotebookSectionCodeChain chain...
Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.
Title of the notebook section: Making REST API Requests
Description of the notebok section: This section covers how to construct and send a REST API request using Python. We will use the 'requests' library to interact with the API.
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.
Title of the notebook section: Example Use Case
Description of the notebok section: Here, we will provide a real-world example of using Langchain to extract specific data points from an API response. This will include a step-by-step walkthrough.
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.
Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.
Title of the notebook section: Processing API Response
Description of the notebok section: Here, we will process the JSON response from the API. This includes parsing the JSON data and handling potential errors.
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.
Title of the notebook section: Using Langchain for Data Extraction
Description of the notebok section: In this section, we will demonstrate how to use Langchain to extract relevant information from the API response. This includes setting up Langchain and writing extraction rules.
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.
Title of the notebook section: Environment Setup
Description of the notebok section: In this section, we will install and import the necessary libraries for making REST API requests and using Langchain.
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.
Prompt after formatting:
Create a markdown summary for a Jupyter notebook with the following content. The summary should consist of a single paragraph.
Content:
{'description': 'This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.', 'sections': [{'title': 'Environment Setup', 'content': 'In this section, we will install and import the necessary libraries for making REST API requests and using Langchain.'}, {'title': 'Making REST API Requests', 'content': "This section covers how to construct and send a REST API request using Python. We will use the 'requests' library to interact with the API."}, {'title': 'Processing API Response', 'content': 'Here, we will process the JSON response from the API. This includes parsing the JSON data and handling potential errors.'}, {'title': 'Introduction to Langchain', 'content': 'This section provides an overview of Langchain, its features, and how it can be used for data extraction from text.'}, {'title': 'Using Langchain for Data Extraction', 'content': 'In this section, we will demonstrate how to use Langchain to extract relevant information from the API response. This includes setting up Langchain and writing extraction rules.'}, {'title': 'Example Use Case', 'content': 'Here, we will provide a real-world example of using Langchain to extract specific data points from an API response. This will include a step-by-step walkthrough.'}], 'prompt': '/generate Langchain REST API Langchain Extraction'}

Prompt after formatting:
You are an AI that writes code for a single section of a Jupyter notebook.
Overall topic of the notebook: This Jupyter notebook covers how to use Langchain to extract data from a REST API. It includes sections on setting up the environment, making API requests, processing the API response, and extracting relevant information using Langchain.
Title of the notebook section: Introduction to Langchain
Description of the notebok section: This section provides an overview of Langchain, its features, and how it can be used for data extraction from text.
Given this information, write all the code for this section and this section only. Your output should be valid code with inline comments.

> Finished chain.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Import the requests library
import requests

# Define the API endpoint
api_url = "https://api.example.com/data"

# Define the headers, if needed (e.g., for authentication)
headers = {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "Content-Type": "application/json"
}

# Define any parameters for the API request
params = {
    "param1": "value1",
    "param2": "value2"
}

# Make the GET request to the API
response = requests.get(api_url, headers=headers, params=params)

# Check if the request was successful
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()
    # Print the JSON data to inspect
    print(data)
else:
    # Print an error message if the request failed
    print(f"Request failed with status code: {response.status_code}")
    print(f"Response: {response.text}")
```

> Finished chain.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Environment Setup

# Install necessary libraries
# !pip install requests
# !pip install langchain

# Import necessary libraries
import requests  # Library for making HTTP requests
from langchain import Langchain  # Library for extracting information

# Verify the installations by checking the versions
import pkg_resources

requests_version = pkg_resources.get_distribution("requests").version
langchain_version = pkg_resources.get_distribution("langchain").version

print(f"Requests version: {requests_version}")
print(f"Langchain version: {langchain_version}")
```

> Finished chain.

> Finished chain.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Example Use Case: Extracting Data from a REST API using Langchain

# Step 1: Import necessary libraries
import requests
from langchain import Langchain

# Step 2: Set up the API endpoint and parameters
api_url = "https://api.example.com/data"
params = {
    "param1": "value1",
    "param2": "value2"
}

# Step 3: Make the API request
response = requests.get(api_url, params=params)

# Step 4: Check if request was successful
if response.status_code == 200:
    print("API request successful.")
else:
    print(f"API request failed with status code: {response.status_code}")

# Step 5: Process the API response
api_data = response.json()  # Convert the response to a JSON object

# Step 6: Initialize Langchain
langchain = Langchain()

# Step 7: Define the data extraction logic using Langchain
# For this example, let's assume we want to extract 'name' and 'value' fields from the response
extraction_template = """
{
  "name": "{{ name }}",
  "value": "{{ value }}"
}
"""

# Step 8: Use Langchain to extract data points
extracted_data = langchain.extract(extraction_template, api_data)

# Step 9: Display extracted data
print("Extracted Data:")
print(extracted_data)
```

This code provides a step-by-step walkthrough of a real-world example using Langchain to extract specific data points from an API response. It starts by setting up the API endpoint and making a request, then processes the response and uses Langchain to extract the desired information.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Using Langchain for Data Extraction

# In this section, we will demonstrate how to use Langchain to extract relevant information from the API response.
# This includes setting up Langchain and writing extraction rules.

# First, let's install Langchain if it's not already installed.
!pip install langchain

# Import necessary libraries
from langchain import Chain, Rule, Field

# Sample API response for demonstration purposes
api_response = {
    "data": [
        {"id": 1, "name": "John Doe", "email": "john.doe@example.com"},
        {"id": 2, "name": "Jane Smith", "email": "jane.smith@example.com"},
    ],
    "status": "success",
    "timestamp": "2023-10-01T12:34:56Z"
}

# Define the extraction rules
rules = [
    # Rule to extract 'id' field from each data entry
    Rule(
        path="data[*].id",
        field=Field(name="id", type=int)
    ),
    # Rule to extract 'name' field from each data entry
    Rule(
        path="data[*].name",
        field=Field(name="name", type=str)
    ),
    # Rule to extract 'email' field from each data entry
    Rule(
        path="data[*].email",
        field=Field(name="email", type=str)
    )
]

# Initialize the Langchain with the defined rules
chain = Chain(rules=rules)

# Process the API response to extract relevant information
extracted_data = chain.extract(api_response)

# Display the extracted data
print(extracted_data)

# Expected output:
# [
#     {"id": 1, "name": "John Doe", "email": "john.doe@example.com"},
#     {"id": 2, "name": "Jane Smith", "email": "jane.smith@example.com"}
# ]
```

This code snippet demonstrates how to set up Langchain to extract specific data fields from an API response. It installs necessary packages, defines extraction rules, initializes the Langchain with these rules, and processes a sample API response to extract and display relevant information.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Import necessary libraries
import json

# Sample JSON response from the API
api_response = '''
{
    "status": "success",
    "data": {
        "id": 123,
        "name": "John Doe",
        "email": "john.doe@example.com",
        "details": {
            "age": 30,
            "location": "New York"
        }
    },
    "message": "Data fetched successfully"
}
'''

# Function to process the API response
def process_api_response(response):
    try:
        # Parse the JSON response
        data = json.loads(response)

        # Check if the response contains the expected data
        if data['status'] == 'success':
            user_data = data['data']

            # Extract relevant information
            user_id = user_data['id']
            user_name = user_data['name']
            user_email = user_data['email']
            user_age = user_data['details']['age']
            user_location = user_data['details']['location']

            # Print extracted information
            print(f"User ID: {user_id}")
            print(f"Name: {user_name}")
            print(f"Email: {user_email}")
            print(f"Age: {user_age}")
            print(f"Location: {user_location}")
        else:
            # Handle case where status is not success
            print(f"Error: {data['message']}")
    except json.JSONDecodeError:
        # Handle JSON decoding error
        print("Failed to decode JSON response")
    except KeyError as e:
        # Handle missing keys in the response
        print(f"Missing key in the response: {e}")

# Process the sample API response
process_api_response(api_response)
```

In this section, we have defined a function `process_api_response` to handle the JSON response from the API. The function includes error handling for JSON decoding issues and missing keys in the response. The relevant information is extracted and printed out.

> Finished chain.

> Entering new CodeImproverChain chain...
Prompt after formatting:
Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.
```python
# Introduction to Langchain

# In this section, we will provide an overview of Langchain and its features.
# Langchain is a powerful library designed to simplify the process of extracting
# data from text. It offers various tools and functionalities that can be 
# leveraged to parse, analyze, and extract relevant information effectively.

# First, let's ensure that we have Langchain installed. 
# If you haven't installed it yet, you can do so using pip.

# !pip install langchain

# Import necessary modules from Langchain
from langchain import TextExtractor, TextParser

# Langchain's TextExtractor is used to extract textual content from various sources.
# Here, we will create an instance of TextExtractor.
text_extractor = TextExtractor()

# TextParser is another crucial component of Langchain. It helps in parsing 
# the extracted text to identify and extract relevant information.
text_parser = TextParser()

# Let's demonstrate a simple example of how Langchain can be used to extract data from a text.

# Sample text
sample_text = """
Langchain is an advanced library designed for data extraction from text.
It simplifies the process of parsing, analyzing, and extracting relevant information.
With features like TextExtractor and TextParser, Langchain makes it easy to handle
large volumes of text data efficiently.
"""

# Use the text_extractor to extract text content (this simulates reading from a source)
extracted_text = text_extractor.extract_from_string(sample_text)

# Use the text_parser to parse the extracted text and identify key information
parsed_data = text_parser.parse(extracted_text)

# Output the extracted and parsed data
print("Extracted Text:")
print(extracted_text)
print("\nParsed Data:")
print(parsed_data)

# In the following sections, we will delve deeper into making API requests,
# processing API responses, and using Langchain to extract relevant information.
```

This code provides an overview of Langchain, its features, and a simple example of how to use its `TextExtractor` and `TextParser` to extract and parse data from text. This serves as an introductory section to get users familiarized with the basics of Langchain before moving on to more advanced topics in the subsequent sections.

> Finished chain.

> Finished chain.

> Finished chain.

> Finished chain.
[I 2024-09-10 12:16:57.263 ServerApp] /generate chat handler resolved in 15125 ms.
[W 2024-09-10 12:17:34.400 ServerApp] Notebook Extracting Data from REST APIs with Langchain in Python.ipynb is not trusted
[I 2024-09-10 12:17:36.240 ServerApp] Kernel started: f7b6a62f-0df1-4f72-baa2-8b19b0335d6f
[I 2024-09-10 12:17:37.284 ServerApp] Connecting to kernel f7b6a62f-0df1-4f72-baa2-8b19b0335d6f.
[I 2024-09-10 12:18:40.876 ServerApp] Saving file at /Extracting Data from REST APIs with Langchain in Python.ipynb

krassowski commented 1 month ago

A change would be needed here:

https://github.com/jupyterlab/jupyter-ai/blob/43e6acce6f03c20d2fd889e229103e3b4dcb9003/packages/jupyter-ai/jupyter_ai/chat_handlers/generate.py#L260

Ensuring a file name is valid cross-platform is not trivial: https://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename but at least removing colons seems like a good idea (as these can be also used for Jupyter drives).
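
To illustrate why this is not trivial, here is a minimal sketch of a validity check for Windows file names. The helper name `windows_filename_problems` is hypothetical and not part of jupyter-ai; the rules it encodes (reserved characters, reserved device names, trailing dots or spaces) are the commonly documented Win32 naming restrictions, and the list is not claimed to be exhaustive.

```python
import re

# Characters Windows rejects in a file name, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, with or without an extension.
_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def windows_filename_problems(name):
    """Return a list of reasons why `name` is not a safe Windows file name."""
    problems = []
    if not name:
        problems.append("empty name")
    if _INVALID_CHARS.search(name):
        problems.append("contains a reserved character")
    if name.split(".")[0].upper() in _RESERVED:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or dot")
    return problems


# A title from this issue (fails because of the colon) and a plain name (passes).
print(windows_filename_problems("Mastering Markdown in Jupyter Notebooks: A Comprehensive Guide.ipynb"))
print(windows_filename_problems("markdown.ipynb"))
```

A check like this only reports problems; deciding how to rewrite an offending title is a separate design choice.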

JasonWeill commented 1 month ago

I think removing colons, slashes, and backslashes would be a good start. Restricting filenames to just ASCII alphanumerics might be good, with some kind of fallback logic if the title has only non-ASCII characters (e.g., if it's in Chinese or Japanese).
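
A rough sketch of that idea, assuming a hypothetical helper `ascii_filename` (not an existing jupyter-ai function): keep ASCII letters, digits, hyphens, and underscores, and fall back to a hash-based name when nothing usable is left. The `notebook-` prefix and the 100-character limit are arbitrary choices for illustration.

```python
import hashlib
import re
import unicodedata


def ascii_filename(title, max_len=100):
    """Reduce `title` to ASCII letters, digits, '-', '_' and single spaces."""
    # Decompose accented characters ("é" becomes "e" plus a combining mark),
    # then drop everything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of disallowed characters (colons, slashes, ...) with one space.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", " ", ascii_title).strip()[:max_len].strip()
    if re.search(r"[A-Za-z0-9]", cleaned):
        return cleaned
    # Fallback for titles without ASCII alphanumerics (e.g. Chinese or Japanese):
    # derive a stable, filesystem-safe name from a hash of the original title.
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]
    return f"notebook-{digest}"


print(ascii_filename("Mastering Markdown in Jupyter Notebooks: A Comprehensive Guide"))
print(ascii_filename("ノートブック"))
```

The hash fallback keeps the file name deterministic for a given title, at the cost of being unreadable; a timestamp or a user-supplied name would be alternatives.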

richlysakowski commented 1 month ago

I have the same problems with the "/generate" command.

All code content is generated in memory successfully, but fails to save to the file system. For complete tutorials and long code documents, it generates everything, but the save-to-file operation fails.

"I think removing colons, slashes, and backslashes would be a good start."

I removed all colons, slashes, backslashes, commas, and all other punctuation. None of that works for me. The longer the prompt the more often it fails to save the generated content. I removed all punctuation and shortened the prompts to trivial lengths. Certain words seem to trigger the insertion of COLON characters in the filename, like "tutorial", "lessons", "detailed explanations", and other longish prompts.

MISSING DESIGN REQUIREMENT: Give the user the option to EXPLICITLY specify the output file name, and not let AI LLMs do that.

I tried hacking the code in "generate.py" from the chat_handlers folder for JupyterLab-AI in the site-packages directory,

" C:\ProgramData\Anaconda3\envs\JupyterLabAI_Tools\Lib\site-packages\jupyter_ai\chat_handlers\generate.py "

I added this function to the beginning of the file:

import re

# Function to sanitize filenames
def sanitize_filename(filename):
    # Replace invalid characters for Windows, MacOS, and Linux
    return re.sub(r'[\\/:*?"<>|]', '_', filename)

and then added the following code (the final_path, try, and except lines) to the _generate_notebook() function in generate.py.

async def _generate_notebook(self, prompt: str):
    """Generate a notebook and save to local disk"""

    # create outline
    outline = await generate_outline(prompt, llm=self.llm, verbose=True)
    # Save the user input prompt, the description property is now LLM generated.
    outline["prompt"] = prompt

    if self.llm.allows_concurrency:
        # fill the outline concurrently
        await afill_outline(outline, llm=self.llm, verbose=True)
    else:
        # fill outline
        await fill_outline(outline, llm=self.llm, verbose=True)

    # create and write the notebook to disk
    notebook = create_notebook(outline)
    final_path = os.path.join(self.output_dir, sanitize_filename(outline["title"])[:250] + ".ipynb")
    try:
        # Truncate the sanitized title if it's too long
        sanitized_title = sanitize_filename(outline["title"])[:250]
        final_path = os.path.join(self.output_dir, sanitized_title + ".ipynb")
        nbformat.write(notebook, final_path)
    except Exception as e:
        print(f"Failed to save the notebook to {final_path}: {str(e)}")
        raise

The code change did not fix the problem. I restarted the Jupyter Server to make sure the change was picked up from the back-end component for JupyterLabAI.
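
When an edit to an installed package appears to have no effect, one thing worth checking is whether the edited file is the one the server's Python environment actually imports. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `module_source_path` is hypothetical, and it must be run with the same interpreter that launches the Jupyter server for the answer to be meaningful.

```python
import importlib.util


def module_source_path(module_name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None


# In the affected environment one would pass "jupyter_ai.chat_handlers.generate";
# "json" is used here only so that the sketch runs anywhere.
print(module_source_path("json"))
```

If the printed path differs from the file that was edited, the change was made in a different environment than the one serving JupyterLab.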

But now I get a security error. I have not set up a full development project. I simply added the code to convert non-valid characters in a Windows, MacOS, or Linux filepath. I am on Windows so I am most concerned about getting it working for Windows right now, but it needs to be done right for all platforms, not just a hack for one.

[W 2024-09-10 13:16:50.411 ServerApp] 404 GET /api/ai/chats?token=[secret] (5cc00c5dc7d04d669b976d2f569697c5@::1) 2.04ms referer=None
[W 2024-09-10 13:16:51.164 ServerApp] 404 GET /api/ai/completion/inline?token=[secret] (5cc00c5dc7d04d669b976d2f569697c5@::1) 2.00ms referer=None

When I have time later to analyze this more fully, and fix it, I will submit a pull request. If someone has time to test this with a full development environment, I would appreciate it.

Thank you.

Best Regards,

Rich Lysakowski, Ph.D. Data Scientist, AI & Analytics Engineer, and Senior Business Systems Analyst 781-640-2048 mobile


import asyncio
import os
import time
import traceback
from pathlib import Path

import re

import os

# Function to sanitize filenames
def sanitize_filename(filename):
    # Replace invalid characters for Windows, MacOS, and Linux
    return re.sub(r'[\\/:*?"<>|]', '_', filename)

from typing import Dict, List, Optional, Type

import nbformat
from jupyter_ai.chat_handlers import BaseChatHandler, SlashCommandRoutingType
from jupyter_ai.models import HumanChatMessage
from jupyter_ai_magics.providers import BaseProvider
from langchain.chains import LLMChain
from langchain.llms import BaseLLM
from langchain.output_parsers import PydanticOutputParser
from langchain.pydantic_v1 import BaseModel
from langchain.schema.output_parser import BaseOutputParser
from langchain_core.prompts import PromptTemplate


class OutlineSection(BaseModel):
    title: str
    content: str


class Outline(BaseModel):
    description: Optional[str] = None
    sections: List[OutlineSection]

class NotebookOutlineChain(LLMChain):
    """Chain to generate a notebook outline, with section titles and descriptions."""

    @classmethod
    def from_llm(
        cls, llm: BaseLLM, parser: BaseOutputParser[Outline], verbose: bool = False
    ) -> LLMChain:
        task_creation_template = (
            "You are an AI that creates a detailed content outline for a Jupyter notebook on a given topic.\n"
            "{format_instructions}\n"
            "Here is a description of the notebook you will create an outline for: {description}\n"
            "Don't include an introduction or conclusion section in the outline, focus only on description and sections that will need code.\n"
        )
        prompt = PromptTemplate(
            template=task_creation_template,
            input_variables=["description"],
            partial_variables={"format_instructions": parser.get_format_instructions()},
        )
        return cls(prompt=prompt, llm=llm, verbose=verbose)


async def generate_outline(description, llm=None, verbose=False):
    """Generate an outline of sections given a description of a notebook."""
    parser = PydanticOutputParser(pydantic_object=Outline)
    chain = NotebookOutlineChain.from_llm(llm=llm, parser=parser, verbose=verbose)
    outline = await chain.apredict(description=description)
    outline = parser.parse(outline)
    return outline.dict()

class CodeImproverChain(LLMChain):
    """Chain to improve source code."""

    @classmethod
    def from_llm(cls, llm: BaseLLM, verbose: bool = False) -> LLMChain:
        task_creation_template = (
            "Improve the following code and make sure it is valid. Make sure to return the improved code only - don't give an explanation of the improvements.\n"
            "{code}"
        )
        prompt = PromptTemplate(
            template=task_creation_template,
            input_variables=[
                "code",
            ],
        )
        return cls(prompt=prompt, llm=llm, verbose=verbose)


class NotebookSectionCodeChain(LLMChain):
    """Chain to generate source code for a notebook section."""

    @classmethod
    def from_llm(cls, llm: BaseLLM, verbose: bool = False) -> LLMChain:
        task_creation_template = (
            "You are an AI that writes code for a single section of a Jupyter notebook.\n"
            "Overall topic of the notebook: {description}\n"
            "Title of the notebook section: {title}\n"
            "Description of the notebok section: {content}\n"
            "Given this information, write all the code for this section and this section only."
            " Your output should be valid code with inline comments.\n"
        )
        prompt = PromptTemplate(
            template=task_creation_template,
            input_variables=["description", "title", "content"],
        )
        return cls(prompt=prompt, llm=llm, verbose=verbose)


class NotebookSummaryChain(LLMChain):
    """Chain to generate a short summary of a notebook."""

    @classmethod
    def from_llm(cls, llm: BaseLLM, verbose: bool = False) -> LLMChain:
        task_creation_template = (
            "Create a markdown summary for a Jupyter notebook with the following content."
            " The summary should consist of a single paragraph.\n"
            "Content:\n{content}"
        )
        prompt = PromptTemplate(
            template=task_creation_template,
            input_variables=[
                "content",
            ],
        )
        return cls(prompt=prompt, llm=llm, verbose=verbose)


class NotebookTitleChain(LLMChain):
    """Chain to generate the title of a notebook."""

    @classmethod
    def from_llm(cls, llm: BaseLLM, verbose: bool = False) -> LLMChain:
        task_creation_template = (
            "Create a short, few word, descriptive title for a Jupyter notebook with the following content.\n"
            "Content:\n{content}\n"
            "Don't return anything other than the title."
        )
        prompt = PromptTemplate(
            template=task_creation_template,
            input_variables=[
                "content",
            ],
        )
        return cls(prompt=prompt, llm=llm, verbose=verbose)

async def improve_code(code, llm=None, verbose=False):
    """Improve source code using an LLM."""
    chain = CodeImproverChain.from_llm(llm=llm, verbose=verbose)
    improved_code = await chain.apredict(code=code)
    improved_code = "\n".join(
        [line for line in improved_code.split("\n") if not line.startswith("```")]
    )
    return improved_code


async def generate_code(section, description, llm=None, verbose=False) -> None:
    """
    Function that accepts a section and adds code under the "code" key when awaited.
    """
    chain = NotebookSectionCodeChain.from_llm(llm=llm, verbose=verbose)
    code = await chain.apredict(
        description=description,
        title=section["title"],
        content=section["content"],
    )
    improved_code = await improve_code(code, llm=llm, verbose=verbose)
    section["code"] = improved_code


async def generate_title(outline, llm=None, verbose: bool = False):
    """Generate a title of a notebook outline using an LLM."""
    title_chain = NotebookTitleChain.from_llm(llm=llm, verbose=verbose)
    title = await title_chain.apredict(content=outline)
    title = title.strip()
    title = title.strip("'\"")
    outline["title"] = title


async def generate_summary(outline, llm=None, verbose: bool = False):
    """Generate a summary of a notebook using an LLM."""
    summary_chain = NotebookSummaryChain.from_llm(llm=llm, verbose=verbose)
    summary = await summary_chain.apredict(content=outline)
    outline["summary"] = summary


async def fill_outline(outline, llm, verbose=False):
    """Generate title and content of a notebook sections using an LLM."""
    shared_kwargs = {"outline": outline, "llm": llm, "verbose": verbose}

    await generate_title(**shared_kwargs)
    await generate_summary(**shared_kwargs)
    for section in outline["sections"]:
        await generate_code(section, outline["description"], llm=llm, verbose=verbose)


async def afill_outline(outline, llm, verbose=False):
    """Concurrently generate title and content of notebook sections using an LLM."""
    shared_kwargs = {"outline": outline, "llm": llm, "verbose": verbose}

    all_coros = []
    all_coros.append(generate_title(**shared_kwargs))
    all_coros.append(generate_summary(**shared_kwargs))
    for section in outline["sections"]:
        all_coros.append(
            generate_code(section, outline["description"], llm=llm, verbose=verbose)
        )
    await asyncio.gather(*all_coros)

def create_notebook(outline):
    """Create an nbformat Notebook object for a notebook outline."""
    nbf = nbformat.v4
    nb = nbf.new_notebook()
    nb["cells"].append(nbf.new_markdown_cell("# " + outline["title"]))
    nb["cells"].append(nbf.new_markdown_cell("## Introduction"))
    disclaimer = f"This notebook was created by Jupyter AI with the following prompt:\n\n> {outline['prompt']}"
    nb["cells"].append(nbf.new_markdown_cell(disclaimer))
    nb["cells"].append(nbf.new_markdown_cell(outline["summary"]))

    for section in outline["sections"][1:]:
        nb["cells"].append(nbf.new_markdown_cell("## " + section["title"]))
        for code_block in section["code"].split("\n\n"):
            nb["cells"].append(nbf.new_code_cell(code_block))
    return nb

class GenerateChatHandler(BaseChatHandler):
    id = "generate"
    name = "Generate Notebook"
    help = "Generate a Jupyter notebook from a text prompt"
    routing_type = SlashCommandRoutingType(slash_id="generate")

    uses_llm = True

    def __init__(self, log_dir: Optional[str], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.log_dir = Path(log_dir) if log_dir else None
        self.llm = None

    def create_llm_chain(
        self, provider: Type[BaseProvider], provider_params: Dict[str, str]
    ):
        unified_parameters = {
            **provider_params,
            **(self.get_model_parameters(provider, provider_params)),
        }
        llm = provider(**unified_parameters)

        self.llm = llm
        return llm

    async def _generate_notebook(self, prompt: str):
        """Generate a notebook and save to local disk"""

        # create outline
        outline = await generate_outline(prompt, llm=self.llm, verbose=True)
        # Save the user input prompt, the description property is now LLM generated.
        outline["prompt"] = prompt

        if self.llm.allows_concurrency:
            # fill the outline concurrently
            await afill_outline(outline, llm=self.llm, verbose=True)
        else:
            # fill outline
            await fill_outline(outline, llm=self.llm, verbose=True)

        # create and write the notebook to disk
        notebook = create_notebook(outline)
        final_path = os.path.join(self.output_dir, sanitize_filename(outline["title"])[:250] + ".ipynb")
        try:
            # Truncate the sanitized title if it's too long
            sanitized_title = sanitize_filename(outline["title"])[:250]
            final_path = os.path.join(self.output_dir, sanitized_title + ".ipynb")
            nbformat.write(notebook, final_path)
        except Exception as e:
            print(f"Failed to save the notebook to {final_path}: {str(e)}")
            raise

        return final_path

    async def process_message(self, message: HumanChatMessage):
        self.get_llm_chain()

        # first send a verification message to user
        response = "👍 Great, I will get started on your notebook. It may take a few minutes, but I will reply here when the notebook is ready. In the meantime, you can continue to ask me other questions."
        self.reply(response, message)

        final_path = await self._generate_notebook(prompt=message.body)
        response = f"""🎉 I have created your notebook and saved it to the location {final_path}. I am still learning how to create notebooks, so please review all code before running it."""
        self.reply(response, message)

    async def handle_exc(self, e: Exception, message: HumanChatMessage):
        timestamp = time.strftime("%Y-%m-%d-%H.%M.%S")
        default_log_dir = Path(self.output_dir) / "jupyter-ai-logs"
        log_dir = self.log_dir or default_log_dir
        log_dir.mkdir(parents=True, exist_ok=True)
        log_path = log_dir / f"generate-{timestamp}.log"
        with log_path.open("w") as log:
            traceback.print_exc(file=log)

        response = f"An error occurred while generating the notebook. The error details have been saved to ./{log_path}.\n\nTry running /generate again, as some language models require multiple attempts before a notebook is generated."
        self.reply(response, message)


richlysakowski commented 1 month ago

"I think removing colons, slashes, and backslashes would be a good start. Restricting filenames to just ASCII alphanumerics might be good, with some kind of fallback logic if the title has only non-ASCII characters (e.g., if it's in Chinese or Japanese)."

As a simple and deterministic fix, can we just add an option to the /generate command?

I am thinking it could be something like one of these two suggestions below:

/generate:filename
/generate --filename="my_notebook_name.ipynb"

This is to let the user provide an explicit "notebook_filename" for the NB to override the AI-generated filename. The AI-generated filename is often problematic, too long, or not very accurate.

I don't know how to implement this myself yet, because I don't understand the code for the slash commands implementation yet.
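
As a starting point for discussion, here is a minimal sketch of how the second form could be parsed out of the prompt text. It is not based on the actual slash-command code in jupyter-ai; the function name `split_filename_option` and the `--filename=` syntax are assumptions taken from the suggestion above.

```python
import re

# Matches --filename="quoted name.ipynb" or --filename=unquoted.ipynb
_FILENAME_OPTION = re.compile(r'\s*--filename=(?:"([^"]+)"|(\S+))')


def split_filename_option(prompt):
    """Split a prompt into (description, filename), with filename None if absent."""
    match = _FILENAME_OPTION.search(prompt)
    if match is None:
        return prompt.strip(), None
    filename = match.group(1) or match.group(2)
    # Remove the option from the prompt so it is not sent to the language model.
    description = (prompt[: match.start()] + prompt[match.end():]).strip()
    return description, filename


print(split_filename_option('/generate formatting markdown cells --filename="markdown.ipynb"'))
print(split_filename_option("/generate formatting markdown cells"))
```

A user-supplied name would still need to be checked for characters that are invalid on the target platform before it is used as a path.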