jupyterlab / jupyter-ai

A generative AI extension for JupyterLab
https://jupyter-ai.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Unable to include webpages copied and pasted into context for language models #808

Open jaanli opened 1 month ago

jaanli commented 1 month ago

Description

It is difficult to use text fetched with curl, or simply copied and pasted from a website (in this case, https://uwdata.github.io/mosaic/jupyter/), in a prompt:

[screenshot of the error]

It looks like something is not being escaped properly.

Reproduce

  1. Open the notebook at https://github.com/jaanli/new-york-real-estate/blob/fdb4e029feeb27f80ca08b87c2637136fa3bdec3/notebooks/load_and_visualize_opencorporates_corporate_entity_data_llc_corp.ipynb
  2. Copy the text from https://uwdata.github.io/mosaic/jupyter/.
  3. Assign the text to a Python variable (e.g. `example_raw`).
  4. Reference this variable in a `%%ai` cell magic provided by jupyter-ai.
  5. Run the cell; it fails with an error about characters not being escaped.
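To narrow down where escaping could go wrong, here is a minimal sketch that does not call jupyter-ai at all. It assumes, without confirmation from this thread, that `%%ai` prompts are interpolated with `str.format_map`-style curly-brace substitution. `FormatDict` and `pasted` are hypothetical names for illustration. The sketch contrasts referencing the text through a variable, where the value is inserted verbatim, with pasting the same text directly into the prompt, where its braces are parsed as replacement fields.

```python
class FormatDict(dict):
    """Hypothetical helper: leave unknown {fields} in place instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


# Hypothetical stand-in for text copied from a web page; it contains braces.
pasted = 'const spec = { mark: "dot" };'
namespace = {"example_raw": pasted}

# Case 1: the prompt references the text through a variable.
# format_map inserts the value verbatim and does not re-parse braces inside it.
via_variable = "Explain this code: {example_raw}".format_map(FormatDict(namespace))

# Case 2: the same text is pasted directly into the prompt template,
# so "{ mark: ... }" is parsed as a replacement field with an invalid format spec.
try:
    direct = ("Explain this code: " + pasted).format_map(FormatDict(namespace))
    direct_error = None
except (KeyError, ValueError, IndexError) as err:
    direct = None
    direct_error = f"{type(err).__name__}: {err}"

print(via_variable)
print(direct_error)
```

Under this assumption only case 2 fails. That would be consistent with a variable-based snippet working in one environment, while the real trigger may still lie further down the pipeline.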

Expected behavior

Text copied and pasted from web pages, or fetched with curl, can be included in a prompt.
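If unescaped braces turn out to be the problem (not established in this thread), the standard way to keep literal braces in a `str.format`-style template is to double them. The sketch below assumes such a template; `escape_braces` is a hypothetical helper, not part of jupyter-ai.

```python
def escape_braces(text: str) -> str:
    """Hypothetical helper: double every brace so str.format treats it as a literal character."""
    return text.replace("{", "{{").replace("}", "}}")


# Hypothetical stand-in for text copied from a web page or fetched with curl.
pasted = 'const spec = { mark: "dot" };'

# Only {lang} is a real replacement field; the pasted braces are escaped.
template = "Explain this {lang} code: " + escape_braces(pasted)
rendered = template.format(lang="JavaScript")
print(rendered)  # Explain this JavaScript code: const spec = { mark: "dot" };
```

Escaping only applies to text that becomes part of the template itself. If the escaped text were instead interpolated as a variable's value, the doubled braces would remain in the final prompt.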


krassowski commented 1 month ago

It looks like your `example_raw` variable is undefined in your notebook. I think this case should produce a clearer error message.
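One way a clearer message could be produced is to check the names a prompt references in `{...}` against the notebook's variables before sending it. This is a minimal sketch using the standard library's `string.Formatter().parse`. The helper `find_undefined_fields` is hypothetical and does not describe what jupyter-ai currently does.

```python
import re
from string import Formatter


def find_undefined_fields(template: str, namespace: dict) -> list:
    """Hypothetical helper: names referenced as {name} in template but absent from namespace.

    Formatter().parse raises ValueError if the template itself has unbalanced braces.
    """
    missing = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if not field_name:  # None for trailing literal text, "" for a bare {}
            continue
        # Keep only the root name of expressions such as {obj.attr} or {items[0]}.
        root = re.split(r"[.\[]", field_name, maxsplit=1)[0]
        if root not in namespace and root not in missing:
            missing.append(root)
    return missing


prompt = "Summarize the following page: {example_raw}"
missing = find_undefined_fields(prompt, namespace={})  # example_raw was never assigned
if missing:
    print(f"Undefined variable(s) referenced in the prompt: {', '.join(missing)}")
```

Inside a notebook, `get_ipython().user_ns` is IPython's mapping of user variables and could be passed as `namespace`.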

krassowski commented 1 month ago

So far I was only able to break it like this:

[screenshot]

However, that error is different from yours. Can you provide an example value for `example_raw` that reproduces the problem?

jaanli commented 1 month ago

Thanks so much for checking! It should be in the reproduce section: https://github.com/jaanli/new-york-real-estate/blob/fdb4e029feeb27f80ca08b87c2637136fa3bdec3/notebooks/load_and_visualize_opencorporates_corporate_entity_data_llc_corp.ipynb

Please let me know if that doesn't work!

krassowski commented 1 month ago

I do not have access to Claude, but when I try the snippet it works OK:

[screenshot]

I suspect that the values of the `prompt` or `example_raw` variables matter here. Can you share what values these variables hold?
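A compact, shareable summary of each variable can answer the request for the values of `prompt` and `example_raw`. Showing a `repr` preview makes hidden characters from copy-paste visible, and a brace check flags text that could interfere with curly-brace interpolation. The helper `summarize_for_report` and the placeholder values are hypothetical; substitute the real notebook variables.

```python
def summarize_for_report(name: str, value: object, limit: int = 200) -> str:
    """Hypothetical helper: a short, shareable description of a notebook variable."""
    text = value if isinstance(value, str) else repr(value)
    has_braces = "{" in text or "}" in text
    preview = text[:limit] + ("..." if len(text) > limit else "")
    return (
        f"{name}: type={type(value).__name__}, length={len(text)}, "
        f"contains braces={has_braces}\n  preview: {preview!r}"
    )


# Hypothetical placeholder values; replace them with the real notebook variables.
example_raw = 'const spec = { mark: "dot" };'
prompt = "Explain this code: {example_raw}"

for name, value in [("prompt", prompt), ("example_raw", example_raw)]:
    print(summarize_for_report(name, value))
```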