jina-ai / late-chunking

Code for explaining and evaluating late chunking (chunked pooling)

Added extra experiments - mainly around macro chunking #16

Closed dannyjameswilliams closed 1 month ago

dannyjameswilliams commented 1 month ago

Overview

Experiments added:

LongEmbed examples against chunk size (nDCG@10 and mAP@10)

Similarly to run_chunked_eval.py, run_chunked_eval_with_macro_chunks.py can be run from the command line, e.g.

python3 run_chunked_eval_with_macro_chunks.py --task-name LEMBWikimQARetrievalChunked

To reproduce easily

I recommend the following bash script, which runs them all at once:

#!/bin/bash

# Define the array of task names
names=(LEMBWikimQARetrievalChunked LEMBQMSumRetrievalChunked LEMBNarrativeQARetrievalChunked LEMBSummScreenFDRetrievalChunked)

# Loop over each task and run the macro-chunking evaluation
for name in "${names[@]}"; do
  echo "$name"
  python3 run_chunked_eval_with_macro_chunks.py --task-name "$name"
done

The results can then be displayed graphically in a matplotlib plot by running plot_chunk_size_experiments.py.
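
For reference, here is a minimal sketch of the kind of plot that script produces. The chunk sizes and scores below are hypothetical placeholders; the real script loads the scores saved by the evaluation runs.

# Hypothetical sketch of plotting nDCG@10 against chunk size; the real
# plot_chunk_size_experiments.py reads the scores saved by the eval runs.
import matplotlib.pyplot as plt

chunk_sizes = [64, 128, 256, 512]  # placeholder chunk sizes
ndcg_at_10 = {                     # placeholder scores, one series per task
    "LEMBWikimQARetrievalChunked": [0.41, 0.47, 0.52, 0.50],
    "LEMBQMSumRetrievalChunked": [0.22, 0.25, 0.27, 0.26],
}

fig, ax = plt.subplots()
for task, scores in ndcg_at_10.items():
    ax.plot(chunk_sizes, scores, marker="o", label=task)
ax.set_xlabel("chunk size (tokens)")
ax.set_ylabel("nDCG@10")
ax.legend()
plt.show()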

Macro chunking approach vs a 'hard' boundary approach with zero overlap

Similar to the above, this compares macro chunking to non-macro chunking; the experiment file is run_macro_chunking_experiments.py and the plotting file is plot_macro_chunking_experiments.py.
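
For intuition, here is a minimal sketch of the two splitting strategies being compared; the function and parameter names are mine, not the repo's.

# Sketch of the two strategies (names are illustrative, not from the repo).
# A 'hard' boundary split cuts the token sequence at fixed positions with no
# overlap; macro chunking instead uses long, possibly overlapping windows that
# fit the model's context, and late chunking pools token embeddings within each.
def hard_boundary_spans(n_tokens, size):
    # Non-overlapping spans: [0, size), [size, 2*size), ...
    return [(s, min(s + size, n_tokens)) for s in range(0, n_tokens, size)]

def macro_chunk_spans(n_tokens, size, overlap=0):
    # Long windows advancing by (size - overlap) tokens each step
    step = size - overlap
    return [(s, min(s + size, n_tokens)) for s in range(0, n_tokens, step)]

print(hard_boundary_spans(1000, 400))     # [(0, 400), (400, 800), (800, 1000)]
print(macro_chunk_spans(1000, 400, 100))  # consecutive windows share 100 tokens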

Example with Anthropic's contextual retrieval

You can run explanatory_contextual_retrieval.py to compare Anthropic's contextual retrieval (which manually adds context to each chunk), late chunking, and naive chunking. The script runs on a generated document that deliberately omits context from later sentences (using 'Its' instead of the company name). The comparison is based on cosine similarities between the chunk embeddings produced by jina-embeddings-v2-base-en.
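
As a rough sketch of the similarity comparison, the naive-chunking baseline looks something like this; the document and query are my own illustrative stand-ins, not the ones the script generates.

# Sketch of the naive-chunking baseline: each chunk is embedded in isolation,
# so "Its" in the second chunk cannot resolve to the company name. Late
# chunking instead embeds the whole document once and pools per-chunk token
# embeddings. The text below is illustrative, not the script's generated doc.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
)

chunks = [
    "ACME Corp reported record revenue in 2023.",
    "Its profits, however, declined sharply.",  # ambiguous without context
]
query = "How did ACME Corp's profits change?"

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

chunk_embs = model.encode(chunks)
query_emb = model.encode([query])[0]
for chunk, emb in zip(chunks, chunk_embs):
    print(f"{cosine(query_emb, emb):.3f}  {chunk}")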

dannyjameswilliams commented 1 month ago

All comments hopefully addressed.