gordonwatts / snowmass-chat

Experiments exploring the US Snowmass Process documents using LLMs
Apache License 2.0

Translate unicode characters #26

Open gordonwatts opened 7 months ago

gordonwatts commented 7 months ago

When text is extracted from the PDFs, it contains unicode characters such as "mu" or the "fl" ligature in "inflaton". We should expand those before doing any processing and convert them to a uniform representation so they can be identified.
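One possible approach, sketched here as an assumption rather than what the repo does: Unicode NFKC normalization from Python's standard `unicodedata` module already expands compatibility characters. It turns the "ﬂ" ligature (U+FB02) into "fl" and the micro sign (U+00B5) into Greek small mu (U+03BC). The function name `normalize_pdf_text` and the `GREEK_TO_ASCII` spelling table are hypothetical, added only for illustration.

```python
import unicodedata

# Hypothetical table for spelling Greek letters in ASCII; extend as needed.
GREEK_TO_ASCII = {
    "\u03bc": "mu",     # GREEK SMALL LETTER MU
    "\u03b1": "alpha",  # GREEK SMALL LETTER ALPHA
    "\u03b2": "beta",   # GREEK SMALL LETTER BETA
}


def normalize_pdf_text(text: str, spell_greek: bool = False) -> str:
    """Expand PDF compatibility characters into a uniform form (hypothetical helper)."""
    # NFKC replaces compatibility characters with their standard equivalents,
    # e.g. U+FB02 LATIN SMALL LIGATURE FL -> "fl",
    #      U+00B5 MICRO SIGN -> U+03BC GREEK SMALL LETTER MU.
    normalized = unicodedata.normalize("NFKC", text)
    if spell_greek:
        # Optionally replace Greek letters with ASCII names so plain-text search works.
        normalized = "".join(GREEK_TO_ASCII.get(ch, ch) for ch in normalized)
    return normalized


print(normalize_pdf_text("in\ufb02aton"))                       # inflaton
print(normalize_pdf_text("\u00b5 decay", spell_greek=True))     # mu decay
```

NFKC is lossy: it also folds superscripts and other compatibility forms. That is probably fine for search and embedding, but the original text may be worth keeping alongside the normalized copy.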