Open robojeb opened 2 years ago
Hi, thanks for opening this issue. I definitely agree that this should be documented better and I'll be adding this to my TODO list (including moving to byte offsets instead of character positions) for my planned refactor of the library.
From my end, I wonder whether it would be better for me to keep Source
's internal view of things as an implementation detail: it's set up to make it as easy as possible for Ariadne's diagnostic generator to work with and isn't really intended as a way for library users to directly interact with a file's contents.
Thanks for taking a look.
It would be fine if Source
wasn't intended for use as an input type.
Having percolated on it over night, I would change the flow of my code to: Parse on a str
, convert str
to Source
and place in Cache, Print diagnostics from the Cache.
This may already be what you intended it just wasn't the first thing that came to my mind because I was looking at the ariadne::Cache
which directly returns Source
objects.
I ended up writing my own cache so I didn't have to load the files from the filesystem at both parsing and diagnostics.
I ran into the same issue just now. I do agree that it doesn't have to be Source
s job to act as input to a parser but given the state of documentation it's not unreasonable to try to use it that way.
For me, my main motivation to use Source
as input to Chumsky was that it looked like the safest bet to avoid getting bitten by changes to the newline handling, ie.:
len: line.chars().count() + 1, // TODO: Don't assume all newlines are a single character!
Now, if you can promise not to work on this TODO until switching to byte offsets (at which point this is probably moot, anyway), I'd just switch to iterate the source string directly, filtering out any '\r'
characters.
This is handled slightly better in master
, fwiw. I've not had time to sit down and do the rework of the crate I've been meaning to in order to properly address this though.
I recently switched my parsing from going over a stream of characters produced by
str::chars()
to parsing over a stream of characters fromariadne::Source::chars()
. This makes it a lot easier to load all of my source files through a cache which can be directly used byariadne
for printing reports.Unfortunately it isn't well documented that
str::char()
andariadne::Source::chars()
don't produce the same, or even roughly equivalent, output.I spent some time debugging why none of my line-ending parsers were succeeding before I realized that
ariadne::Source::chars()
doesn't produce any line-endings (or even whitespace).It would be nice if
ariadne::Source::chars()
were documented as not producing any white-space.It would also be nice if
ariadne::Source
provided another method which could get a stream of characters including line-endings with appropriate spans.I mocked up an adapter which can re-engineer a character stream which is mostly equivalent to
str::chars()
as long as your parser doesn't care about white-space. It returns all the characters in the file with their correct indices, and then also injects newline characters using the index at the end of the line span. This works as a replacement forstr::chars().enumerate()
in my parsing code.