jgm / pandoc

Universal markup converter
https://pandoc.org

Architectural change: Make all writers usable without IO #2930

Closed jgm closed 7 years ago

jgm commented 8 years ago

Currently a number of writers are impure and do IO: docx, odt, epub, epub3, fb2, icml, rtf.

It would be nice to use a free monad instead, so that these writers could also be used outside of IO contexts. A first step would be to catalog the places where IO is really used in these writers.

mb21 commented 8 years ago

I've only quickly skimmed through some posts about free monads. But if I understand correctly, you're proposing to abstract the writers: instead of doing the document conversion, they would only generate a plan for a document conversion. This plan (of type Free Pandoc r) could then be executed by either of two functions:

runIO   :: Free Pandoc r -> IO r
runPure :: Free Pandoc r -> r

where r is a tuple of warnings/errors and the output (either String, ByteString, or hopefully in the future Text).

The ICML writer for example, only needs IO when processing images. So if you know that your document doesn't contain images (or wish to ignore their dimensions), you could run runPure to actually do the conversion.
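To make the "plan plus two interpreters" idea concrete, here is a minimal sketch. It hand-rolls the free monad so it needs only base, and everything named here (PandocF, ReadFileF, readFileP, writeToy) is hypothetical and not taken from pandoc. Unlike the signature above, this runPure takes an in-memory file table as an extra argument, which is an assumption made so the example has something to read from.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.Maybe (fromMaybe)

-- A hand-rolled free monad (the `free` package offers the same thing).
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- Hypothetical instruction set: the only effect is reading a file.
data PandocF next = ReadFileF FilePath (String -> next)
  deriving Functor

-- Lift a single instruction into the free monad.
liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

readFileP :: FilePath -> Free PandocF String
readFileP fp = liftF (ReadFileF fp id)

-- A toy "writer": it only builds a plan and performs no IO itself.
writeToy :: FilePath -> Free PandocF String
writeToy img = do
  contents <- readFileP img
  return ("<image bytes=" ++ show (length contents) ++ ">")

-- Interpreter 1: execute the plan against the real file system.
runIO :: Free PandocF r -> IO r
runIO (Pure r)                = return r
runIO (Free (ReadFileF fp k)) = readFile fp >>= runIO . k

-- Interpreter 2: execute the plan against an in-memory file table.
-- Missing files read as the empty string (an arbitrary choice for the sketch).
runPure :: [(FilePath, String)] -> Free PandocF r -> r
runPure _  (Pure r)                = r
runPure fs (Free (ReadFileF fp k)) = runPure fs (k (fromMaybe "" (lookup fp fs)))
```

Loaded in GHCi, runPure [("a.png", "12345")] (writeToy "a.png") evaluates the plan without touching the disk, while runIO (writeToy "a.png") reads the real file.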

jgm commented 8 years ago

Something like that, yes. (An alternative would be to use a typeclass that can be instantiated by various monads.)

I would separate out the two issues:

  1. allowing different underlying textual types (Text, String, ByteString)
  2. allowing all writers to be used outside of IO

The main point here is (2). There might be a reason to do (1) also, but it's a separate issue.

In addition to runIO and runPure, there could be a way of running a writer where you specify the contents of images as arguments, so they don't need to be read in IO. This could be handy in some cases.

jkr commented 8 years ago

The docx writer really only seems to need it for reading the default reference-docx, and for producing some random numbers for nsids. Presumably the latter could be done in some deterministic form.

Any good references on free monads that you'd recommend? Or instructive real-world usages? I tried to wrap my head around them a while back, and felt like I wasn't asking the proper questions to be able to understand the solutions.

jkr commented 8 years ago

@jgm: as a way of working through some Free monad discussions online, I reimplemented the Docx writer using free. It was fairly painless. I've yet to see if I can gain any speed using improve here, but there doesn't seem to be much performance penalty (looks like about a 0.1s increase on a 120,000-word manuscript, md->docx). Note that I have liftF sprinkled around instead of defining new functions, but that's easy enough to address later.

You can find it here: https://github.com/jkr/pandoc/tree/free

If this is still something you're interested in pursuing, I could try doing this for the other IO writers. We would probably also want a T.P.Free module, and to generalize DocxActions and runDocxIO into a general set of actions and a general runIO interpreter, respectively.

jgm commented 8 years ago

I like the idea! I assume DocxAction would be replaced by something like PandocAction that could be used in all the relevant writers? That would allow us to use uniform test harnesses, etc.

jkr commented 8 years ago

Yep -- I moved it into a PandocAction monad in Text.Pandoc.Free. Right now, I have pure versions of the EPUB, Docx, ODT, and ICML writers. There's still FB2 and RTF to do, but I think it's in fairly usable shape right now. I currently export a pure and an IO version, but the only difference is that the runIO function from Text.Pandoc.Free is run to get the IO one (we could eventually just run this in the binary, if we wanted, and only export the pure version).

The repo is here: https://github.com/jkr/pandoc/tree/free and the comparison view is here: https://github.com/jgm/pandoc/compare/master...jkr:free

One oddity to take note of: in order to have generic IORef functions, I have to add a type parameter to the functor (PandocActionF) that produces the free monad, so the monad has a parameter too (PandocAction a). To make this a bit less ugly in practice, I add a type synonym to each of the writers (type EPUBAction = PandocAction [(FilePath, (FilePath, Maybe Entry))] or type DocxAction = PandocAction ()). This all produces a limitation, though -- if we want to use IORefs, they all have to have the same type, at least within a function. This isn't a problem now, but it's worth being aware of.

I'll look around to see if anyone deals with it in any useful way.

jgm commented 7 years ago

@jkr what's the status of these free monad experiments?

jkr commented 7 years ago

I got through making pure versions of the Docx, EPUB, ICML, and ODT writers, along with a runIO function to run them. Where I hit a snag was in trying to write a runTest function that would work with the IORefs in the ODT writer. There's no doubt a way to do it with unsafePerformIO or STRefs, but I haven't been able to follow it up.

The branch is here: https://github.com/jkr/pandoc/commits/free

If you remove the HEAD of that branch, you'll have working writers with a runIO. I have to rebase them given changes to those writers in the interim, of course, but that shouldn't be more than an hour's work or so.

jkr commented 7 years ago

Okay, I rebased the free branch on master. I also moved the test runner that I couldn't quite figure out to another branch to fiddle with. So if you pull from there you should have functioning pure writers.

jkr commented 7 years ago

I've dived back into this. In both cases (odt and epub) it looks like the only modification of IORefs is straightforward list cons-ing. So if our goal is to make pure writers, it seems like the best thing to do would just be to replace this with a plain State monad. We could also do ST, since we don't seem to use anything necessarily IO-ish, but I'll try State first and check performance.
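As a rough illustration of that swap, the sketch below accumulates entries first with an IORef and then with State. The Entry type and the function names are made up for the example, and Control.Monad.State is assumed to come from the mtl package.

```haskell
import Control.Monad.State (State, execState, modify)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for a media entry collected while writing.
type Entry = (FilePath, String)

-- IO style: accumulate entries by consing onto an IORef.
collectIO :: [Entry] -> IO [Entry]
collectIO items = do
  ref <- newIORef []
  mapM_ (\e -> modifyIORef ref (e :)) items
  readIORef ref

-- Pure style: the same cons, threaded through State instead.
recordEntry :: Entry -> State [Entry] ()
recordEntry e = modify (e :)

collectPure :: [Entry] -> [Entry]
collectPure items = execState (mapM_ recordEntry items) []
```

Both versions return the entries in reverse order of insertion, since each new entry is consed onto the front. Only collectIO needs IO.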

jgm commented 7 years ago

Sounds good.

jkr commented 7 years ago

Any objection to me trying out Glob instead of filemanip for font globbing for epubs? filemanip only has an IO version.

jgm commented 7 years ago

Fine with me.

jkr commented 7 years ago

@jgm: okay -- we now have a functioning set of pure writers, along with a runIO function and a fairly functional pure runTest function (using State and Reader monads). The pure writers are a teeny bit slower, maybe 3%-5% based on my unscientific observations. I haven't looked at memory usage.

Right now, all the writers export a write{Format} doing IO and write{Format}Pure outputting PandocAction. The IO one just runs runIO on the pure one. We could also just output the pure one and run runIO in Text.Pandoc.

There are still some improvements to be made. I'd like to make PandocAction an instance of MonadError so we can throw and catch in the writers -- but I haven't quite figured out a good way to do it. I'd also like to streamline the functions we output from Text.Pandoc.Free -- for example, we currently have a PandocAction version of three different file reading functions (Strict BS, Lazy, and UTF8). But this might be best because the IO interpreter can just use the different versions and not have to convert. In any case, it might be nice to prune it down a bit.

Anyway, it's in a workable form now. You can take a look here:

https://github.com/jkr/pandoc/tree/free-with-tests

jgm commented 7 years ago

+++ Jesse Rosenthal [Nov 18 16 14:16 ]:

[1]@jgm: okay -- we now have a functioning set of pure writers, along with a runIO function and a fairly functional pure runTest function (using State and Reader monads). The pure writers are a teeny bit slower, maybe 3%-5% based on my unscientific observations. I haven't looked at memory usage.

Great. That's not a big deal. If you want to look at memory, I've included a 'weigh-pandoc' executable which can be built by setting the 'weigh' flag. This would allow you to make comparisons (though I suppose the weigh-pandoc program itself would need to be revised for the API change).

Right now, all the readers export a write{Format} doing IO and write{Format}Pure outputting PandocAction. The IO one just runs runIO on the pure one. We could also just output the pure one and run runIO in Text.Pandoc.

A bit confused here...are we talking about readers or writers here, or both?

An application of this in the readers would be include files in LaTeX (already implemented with a complete hack) and RST.

There are still some improvements to be made. I'd like to make PandocAction an instance of MonadError so we can throw and catch in the writers -- but I haven't quite figured out a good way to do it. I'd also like to streamline the functions we output from Text.Pandoc.Free -- for example, we currently have a PandocAction version of three different file reading functions (Strict BS, Lazy, and UTF8). But this might be best because the IO interpreter can just use the different versions and not have to convert. In any case, it might be nice to prune it down a bit.

Agreed.

jgm commented 7 years ago

Maybe a good thing to do would be to get a list together of which IO operations occur in which writers, and why. Looking at the list in Text.Pandoc.Free, I can't even remember why some of those things are there.

jgm commented 7 years ago

Tempting to change the API across the board, so that ALL readers and writers uniformly go in the PandocActionF monad. So e.g.

readMarkdown :: ReaderOptions -> String -> PandocF Pandoc

writeHtmlString :: WriterOptions -> Pandoc -> PandocF String

main = runIO $ readMarkdown def "hello world" >>= writeHtmlString def

That would simplify the types a lot; we'd no longer need a distinction between IO writers and pure writers, for example. (If we really wanted to simplify things, I suppose we could have them all produce ByteString output; I don't know if that's a good idea, though.)

We could then implement things like \today or \include in LaTeX, and their equivalents in other formats, in a clean way, and get rid of the ugly handleIncludes hack.

jgm commented 7 years ago

Rethinking the API a bit more radically:

main = runIO $ do
   setReaderOption readerSmart True
   setWriterOption writerColumns 50
   readFileUTF8 "myfile.txt" >>= readMarkdown >>= allCapsFilter >>= writeHtmlString
                             >>= writeFileUTF8 "myfile.html"

jkr commented 7 years ago

A bit confused here...are we talking about readers or writers here, or both?

Sorry -- I meant to say "writers" there. But as you've pointed out, it would be interesting to extend it to writers.

One problem that occurs to me -- we already have a number of readers and writers that are pure (so long as you're not doing a standalone). Say, docx -> markdown. So would necessitating runIO on everything remove that ability? Granted, mkStringReader and mkBSReader already make everything IO, but they don't need to, do they?

jkr commented 7 years ago

Sorry -- I meant to say "writers" there. But as you've pointed out, it would be interesting to extend it to writers.

And, yes, I mean to say "extend it to readers." This might be a hardwired expressive difficulty for me.

jgm commented 7 years ago

And with a runPure function, we could provide functions to set some of these things that would normally be gotten via IO (current time, date, contents of files).

main = runPure $ do
   setCurrentTime UTCTime{...}
   writeFileUTF8 "myinclude.tex" -- this just puts a "file" in state
   readLaTeX "\\today\n\\include{myinclude.tex}" >>= writeHtmlString

jkr commented 7 years ago

For pruning, I got rid of newUUID by introducing a pure RandomGen g => g -> UUID function in Text.Pandoc.UUID (since we already have a newStdGen function in Free).

Next step: getPosixTime and getCurrentTime are obviously redundant, since there are utcTimeToPOSIXSeconds and its inverse. Any preference on which should be our primitive? (In Data.Time.Clock, getCurrentTime is defined internally as posixSecondsToUTCTime <$> getPOSIXTime.)
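A small sketch of why only one of the two needs to be a primitive, using the conversion functions from the time package. The names getCurrentTimeFromPOSIX and roundTrip are made up for the example.

```haskell
import Data.Time.Clock (UTCTime)
import Data.Time.Clock.POSIX
  ( POSIXTime
  , getPOSIXTime
  , posixSecondsToUTCTime
  , utcTimeToPOSIXSeconds
  )

-- With getPOSIXTime as the only primitive, the UTC variant is derived.
getCurrentTimeFromPOSIX :: IO UTCTime
getCurrentTimeFromPOSIX = posixSecondsToUTCTime <$> getPOSIXTime

-- The two conversions are inverses, so either side could be the primitive.
roundTrip :: POSIXTime -> POSIXTime
roundTrip = utcTimeToPOSIXSeconds . posixSecondsToUTCTime
```

The same derivation works in any monad that offers the chosen primitive, so a pure interpreter would only have to fake one clock function.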

jkr commented 7 years ago

I really like the idea of the separate runPure and runIO handlers.

jkr commented 7 years ago

I'm going to see if I can move getDefaultReference{Docx,Odt} into the relevant writers as well, so they can use the T.P.Free functions instead of IO.

jkr commented 7 years ago

Ugh -- there's a bit of a labyrinth here. readDefaultDataFile calls getDefaultReference{Docx,Odt}, so they have to be in Shared. I can't quite follow why it needs to do that, though, since neither the Docx writer nor the ODT writer ever calls readDataFile or readDefaultDataFile. And we're not changing the bytestrings those writers output after the fact.

@jgm: I think you know the logic there better -- could you take a look at what would be required to move the getDefaultReference* funcs to the relevant writers?

jkr commented 7 years ago

Also, should we move this to a branch on your fork so we can both push to it?

jgm commented 7 years ago

+++ Jesse Rosenthal [Nov 19 16 02:52 ]:

next step: getPosixTime and getCurrentTime are obviously redundant, since there are utcTimeToPOSIXSeconds and the inverse. Any preference on what should be our primitive? (In Data.Time.Clock, getCurrentTime is

I don't particularly care, but getCurrentTime is more user-friendly, I suppose.

jgm commented 7 years ago

+++ Jesse Rosenthal [Nov 19 16 03:23 ]:

Ugh -- there's a bit of a labyrinth here. readDefaultDataFile calls getDefaultReference{Docx,Odt} so they have to be in Shared. I can't

Off the top of my head, I think the only reason we need this is to support the --print-default-data-file option, which can take the reference.odt/docx as an argument.

In principle that could be special-cased at the level of pandoc.hs.

Note that another option would be to put these in WriterOptions or something, and load them up in pandoc.hs and pass them to the writers. Then the writers wouldn't ever need to do an IO operation to get these.

jgm commented 7 years ago

I can put up a branch on jgm/pandoc -- which branch of yours is the current one, free-with-tests?

jgm commented 7 years ago

NB. If we make these large-scale API changes, I think we should call it pandoc 2.0.

jkr commented 7 years ago

Yep -- I just rebased it on master this morning.

jkr commented 7 years ago

Note that another option would be to put these in WriterOptions or something, and load them up in pandoc.hs and pass them to the writers. Then the writers wouldn't ever need to do an IO operation to get these

Would it be preferable then to have just one writerOptReferenceDoc field? The pro side would be a bit more elegance. The con side would be that if it's already an archive we have to sanity check it somewhere (look for identifying entries). This goes beyond just checking extensions in pandoc.hs, in case someone wants to use it programmatically.

Good side of this is that it doesn't really matter if it's a legal file so long as it has the style files we want. So checking for these is necessary anyway.

Q: better if the option type is Maybe Archive or Maybe ByteString?

jkr commented 7 years ago

And along with that just one --reference-doc command line option. Like --template. We can only have one output format at a time, after all.
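A minimal sketch of the Maybe ByteString variant, assuming a trimmed-down options record. ToyWriterOptions, toyReferenceDoc, defaultReference, chooseReference, and loadOptions are all hypothetical names for illustration, not pandoc's actual API.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Maybe (fromMaybe)

-- Hypothetical, trimmed-down options record (not pandoc's WriterOptions).
data ToyWriterOptions = ToyWriterOptions
  { toyReferenceDoc :: Maybe BL.ByteString  -- raw bytes of a reference docx/odt
  }

-- Stand-in for the built-in default reference document.
defaultReference :: BL.ByteString
defaultReference = BL.pack "default-reference"

-- The writer side stays pure: it uses the supplied bytes or the default.
chooseReference :: ToyWriterOptions -> BL.ByteString
chooseReference = fromMaybe defaultReference . toyReferenceDoc

-- Only the front end (pandoc.hs in the discussion) does IO to load the file.
loadOptions :: Maybe FilePath -> IO ToyWriterOptions
loadOptions mpath = ToyWriterOptions <$> traverse BL.readFile mpath
```

Keeping the field as raw bytes defers any archive parsing, and any sanity check, to the writer that actually consumes it.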

jkr commented 7 years ago

Something else to think about: I'm wondering whether the Free implementation is overkill for what we're doing here. The benefit of a Free monad, over just having a RealWorldState variable that we can populate with IO at the beginning, is that the Free monad can model the world changing -- say, with numerous calls to readLine. But my sense is that in all of our computations, the world doesn't change after we begin. So if we set a state at the outset, it's going to stay that way. I haven't thought this through with regard to the readers, but why couldn't all of the architectural changes you're proposing work with something like the below? (This isn't a rhetorical question -- I'm really not quite sure I've thought this through fully.)


writeFoo :: WriterOptions -> Pandoc -> State RealWorldState ByteString

-- run against a world snapshot supplied by the caller
runPure :: RealWorldState -> State RealWorldState ByteString -> ByteString
runPure st bsInState = evalState bsInState st

-- take the snapshot in IO once, then run purely against it
runIO :: State RealWorldState ByteString -> IO ByteString
runIO bsInState = evalState bsInState <$> getStateFromRealWorld

jgm commented 7 years ago

I think this could work for most of our applications, but an exception is include files and the like. We don't know which files we need to read until we've done some parsing.
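To illustrate the include problem, here is a toy sketch in which the file to read is only discovered while walking the input. The "@include NAME" directive and all function names are invented for the example. The file-reading action is passed in as a parameter so the same code runs in IO or purely.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Expand lines of the form "@include NAME" by reading NAME on demand.
-- Which files are needed depends on the input text itself.
expandIncludes :: Monad m => (FilePath -> m String) -> String -> m String
expandIncludes readFileM src = unlines <$> mapM expandLine (lines src)
  where
    expandLine l = case stripPrefix "@include " l of
      Just fp -> readFileM fp   -- the file name is only known at this point
      Nothing -> return l

-- Pure run: every file that might be included must be in the table up front.
expandPure :: [(FilePath, String)] -> String -> String
expandPure files =
  runIdentity . expandIncludes (\fp -> Identity (fromMaybe "" (lookup fp files)))

-- IO run: files are fetched lazily, as the directives are encountered.
expandIO :: String -> IO String
expandIO = expandIncludes readFile
```

A state snapshot taken before parsing would have to guess which files to load, whereas the IO run reads exactly the files the document names.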

jgm commented 7 years ago

I've set up a 'free' branch on jgm/pandoc. It's derived from your free-with-tests branch.

jgm commented 7 years ago

+++ Jesse Rosenthal [Nov 19 16 05:38 ]:

Note that another option would be to put these in WriterOptions or something, and load them up in pandoc.hs and pass them to the writers. Then the writers wouldn't ever need to do an IO operation to get these

Would it be preferable then to have just one writerOptReferenceDoc field? The pro side would be a bit more elegance. The con side would be that if it's already an archive we have to sanity check it somewhere (look for identifying entries). This goes beyond just checking extensions in pandoc.hs, in case someone wants to use it programmatically.

I'm not sure what you have in mind by a sanity check. Why not say it's up to the user to provide a sensible value (just as it is for templates)?

Good side of this is that it doesn't really matter if it's a legal file so long as it has the style files we want. So checking for these is necessary anyway.

Q: better if the option type is Maybe Archive or Maybe ByteString?

Yes. And while we're at it, if we do this, we should take out writerStandalone and replace writerTemplate with a Maybe value.

jgm commented 7 years ago

Do we have reason to prefer the free monad approach to a typeclass approach? That is, define a PandocMonad typeclass with functions to read a file, get the current time, and so on. Then make readers and writers parametric so they work in any instance of PandocMonad. Finally, define instances for IO and for something like State FakeRealWorld.

Potential advantages:

Potential drawbacks:

jkr commented 7 years ago

The main problem that I see with this is that the free monad allows us to test the IO writers in a deterministic way. If I understand this correctly (not a given), this would make the IO writers an IO instantiation of the typeclass. So testing would still be an issue.

jgm commented 7 years ago

If there's a non-IO instance, couldn't we use that for deterministic tests? I don't see why this would be different from the free monad approach.

jkr commented 7 years ago

Again, I might not understand the approach. But let's take the docx writer as an example. With the free monad, we have one writer, and the only thing that changes is how we interpret P.readFile et al. In the typeclass, we would have something like this, right?

internalFunctionInWriter :: OurTypeClass a => a String
internalFunctionInWriter = ourReadFile

But where would ourReadFile be defined -- and how would we switch between readFile :: IO String and fakeReadFile :: State FakeRealWorld String? I guess I'm not sure I see how to define it so that it knows to use one or the other based on output type. But I feel like I'm probably missing something.

jgm commented 7 years ago

The idea would be that ourReadFile is defined in the PandocMonad typeclass:

class PandocMonad a where
  ourReadFile :: FilePath -> a String

and then there'd be an instance for IO:

instance PandocMonad IO where
  ourReadFile :: FilePath -> IO String
  ourReadFile = readFile

and an instance for State FakeRealWorld:

instance PandocMonad (State FakeRealWorld) where
  ourReadFile :: FilePath -> State FakeRealWorld String
  ourReadFile fp = gets fakeFiles >>= (maybe (throwException FileNotFound) return . lookup fp)

something like that...
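Filled out into something that compiles, the idea might look like the sketch below. It assumes mtl for State, uses a one-method class as a stand-in for the proposed PandocMonad, and lets missing files read as empty instead of throwing, purely to keep the example short. writeToy and runPure are hypothetical names.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Monad.State (State, evalState, gets)
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for the proposed class, with a single primitive.
class Monad m => PandocMonad m where
  ourReadFile :: FilePath -> m String

-- Real world: read from disk.
instance PandocMonad IO where
  ourReadFile = readFile

-- Fake world: an in-memory file table carried in State.
newtype FakeRealWorld = FakeRealWorld { fakeFiles :: [(FilePath, String)] }

instance PandocMonad (State FakeRealWorld) where
  ourReadFile fp = gets (fromMaybe "" . lookup fp . fakeFiles)

-- A toy writer, written once and usable with either instance.
writeToy :: PandocMonad m => FilePath -> m String
writeToy fp = do
  body <- ourReadFile fp
  return ("<doc>" ++ body ++ "</doc>")

-- Run a computation against a given fake world.
runPure :: FakeRealWorld -> State FakeRealWorld a -> a
runPure world act = evalState act world
```

The instance is picked by the type at the call site: writeToy "a.txt" used as an IO action reads the disk, while runPure (FakeRealWorld [("a.txt", "hi")]) (writeToy "a.txt") stays pure, which is what makes deterministic tests possible.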

jkr commented 7 years ago

Okay -- I see now. So, where right now in the Free module we have the different definitions through pattern-matching (runIO, runTest, runPure, what have you), in this version we'd have the different definition in the typeclass (just typing out for my own benefit, implementations are off the top of my head):


instance PandocMonad IO where
  readFileLazy = BL.readFile
  getStdGen    = Random.getStdGen
  ...

instance PandocMonad (State FakeRealWorld) where
  -- look the file up in the fake file tree (partial: fails if it is missing)
  readFileLazy fp = BL.fromStrict . fromJust . lookup fp <$> gets stOurFakeFileTree
  -- hand out the current generator and store the advanced one for the next call
  getStdGen = do g <- gets stStdGen
                 let (_, nxt) = next g
                 modify $ \st -> st {stStdGen = nxt}
                 return g
  ...

This would be simpler to write, for sure.

The only issue is the one we discussed earlier. This will limit us to a state, at any time, that is purely determined by our input and beginning state. (This would mean that we couldn't do something like poll an external resource that we expect to change, as in a DB management system or a terminal interface that gets user input.) You had suggested this might be a problem with LaTeX includes: we can't get the filepath until after we parse. But I think it's okay to assume that the universe is determined at the moment of invocation: even though we won't know the filename until we parse, we can stipulate that it would have needed to be there a split-second or two earlier (when we called pandoc) to be legal.

jkr commented 7 years ago

No, I take it back -- I get it now. We'd only be limited in the FakeRealWorld version. In the IO version, we'd be proceeding as usual. That's the difference between populating FakeRealWorld at the beginning even for IO, which we had discussed earlier, and what you're discussing above.

jkr commented 7 years ago

I think that almost all the implementations would be identical to the ones in Free, only without the continuation pipe (i.e., without the >>= runIO . f). So it should be pretty quick to transfer the runIO and runTest versions over to play around with it.

I'll try to play around with it.

jgm commented 7 years ago

Yes, I think either approach could work. I'm not sure whether there's a strong reason to go with one rather than the other. There's some relevant discussion here:

One point people make is that with the Free Monad, you have access to the AST and can manipulate it, e.g. combining operations to optimize them. I don't see any real need for that feature in our case.

jkr commented 7 years ago

Yeah -- it looks pretty straightforward (famous last words) to transfer the work from free over to typeclass -- I should have it done before my coffee is cold. I'll put a branch up with that and we can compare. My general bias would be to err on the side of conceptual simplicity (typeclasses), all else being equal. But let's see if all else is indeed equal.

jkr commented 7 years ago

@jgm -- here's the typeclass version

https://github.com/jkr/pandoc/tree/typeclass

I think Text.Pandoc.Class is much easier to read and follow than Text.Pandoc.Free was. The type signatures in the writers get a bit annoying with the need to state the constraint each time, but that's doable. Same goal of pruning down primitive functions (in this case, functions in the typeclass definition) still seems to be in effect.

We'd be able to maintain some form of the monadic setters in the pure versions. Since the pure version puts out a State (or a ReaderT State, or an RWS, or whatever), we'd be in that before we ran eval/runState. So we could still do something like:

writePureDocx :: FakeRealWorld -> WriterOptions -> Pandoc -> ByteString
writePureDocx st opts doc = flip evalState st $ do
  set stReferenceDocx someArchive
  writeDocx opts doc

where set is some convenience wrapper over modify. Not quite as nice, but with some wrappers it probably could be.
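One possible shape for such a wrapper, as a sketch only. Here set takes an explicit record-updating function instead of the bare field name used above, since a plain field selector cannot be used to update. The String fields stand in for real archives, and all names are hypothetical. mtl is assumed for State.

```haskell
import Control.Monad.State (State, evalState, gets, modify)

-- Hypothetical fake world; the String stands in for an Archive.
data FakeRealWorld = FakeRealWorld
  { stReferenceDocx :: String
  }

-- Convenience wrapper over modify: apply a setter to the state.
set :: (a -> s -> s) -> a -> State s ()
set setter val = modify (setter val)

-- Record-updating setter for the reference document.
setReferenceDocx :: String -> FakeRealWorld -> FakeRealWorld
setReferenceDocx v st = st { stReferenceDocx = v }

-- Toy writer that consults the reference document held in state.
writeToyDocx :: String -> State FakeRealWorld String
writeToyDocx body = do
  ref <- gets stReferenceDocx
  return (ref ++ ":" ++ body)

-- Pure entry point: adjust the world, then run the writer.
writePureToyDocx :: FakeRealWorld -> String -> String
writePureToyDocx st body = flip evalState st $ do
  set setReferenceDocx "custom.docx"
  writeToyDocx body
```

With a lens library the setter could be derived from the field, which would bring the call site closer to the set stReferenceDocx someArchive form.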

jgm commented 7 years ago

[EDIT: Sorry, I evidently misunderstood your post in what I initially wrote in this reply. But see my next response. I guess it's a question whether we want writeDocxPure or rather something additional that applies to writeDocx.]

jgm commented 7 years ago

I guess one advantage of the Free Monad approach is that one could use something like State FakeRealWorld in the interpreter, behind the scenes as it were, without the user needing to know anything about it. Running runPure could just spit out something like Either [String] ByteString where [String] is a list of warning messages.

But we could always add an auxiliary function like

pureWithWarnings :: State FakeRealWorld a -> Either [String] a

Not quite as clean though.
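A rough sketch of what such an auxiliary function could look like, under the assumption that warnings are collected in the fake world and that any warning turns the result into a Left. That reading of Either [String] a, along with the warn helper and the record fields, is a guess made for illustration. mtl is assumed for State.

```haskell
import Control.Monad.State (State, modify, runState)

-- Hypothetical fake world that also collects warnings (most recent first).
data FakeRealWorld = FakeRealWorld
  { fakeFiles    :: [(FilePath, String)]
  , fakeWarnings :: [String]
  }

-- Record a warning in the state.
warn :: String -> State FakeRealWorld ()
warn msg = modify (\w -> w { fakeWarnings = msg : fakeWarnings w })

-- Run in an empty fake world: Left with the warnings in order of
-- occurrence if there were any, Right with the result otherwise.
pureWithWarnings :: State FakeRealWorld a -> Either [String] a
pureWithWarnings act =
  case runState act (FakeRealWorld [] []) of
    (x, w)
      | null (fakeWarnings w) -> Right x
      | otherwise             -> Left (reverse (fakeWarnings w))
```

The caller never has to construct a FakeRealWorld, which recovers some of the hiding that the free monad interpreter would give.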