jgm / pandoc

Universal markup converter
https://pandoc.org

add no-attachments extension to ipynb format. #8432

Open lsmor opened 2 years ago

lsmor commented 2 years ago

Describe your proposed improvement and the problem it solves.

By default, pandoc creates attachments for image links when writing ipynb, as stated in the documentation. It does so via the addAttachment function in the Ipynb writer. That function depends on fetchItem, which, as far as I can tell, is meant to read the raw ByteString from the file system.

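This has two problems:

first: if there is an image reference but the image is not in the filesystem, then the writer fails completely instead of producing a broken link to the image

second: when using pandoc as a library, writeIpynb cannot be used with runPure, as the example below shows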

import qualified Text.Pandoc as Pandoc
import Text.Pandoc (Pandoc)

-- The path to some image on the file system
pathToImage :: String
pathToImage = "./path/to/some_image.png"

-- A raw string representing the input data: the native representation of a Markdown document with the following content
-- ![this is an image](./path/to/some_image.png)
pan :: Pandoc
pan = read $ "Pandoc Meta{unMeta = fromList []} [Para[Image (\"\",[],[]) [Str \"this\",Space,Str \"is\",Space,Str\"an\",Space,Str\"image\"] (\"" <> pathToImage <> "\" ,\"fig:\")]]"

main :: IO ()
main = do

    -- Fails with Left (PandocResourceNotFound ...)
    let jupyter_pure = Pandoc.runPure $ Pandoc.writeIpynb Pandoc.def pan
    print jupyter_pure

    putStrLn "****************"

    -- Succeeds, provided the image exists on disk
    jupyter_txt <- Pandoc.runIO $ Pandoc.writeIpynb Pandoc.def pan
    print jupyter_txt

I think this could be addressed by changing the function extractCells so that, instead of always calling addAttachment, it first checks whether the extension is enabled. If it is, the Inline is left unmodified; otherwise it is modified as it is now. This would create a regular Markdown image link instead of an attachment.

-- In the extractCells function, this line
(newdoc, attachments) <-
      runStateT (walkM addAttachment (Pandoc nullMeta xs)) mempty

-- should become something like
(newdoc, attachments) <-
     if new_extension_enabled                    -- This function gets WriterOptions as input, so the check should not be difficult
        then  pure (Pandoc nullMeta xs, mempty)  -- Leave the blocks unaltered and record no attachments
        else  runStateT (walkM addAttachment (Pandoc nullMeta xs)) mempty

I think I can implement this change if you confirm this is the way to go.

Describe alternatives you've considered.

I read the documentation looking for other options to achieve this, and considered manipulating the raw Text produced by writeIpynb (I am working with pandoc as a library).

jgm commented 2 years ago

> first: if there is an image reference but the image is not in the filesystem then the writer fails completely, instead of producing a broken link to the image

This could be addressed by handling the error raised by fetchItem. Would that be a simpler approach than adding a new extension?

lsmor commented 2 years ago

> This could be addressed by handling the error raised by fetchItem. Would that be a simpler approach than adding a new extension?

Well, the error is simply Left (PandocResourceNotFound "path/to/image.png"). You can handle that, but the goal is to produce an ipynb in which images aren't attachments but regular Markdown image links, so I don't think handling it helps. Moreover, you still have the problem of not being able to use runPure when using pandoc as a library.

I am thinking about a nasty filter which converts images to links with a ! in front. Let me check if that works.

lsmor commented 2 years ago

It turned out that a simple filter can be used. Shame on me! I am closing this as no changes are needed.

imageToLink :: Pandoc.Block -> Pandoc.Block
imageToLink (Pandoc.Para (Pandoc.Image attrs inl target:is)) = Pandoc.Para $ Pandoc.Str "!":Pandoc.Link attrs inl target:is
imageToLink i = i
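The filter above only matches a paragraph whose first inline is an image. A minimal sketch of the same rewriting applied to every image in a document, assuming pandoc-types is available, could look like this; the name `imagesToLinks` is hypothetical and not part of pandoc.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk)

-- | Hypothetical helper (not part of pandoc): replace every image with
-- a literal "!" followed by a link that has the same attributes,
-- description and target, wherever the image occurs in the document.
imagesToLinks :: Pandoc -> Pandoc
imagesToLinks = walk (concatMap go)
  where
    go :: Inline -> [Inline]
    go (Image attr alt target) = [Str "!", Link attr alt target]
    go other                   = [other]
```

Walking over inline lists rather than blocks lets one image become two inlines, so images in the middle of a paragraph or inside other containers are covered too.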

jgm commented 2 years ago

> You can handle that, but the goal is to produce an ipynb in which images aren't attachments but regular Markdown image links, so I don't think handling it helps.

Why not? We could handle the error by just including a regular image link with that path, and issuing a warning.

> you still have the problem of not being able to use runPure if using the pandoc library.

fetchItem can be used with runPure. It won't do any actual IO, but it will still look in the ersatz file system, and it will raise an error if nothing else -- which can be trapped.

Let's keep this open.

lsmor commented 2 years ago

Maybe I am a little bit lost: are you proposing to actually change the code so that writeIpynb handles such an error? I am happy to contribute to that. (Note that my actual problem, that all images must be links, can be solved with a simple filter.)

Coming back to writeIpynb, I guess addAttachment can be modified to handle that error and leave the Image unmodified. I am not so sure how to actually handle the error, though. Is there any example within the code base I can look at? (I am not that familiar with pandoc, but I know enough Haskell to follow the types.)
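For reference, a minimal sketch of what trapping the error could look like. It is not taken from the pandoc code base: `tryFetch`, `safeAttach`, the simplified `Attachments` type and the `attachment:` prefix are assumptions made for illustration. Only `fetchItem`, `report`, `runPure`, `walkM` and `catchError` are existing library functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Except (catchError, throwError)
import Control.Monad.State.Strict (StateT, modify, runStateT)
import Control.Monad.Trans.Class (lift)
import qualified Data.ByteString as B
import qualified Data.Map as M
import Data.Text (Text)
import Text.Pandoc.Class (PandocMonad, fetchItem, report, runPure)
import Text.Pandoc.Definition
import Text.Pandoc.Error (PandocError (..))
import Text.Pandoc.Logging (LogMessage (..))
import Text.Pandoc.Walk (walkM)

-- Simplified stand-in for the writer's attachment state (an assumption).
type Attachments = M.Map Text B.ByteString

-- | Hypothetical helper: fetch a resource, but turn a missing resource
-- into a warning plus Nothing. Any other error is rethrown.
tryFetch :: PandocMonad m => Text -> m (Maybe B.ByteString)
tryFetch src = (Just . fst <$> fetchItem src) `catchError` handler
  where
    handler (PandocResourceNotFound _) = do
      report (CouldNotFetchResource src "keeping a regular image link")
      pure Nothing
    handler e = throwError e

-- | Hypothetical variant of addAttachment: if the image cannot be
-- fetched, the Image is returned unmodified and nothing is recorded.
safeAttach :: PandocMonad m => Inline -> StateT Attachments m Inline
safeAttach img@(Image attr alt (src, tit)) = do
  fetched <- lift (tryFetch src)
  case fetched of
    Nothing -> pure img
    Just bytes -> do
      modify (M.insert src bytes)
      -- The "attachment:" prefix is assumed here for illustration.
      pure (Image attr alt ("attachment:" <> src, tit))
safeAttach other = pure other

-- Runs in the pure monad: the image is not in the ersatz file system,
-- so the error is trapped instead of aborting the whole walk.
demo :: Either PandocError (Pandoc, Attachments)
demo = runPure (runStateT (walkM safeAttach doc) mempty)
  where
    doc = Pandoc nullMeta [Para [Image nullAttr [Str "alt"] ("missing.png", "")]]
```

Catching only PandocResourceNotFound keeps unrelated failures visible, and the same function works under both runPure and runIO because it is written against PandocMonad.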