vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

Preview of dot files containing multiple graphs as gif animation #919

Closed BoPeng closed 6 years ago

BoPeng commented 6 years ago

Currently our %preview magic can preview .dot files containing one figure (source code, example).

With the implementation of #888, .dot files generated by sos will contain multiple graphs showing the evolution of the DAG during the execution of workflows. It would be great if we could preview them as animations (.gif). I believe we can generate multiple .png files using the existing code and then generate a .gif from these .png files.

I look forward to demonstrating SoS using

%preview dag.dot
%run -d dag.dot
WORKFLOW

Note that displaying gif in Jupyter is possible, but might need some tricks. See Calysto/octave_kernel#105 and ipython/ipython#10045.

@HenryLeongStat , let me know if you are interested in this ticket.

HenryLeongStat commented 6 years ago

Interested! Always want to learn how to implement a magic 😄

BoPeng commented 6 years ago

Great, you can

  1. first have a look at #888
  2. learn how to use the -d option to generate .dot file containing multiple graphs
  3. learn how to generate .png files and convert .png files to .gif using dot and then imagemagick/convert.
  4. learn how to convert dot to multiple .png files programmatically, and convert .png to .gif using imagemagick or something else.
  5. Figure out how to properly preview such .gif file inside Jupyter.

Let me know if you need help on any of these.

HenryLeongStat commented 6 years ago

Is it basically like the following?

  1. Split up a .dot file which contains more than 1 strict digraph{...} into multiple .dot files.
  2. %preview -n or graphviz.Source() each of them in order to output them as .png files. (Is there a way to export the graph as .png in %preview ? -s?)
  3. Combine all the .png files into a .gif file.
  4. View it. (Tried to use markdown to do it, example: ''' gundam ''')

BoPeng commented 6 years ago

Yes. If you cannot find any option for Source.render to produce multiple png files, you will have to split the file manually and render them one by one. You will have to figure out how Jupyter displays gif though. I have not tried %preview gopher.gif myself and am not sure if %preview can handle .gif files.

HenryLeongStat commented 6 years ago

I have not tried %preview gopher.gif myself and am not sure if %preview can handle .gif files.

Actually, it can!

[Screenshot, 2018-04-03 at 10:27:43 AM]

BoPeng commented 6 years ago

Good. Splitting .dot file should be easy. The only remaining problem is how to convert png files to gif.

HenryLeongStat commented 6 years ago

Should be very easy! Reference: https://stackoverflow.com/questions/753190/programmatically-generate-video-or-animated-gif-in-python

HenryLeongStat commented 6 years ago

Also, I tried to figure out whether Source.render() can handle multiple chunks of graph code ({...}) in a .dot file. It seems to be able to do that itself (in the CLI), but I couldn't find whether it can be done inside Python. I decided to split up the .dot files manually, which should be easy, I think?

HenryLeongStat commented 6 years ago

Done for the splitting:

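# Read the whole multi-graph .dot file.
with open('a.dot', 'r') as file:
    text = file.read()
print(text)
# Splitting on the header leaves an empty first element, so start at index 1.
splitText = text.split("strict digraph ")
# Go up to len(splitText) so that the last graph is written as well.
for i in range(1, len(splitText)):
    splitName = str(i) + "splitText" + ".dot"
    with open(splitName, "w") as out:
        out.write("strict digraph " + splitText[i])
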
BoPeng commented 6 years ago

Source can handle strings, no need for temporary files, right?
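
A minimal sketch of splitting in memory, so that each graph can be handed over as a string instead of a temporary .dot file. It assumes every graph starts with the header strict digraph at the beginning of a line, as in the files discussed here; the helper name split_dot_graphs is hypothetical and not part of SoS.

```python
import re


def split_dot_graphs(text):
    """Return one string per 'strict digraph' block found in text."""
    # Record where each graph header begins (headers are assumed to
    # start at the beginning of a line).
    starts = [m.start()
              for m in re.finditer(r'^strict digraph\b', text, flags=re.MULTILINE)]
    # Each graph runs from its header to the next header (or end of text).
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


example = (
    'strict digraph "" {\n  a -> b;\n}\n'
    'strict digraph "" {\n  a -> b;\n  b -> c;\n}\n'
)
graphs = split_dot_graphs(example)
print(len(graphs))
```

Each returned string could then be rendered on its own; text without any header yields an empty list.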

HenryLeongStat commented 6 years ago

Yes, but exporting as .png should be in temporary directory. Working on it.

HenryLeongStat commented 6 years ago

Opened a new branch and almost finished. Professor, how fast do you want the .gif to move from one scene to the next?

HenryLeongStat commented 6 years ago

By default: movie

BoPeng commented 6 years ago

Looks great. One second for each step?

HenryLeongStat commented 6 years ago

One second for each step: movie

HenryLeongStat commented 6 years ago

In the original code, preview_dot() returns the following:

with open(outfile, 'rb') as content:
    data = content.read()
return {'image/png': base64.b64encode(data).decode('ascii')}

What should it return for the .gif case?

Fix the following:

        for filename in pngFiles:
            images.append(imageio.imread(filename))
        data = imageio.mimsave('movie.gif', images)
        os.remove(pngFiles)
        return {'image/gif': movie.gif}

BoPeng commented 6 years ago

You will need to check the current code for previewing images (this line of code) and try it. You also need to return a regular png when there is only a single png file.

HenryLeongStat commented 6 years ago

I see... That's what you meant about viewing .gif in Jupyter...

BoPeng commented 6 years ago

You will need to return the png for the last picture as default for systems that cannot render gif. This is why in img handler we return both gif and png.

HenryLeongStat commented 6 years ago

Super weird... Using the following code, the static image shows.

def preview_dot(filename, kernel=None, style=None):
    from graphviz import Source
    with open(filename) as dot:
        fileNameElement = "sosDotFilesPng"
        src = Source(dot.read(), filename = fileNameElement)
    src.format = 'png'
    outfile = src.render()
    from os import listdir
    from os.path import isfile, join
    import re
    pngFiles = [f for f in listdir("./") if isfile(join("./", f)) and fileNameElement in f and ".png" in f and bool(re.search(r'\d', f))]
    if len(pngFiles) == 0:
        with open(outfile, 'rb') as content:
            data = content.read()
        return {'image/png': base64.b64encode(data).decode('ascii') }
    else:
        pngFiles.sort(key=lambda x: int(x.split('.')[1]))
        pngFiles.insert(0, fileNameElement + '.png')
        images = []
        for filename in pngFiles:
            images.append(imageio.imread(filename))
        data = imageio.mimsave('movie.gif', images, duration = 1)
        from wand.image import Image
        img = Image(filename='movie.gif')
        with open('movie.gif', 'rb') as f:
            image = f.read()
        import imghdr
        image_type = imghdr.what(None, image)
        image_data = base64.b64encode(image).decode('ascii')
        for png in pngFiles:
            os.remove(png)
        return { 'image/' + image_type: image_data,
                'image/png': base64.b64encode(img._repr_png_()).decode('ascii') }

So I removed those two return statements, as follows:

def preview_dot(filename, kernel=None, style=None):
    from graphviz import Source
    with open(filename) as dot:
        fileNameElement = "sosDotFilesPng"
        src = Source(dot.read(), filename = fileNameElement)
    src.format = 'png'
    outfile = src.render()
    from os import listdir
    from os.path import isfile, join
    import re
    pngFiles = [f for f in listdir("./") if isfile(join("./", f)) and fileNameElement in f and ".png" in f and bool(re.search(r'\d', f))]
    if len(pngFiles) == 0:
        with open(outfile, 'rb') as content:
            data = content.read()
    else:
        pngFiles.sort(key=lambda x: int(x.split('.')[1]))
        pngFiles.insert(0, fileNameElement + '.png')
        images = []
        for filename in pngFiles:
            images.append(imageio.imread(filename))
        data = imageio.mimsave('movie.gif', images, duration = 1)
        from wand.image import Image
        img = Image(filename='movie.gif')
        with open('movie.gif', 'rb') as f:
            image = f.read()
        import imghdr
        image_type = imghdr.what(None, image)
        image_data = base64.b64encode(image).decode('ascii')
        for png in pngFiles:
            os.remove(png)

It is not supposed to show anything, but the image still shows up. How?!?

HenryLeongStat commented 6 years ago

Let me use the Linux machine to do the test...

HenryLeongStat commented 6 years ago

By the way, professor, it is hard to debug in jupyter notebook... (i.e.: don't know which lines of code were run...)

HenryLeongStat commented 6 years ago

Found that there are multiple versions of SoS installed on my machine:

./anaconda/lib/python3.6/site-packages/sos_notebook-0.9.10.12-py3.6.egg/sos_notebook/preview.py:def preview_md(filename, kernel=None, style=None):
./anaconda/lib/python3.6/site-packages/sos-0.9.13.3-py3.6.egg/sos/preview.py:def preview_md(filename, kernel=None, style=None):
./anaconda/lib/python3.6/site-packages/sos-0.9.8.3-py3.6.egg/sos/jupyter/preview.py:def preview_md(filename, kernel=None, style=None):

I deleted those old versions. Before that, the version run in Jupyter was an old one. Not sure why.
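
A quick way to see which copy of a module Python would actually import is to look up its import spec. This is only a sketch: the helper name module_location is hypothetical, and json is used as a stand-in so the snippet runs anywhere. In the situation described here one would pass a name such as sos_notebook.preview, assuming that package is installed.

```python
import importlib.util


def module_location(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# 'json' is only a stand-in for the module whose location is in doubt.
print(module_location('json'))
```

If the printed path points into an old .egg directory, that stale copy is the one being executed.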

BoPeng commented 6 years ago

Yes, remove all old versions and try again.

HenryLeongStat commented 6 years ago

Done! Professor, do you think I should save all the .png and .gif in tempfile.gettempdir()? It works fine here because all those files will be deleted.

HenryLeongStat commented 6 years ago

Btw, finally figured out how to debug in jupyter, and it is fun! :D

BoPeng commented 6 years ago

Yes, please use a temporary directory because whatever name you use, there is a chance that a user has a valid file with that name...

HenryLeongStat commented 6 years ago

OK! Let me fix it.

HenryLeongStat commented 6 years ago

Should I create a folder to do it? For now I will get all the file names and then delete them.

BoPeng commented 6 years ago

You create a temporary folder. See module tempfile for details.

HenryLeongStat commented 6 years ago

I see! So actually the one created by using tempfile.TemporaryDirectory() is different from tempfile.gettempdir(). I remember we used tempfile.gettempdir() for kernels saving .feather files or others.

HenryLeongStat commented 6 years ago

Done. It's weird that tempfile.TemporaryDirectory() returns an object while tempfile.gettempdir() returns a string. So I need to use with to get the path string of the TemporaryDirectory.
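
The difference between the two calls can be sketched as follows, using only the standard library. The file name frame.png and its placeholder content are made up for illustration.

```python
import os
import tempfile

# gettempdir() only reports the shared system temp location as a string;
# it does not create anything.
shared = tempfile.gettempdir()

# TemporaryDirectory() creates a fresh, uniquely named folder. Used with
# "with", it binds the folder's path and removes the folder and its
# contents when the block exits.
with tempfile.TemporaryDirectory() as workdir:
    frame = os.path.join(workdir, 'frame.png')
    with open(frame, 'wb') as f:
        f.write(b'placeholder')
    existed_inside = os.path.isfile(frame)

removed_after = not os.path.exists(workdir)
print(existed_inside, removed_after)
```

Because the folder name is unique, intermediate files cannot collide with a user's own files, and no manual cleanup is needed.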

BoPeng commented 6 years ago

Your patch did not work for me. According to https://github.com/ipython/ipython/issues/10045 , the gif has to be returned as image/png for jupyter to recognize. Can you try the updated patch?

I have also modified your code to use glob.glob instead of os.listdir, and some other places to make the code shorter. You should learn to use pyflakes to find those places though.
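
A sketch of how glob.glob could collect the rendered frames in display order. It assumes the naming seen in the code earlier in this thread (name.png for the first graph, then name.2.png, name.3.png, ...) and a stem without dots; the helper name ordered_frames is hypothetical, and this is not the actual patch.

```python
import glob
import os
import tempfile


def ordered_frames(directory, stem):
    """List stem.png first, then stem.<n>.png sorted by the integer n."""
    pattern = os.path.join(glob.escape(directory), stem + '.*.png')
    # Keep only files whose middle component is a number.
    numbered = [f for f in glob.glob(pattern)
                if os.path.basename(f).split('.')[1].isdigit()]
    # Sort numerically so that frame 10 comes after frame 9, not after 1.
    numbered.sort(key=lambda f: int(os.path.basename(f).split('.')[1]))
    first = os.path.join(directory, stem + '.png')
    return ([first] if os.path.isfile(first) else []) + numbered


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('frames.10.png', 'frames.png', 'frames.3.png', 'frames.2.png'):
        open(os.path.join(tmp, name), 'wb').close()
    names = [os.path.basename(f) for f in ordered_frames(tmp, 'frames')]
print(names)
```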

HenryLeongStat commented 6 years ago

Your patch did not work for me. According to ipython/ipython#10045 , the gif has to be returned as image/png for jupyter to recognize. Can you try the updated patch?

You mean I put the .gif inside image/png as well? And not create any other directory except for image/png? Let me update it after the meeting.

I have also modified your code to use glob.glob instead of os.listdir, and some other places to make the code shorter. You should learn to use pyflakes to find those places though.

OK! Let me find a tutorial of it! 😄

HenryLeongStat commented 6 years ago

It works! duration = 0.5 looks better than 1!

BoPeng commented 6 years ago

I have updated the code, just let me know if it works on your end (works for me now).

The problem was that {'image/gif': gifdata} does not produce any picture here and I had to use {'image/png': gifdata} as suggested by that link.
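
The workaround described above can be sketched like this: the base64-encoded GIF bytes are returned under the image/png key. The helper name gif_preview_bundle is hypothetical, and the bytes in the demonstration are a stand-in with a GIF header, not a displayable image. Whether a given Jupyter front end still needs this trick is an assumption based on the linked issue.

```python
import base64


def gif_preview_bundle(gif_bytes):
    """Wrap GIF data in a MIME bundle keyed as image/png."""
    # Every GIF file starts with one of these two signatures.
    if gif_bytes[:6] not in (b'GIF87a', b'GIF89a'):
        raise ValueError('data does not look like a GIF')
    encoded = base64.b64encode(gif_bytes).decode('ascii')
    # Deliberately labelled image/png, following the workaround above.
    return {'image/png': encoded}


fake_gif = b'GIF89a' + b'\x00' * 7  # header plus padding, for illustration
bundle = gif_preview_bundle(fake_gif)
print(sorted(bundle))
```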

pyflakes is useful to find places like here when data is assigned but never used.

BoPeng commented 6 years ago

I thought of 1 second per step but each step actually has three substeps (start, execute, done etc).

HenryLeongStat commented 6 years ago

In this case, should we edit L75-L80? It would produce another directory if the image file isn't in .png format.