potamides / AutomaTikZ

Text-Guided Synthesis of Scientific Vector Graphics with TikZ
Apache License 2.0

Very Low Quality Output #15

Open shivamCode0 opened 1 month ago

shivamCode0 commented 1 month ago

I am encountering issues when using the provided example code from the README with minor modifications. The results generated are of much worse quality than expected. Additionally, the model frequently runs into OutOfMemory (OOM) errors, despite running on powerful hardware.

Code to Reproduce

The code below was used to generate the outputs.

import torch
from automatikz.infer import TikzGenerator, load

# Select 2nd GPU
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
generate = TikzGenerator(*load("nllg/tikz-clima-7b"), stream=True)

caption = "Physics diagram with a mass m on an inclined plane."
generate(caption)  # Streams generated tokens to stdout

Output

Output 1

```latex
Physics diagram with a mass m on an inclined plane. ... (5)
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{pgfplots}
\usetikzlibrary{arrows, decorations.markings, decorations.pathmorphing, backgrounds, positioning, fit, petri}
\usepackage{amssymb}
\usepackage{amsmath}
\begin{document}
\begin{tikzpicture}[scale=.85]
\draw (0,0) -- (1.5,2) -- (3,2) -- (3.5,0) -- (3,-.7) -- (.5,-.7) -- (.5,0) -- (0,0);
\draw[fill] (0,0) circle (1.6pt);
\draw[fill] (0.5,0) circle (1.6pt);
\draw[fill] (1,0) circle (1.6pt);
\draw[fill] (1.5,0) circle (1.6pt);
\draw[fill] (2,0) circle (1.6pt);
\draw[fill] (2.5,0) circle (1.6pt);
\draw[fill] (3,0) circle (1.6pt);
\draw[fill] (3.5,0) circle (1.6pt);
\draw[fill] (4,0) circle (1.6pt);
\draw[fill] (.5,-.7) circle (1.6pt);
\draw[fill] (1,-.7) circle (1.6pt);
\draw[fill] (1.5,-.7) circle (1.6pt);
\draw[fill] (2,-.7) circle (1.6pt);
\draw[fill] (2.5,-.7) circle (1.6pt);
\draw[fill] (3,-.7) circle (1.6pt);
\draw[fill] (3.5,-.7) circle (1.6pt);
\draw[fill] (4.5,-.7) circle (1.6pt);
\draw[fill] (5,-.7) circle (1.6pt);
\draw[->] (0,0) -- (0,.8);
\draw[->] (0.5,0) -- (0.5,.8);
\draw[->] (1,0) -- (1,.8);
\draw[->] (1.5,0) -- (1.5,.8);
\draw[->] (2,0) -- (2,.8);
\draw[->] (2.5,0) -- (2.5,.8);
\draw[->] (3,0) -- (3,.8);
\draw[->] (3.5,0) -- (3.5,.8);
\draw[->] (4,0) -- (4,.8);
\draw[->] (.5,-.7) -- (.5,-.1);
\draw[->] (1,-.7) -- (1,-.1);
\draw[->] (1.5,-.7) -- (1.5,-.1);
\draw[->] (2,-.7) -- (2,-.1);
\draw[->] (2.5,-.7) -- (2.5,-.1);
\draw[->] (3,-.7) -- (3,-.1);
\draw[->] (3.5,-.7) -- (3.5,-.1);
\draw[->] (4.5,-.7) -- (4.5,-.1);
\draw[->] (5,-.7) -- (5,-.1);
\end{tikzpicture}
\end{document}
```

Output 2 (Invalid)

```
\begin{ Physics diagram with a mass m on an i333nclined plane.
:comma- separated list of PBM file names. Example: pbm ... -densify };
data = { double double\_value; };
/pbm_density/.cd,
thickness/.store in = \density_thickness,
thickness = 0.5,
double\_value/.store in = \density_value,
double\_value = 0.
]
\documentclass[border=10pt,tikz]{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\usepackage{mathtools}
\usepackage{xcolor}
\let\oldvec\vec
\DeclareMathOperator{\oldvec}{vec}
\usepackage{tikz}
\usetikzlibrary{patterns}
\begin{document}
\begin{tikzpicture}
\begin{axis}[
  xlabel = {Number of points},
  ylabel = {Time [in seconds]},
  xmin = 0, xmax = 1200,
  ymin = 0, ymax = 2200,
  legend entries = {Classic PBM, PBM with densification},
  legend pos = south east
]
\addplot[
  scatter, only marks, mark = *,
  scatter src = explicit,
  nodes near coords* = \noexpand\pgfplotspointmeta,
  visualization depends on={value \thisrow{label} \as \noexpand\pgfplotspointmeta},
  every node near coord/.append style = {font = \noexpand\pgfkeysvalueof{/pgfplots/scaled enlarge x limits}, anchor = center, rotate = 0},
] table [ meta = label, x = number, y = double\_value, col sep = comma ] {
  number thickness double\_value label\\
  100 0.15 PBM-file1.density\\
  250 0.22 PBM-file2.density\\
  400 0.29 PBM-file3.density\\
  550 0.37 PBM-file4.density\\
  700 0.45 PBM-file5.density\\
  850 0.53 PBM-file6.density\\
  1000 0.61 PBM-file7.density\\
  1150 0.69 PBM-file8.density\\
  1300 0.77 PBM-file9.density\\
  1450 0.85 PBM-file10.density\\
  1600 0.93 PBM-file11.density\\
  1750 1.01 PBM-file12.density\\
  1900 1.09 PBM-file13.density\\
  2050 1.17 PBM-file14.density\\
  2200 1.25 PBM-file15.density\\
};
\addplot[ domain=0:1200, samples=100, thick, smooth ] {1750*x+150};
\end{axis}
\end{tikzpicture}
\end{document}
```

Output 2 does not compile and contains garbled text, including irrelevant code snippets (e.g., PBM file parsing).

System Information

Additional Notes

I would appreciate any insights or potential fixes for the inconsistent output and memory issues.

potamides commented 1 month ago

Thanks for your interest in this project! You declare a device variable in your code snippet but then don't make use of it when loading the model. Since you seem to have two GPUs you could instead try to add device_map="auto" to the parameter list of load(). This should make better use of your hardware and hopefully fix your OOM error.
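
For reference, a minimal sketch of that change could look like the following. It assumes that load() accepts device_map as a keyword argument, as suggested above; build_generator is only a hypothetical helper name used for illustration, and the imports are done lazily so the snippet can be read and imported without the model stack installed.

```python
def build_generator(model_name="nllg/tikz-clima-7b", stream=True):
    """Load the model spread over the available GPUs and wrap it in a TikzGenerator.

    `build_generator` is a hypothetical helper, not part of automatikz.
    """
    # Lazy import: the heavy dependencies are only needed when the helper is called.
    from automatikz.infer import TikzGenerator, load

    # Assumption from this thread: load() takes device_map="auto" so the weights
    # can be placed across all visible GPUs instead of a single device.
    return TikzGenerator(*load(model_name, device_map="auto"), stream=stream)


# Usage (downloads the model weights and needs enough GPU memory):
# generate = build_generator()
# generate("Physics diagram with a mass m on an inclined plane.")
```

Note that the unused device variable from the original snippet is dropped here, since placement is left to device_map.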

You could also try inference with the 13b model, which performs better than the 7b one, but even for the 7b model the garbled output you get is unexpected. You might achieve better results by rewriting your prompt (cf. our paper and here for the types of prompts our models have seen during training; generally they are much longer than the one you provide). Doing inference with our iterative resampling algorithm, described in our paper (and defined here), can also help you recover from compile errors.
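
As a rough illustration of recovering from invalid samples, the sketch below simply retries the whole generation until a validity check passes. This is a deliberately simplified stand-in and not the iterative resampling algorithm from the paper; sample_until_valid, generate, and compiles are hypothetical names, and the two callables would have to be supplied by you (for example a wrapper around the generator and a check that the output compiles).

```python
from typing import Callable, Optional


def sample_until_valid(
    generate: Callable[[str], str],
    compiles: Callable[[str], bool],
    caption: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first sample accepted by `compiles`, or None if all attempts fail.

    Naive whole-sample retry: every failed attempt is discarded and a completely
    new sample is drawn for the same caption.
    """
    for _ in range(max_attempts):
        candidate = generate(caption)
        if compiles(candidate):
            return candidate
    return None
```

Because each retry starts from scratch, this only helps when sampling is stochastic; with deterministic decoding every attempt would return the same output.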

We also have a follow-up work here which might be interesting to you.

shivamCode0 commented 4 weeks ago

I’ve tried running the model with the original prompt from the README, but unfortunately, I still encounter the same issues. Regarding GPU usage, I can confirm that the model was utilizing 1 of the 2 available GPUs in my setup. The code I shared in the issue was simplified from my original implementation, which already included GPU management.

I’ll experiment with the device_map="auto" option and also explore using the tikz-clima-13b model, as well as applying iterative resampling as you recommended. I’ll also take a closer look at the prompts used during training to refine my input.

I've seen your follow-up work, but I'll look more into it after further testing these approaches.

Thanks for your assistance!