Closed: aspnetcs closed this issue (#67) 2 years ago
Please elaborate. From what data? In principle you need a text file with normalized LaTeX equations and corresponding images that are named in such a way that each image file name matches a line in the text file. Example: 0000.png corresponds to the first line, 0001.png to the second line, and so on.
If you need to scrape more data you can look into the methods I wrote in dataset/scraping.py.
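As an illustration only (hypothetical paths and helper, not part of the repo), producing that pairing from your own data could look like this:

from pathlib import Path
from PIL import Image

def save_pairs(equations, images, out_dir="data/custom"):
    # Write math.txt and images named so that NNNN.png matches line NNNN of the text file.
    out = Path(out_dir)
    (out / "images").mkdir(parents=True, exist_ok=True)
    with open(out / "math.txt", "w", encoding="utf-8") as f:
        for i, (eq, img) in enumerate(zip(equations, images)):
            f.write(eq.strip() + "\n")                       # one normalized equation per line
            img.save(out / "images" / f"{i:04d}.png")        # file name encodes the line index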
How do I convert the data at this URL into your pkl format?
https://github.com/LinXueyuanStdio/Data-for-LaTeX_OCR/tree/d8dd211270746a86caf85cbe5aab93f2a4bee0df
As far as I can see it is straightforward. For the small dataset the training pkl file would be
python dataset/dataset.py --equations data/small/formulas/train.formulas.norm.txt --images data/small/images_train --tokenizer dataset/tokenizer.json --out data/small/train.pkl
This only holds true as long as the matching is trivial.
Note: the images are differently sized. I've opted to pad each dimension to a multiple of 32; they chose 20 for height and 80 for width.
There is the option to set pad to true in the config file, but doing it at run time is much slower than preprocessing the images. Use https://github.com/lukas-blecher/LaTeX-OCR/blob/ba1b7285799f0ee3b78925029e7e521444974a71/utils/utils.py#L73-L104
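For reference, a minimal sketch of such offline padding with PIL, assuming grayscale formula images on a white background and the multiple-of-32 target mentioned above (this is not the utils.py function linked, just an illustration):

from math import ceil
from PIL import Image

def pad_to_multiple(img: Image.Image, divisor: int = 32, fill: int = 255) -> Image.Image:
    # Pad on the right/bottom so both dimensions become multiples of `divisor`.
    w, h = img.size
    new_w, new_h = ceil(w / divisor) * divisor, ceil(h / divisor) * divisor
    padded = Image.new("L", (new_w, new_h), fill)   # white canvas
    padded.paste(img.convert("L"), (0, 0))
    return padded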
python dataset/dataset.py --equations dataset/data/preprocessx/Data-for-LaTeX_OCR/full/formulas/train.formulas.norm.txt --images dataset/data/preprocessx/Data-for-LaTeX_OCR/full/images_train --tokenizer dataset/tokenizer.json --out dataset/data/preprocessx/Data-for-LaTeX_OCR/full/train_full.pkl
python dataset/dataset.py --equations dataset/data/preprocessx/Data-for-LaTeX_OCR/small/formulas/train.formulas.norm.txt --images dataset/data/preprocessx/Data-for-LaTeX_OCR/small/images_train --tokenizer dataset/tokenizer.json --out dataset/data/preprocessx/Data-for-LaTeX_OCR/small/train_small.pkl
These two commands generate train_full.pkl and train_small.pkl respectively, and their sizes are both 27576 bytes. Are the results wrong?
Screenshot of the result is as follows:

(tf_1.12) @.***:/home/code/LaTeX-OCR/dataset/data/preprocessx/Data-for-LaTeX_OCR/small# ll
total 60
drwxr-xr-x 5 root root  4096 Dec 29 09:36 ./
drwxr-xr-x 6 root root  4096 Aug 27  2019 ../
-rw-r--r-- 1 root root   592 Aug 27  2019 README.md
-rw-r--r-- 1 root root  1114 Aug 27  2019 data.json
drwxr-xr-x 2 root root  4096 Aug 27  2019 formulas/
drwxr-xr-x 5 root root  4096 Aug 27  2019 images/
drwxr-xr-x 2 root root  4096 Aug 27  2019 matching/
-rw-r--r-- 1 root root 27576 Dec 29 09:36 train_small.pkl
-rw-r--r-- 1 root root   174 Aug 27  2019 vocab.json
(tf_1.12) @.***:/home/code/LaTeX-OCR/dataset/data/preprocessx/Data-for-LaTeX_OCR/small# cd ..
(tf_1.12) @.***:/home/code/LaTeX-OCR/dataset/data/preprocessx/Data-for-LaTeX_OCR# cd full
(tf_1.12) @.***:/home/code/LaTeX-OCR/dataset/data/preprocessx/Data-for-LaTeX_OCR/full# ll
total 68
drwxr-xr-x 5 root root  4096 Dec 29 13:31 ./
drwxr-xr-x 6 root root  4096 Aug 27  2019 ../
-rw-r--r-- 1 root root  6148 Aug 27  2019 .DS_Store
-rw-r--r-- 1 root root   613 Aug 27  2019 README.md
-rw-r--r-- 1 root root  1077 Aug 27  2019 data.json
drwxr-xr-x 2 root root  4096 Aug 27  2019 formulas/
drwxr-xr-x 5 root root  4096 Aug 27  2019 images/
drwxr-xr-x 2 root root  4096 Aug 27  2019 matching/
-rw-r--r-- 1 root root 27576 Dec 29 13:31 train_full.pkl
-rw-r--r-- 1 root root   173 Aug 27  2019 vocab.json
(tf_1.12) @.***:/home/code/LaTeX-OCR/dataset/data/preprocessx/Data-for-LaTeX_OCR/full#
They should be differently sized if the amount of data is different. Looks like the path to the images is wrong:
python dataset/dataset.py --equations dataset/data/preprocessx/Data-for-LaTeX_OCR/full/formulas/train.formulas.norm.txt --images dataset/data/preprocessx/Data-for-LaTeX_OCR/full/images/images_train --tokenizer dataset/tokenizer.json --out dataset/data/preprocessx/Data-for-LaTeX_OCR/full/train_full.pkl
How do I use PyTorch's LBFGS optimizer in your LaTeX-OCR project?
opt = optim.LBFGS(model.parameters())
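For context, a minimal generic sketch of how LBFGS is usually stepped with a closure in a PyTorch loop (the model, data, and loss below are placeholders, not the project's training code):

import torch
from torch import nn, optim

model = nn.Linear(10, 1)                       # stand-in model
criterion = nn.MSELoss()
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

opt = optim.LBFGS(model.parameters(), lr=0.1)

def closure():
    # LBFGS re-evaluates the model several times per step,
    # so the forward/backward pass has to live inside a closure.
    opt.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    return loss

for _ in range(20):
    opt.step(closure)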
Thank you, you are a great man.
"The model consist of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder." in your latex-ocr project(https://github.com/lukas-blecher/LaTeX-OCR) Would you like to give me the reference articles( ViT [1] and Transformer [2] )?
Haha thanks.
[1] and [2] are listed at the end of the readme: https://github.com/lukas-blecher/LaTeX-OCR#references
I want to convert this fullhand dataset (https://github.com/LinXueyuanStdio/Data-for-LaTeX_OCR/tree/d8dd211270746a86caf85cbe5aab93f2a4bee0df/fullhand) into a pkl file, but the following error occurred. How can I fix it?
C:\Users\demo\Desktop\im2latex\LaTeX-OCR-main>python dataset/dataset.py --equations C:\Users\demo\Desktop\im2latex\latex-ocr-datasets\Data-for-LaTeX_OCR\fullhand\formulas\formulas.norm.txt --images C:\Users\demo\Desktop\im2latex\latex-ocr-datasets\Data-for-LaTeX_OCR\fullhand\images --tokenizer dataset/tokenizer.json --out fullhand.pkl
Generate dataset
0%| | 5/99552 [00:00<3:58:57, 6.94it/s]
Traceback (most recent call last):
File "dataset/dataset.py", line 247, in
Im2LatexDataset(args.equations, args.images, args.tokenizer).save(args.out)
File "dataset/dataset.py", line 101, in init
self.data[(width, height)].append((eqs[self.indices[i]], im))
IndexError: list index out of range
C:\Users\demo\Desktop\im2latex\LaTeX-OCR-main>
This is because the matching is not trivial. You would need to create a lookup table, like so:

def read_matches(line):
    img, ind = line.split(' ')
    img = int(img.split('.')[0])   # image file name without extension
    ind = int(ind)                 # index of the formula line
    return img, ind

with open('training.matching.txt', 'r') as f:
    imgs, inds = [], []
    for line in f.readlines():
        img, ind = read_matches(line)
        imgs.append(img)
        inds.append(ind)
and in the dataset file you need to use that information
ind = inds[imgs.index(self.indices[i])]
self.data[(width, height)].append((eqs[ind], im))
I've not tested it, so there can be mistakes, but that's the direction you have to go.
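(For reference, this assumes each line of training.matching.txt looks like, e.g., "0001.png 5" — an image file name followed by the index of the matching formula line. If the matching files in that dataset use a different order or separator, read_matches has to be adjusted accordingly.)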
Would you like to add TensorBoard to your project (lukas-blecher/LaTeX-OCR) in order to visualize the training process and results?
I already have a Weights & Biases integration in place, which you can also host locally.
How should I run this project in order to visualize the deep network structure used?
I don't know. There is nothing in place for that in this project.
@.***:~/LaTeX-OCR-main# python gui.py
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.
Aborted (core dumped)
@.***:~/LaTeX-OCR-main#
https://www.cnblogs.com/keng333/p/14328144.html

apt-get install libxcb-xinerama0
export QTWEBENGINE_DISABLE_SANDBOX=1
export XDG_RUNTIME_DIR=/usr/lib/
export RUNLEVEL=3
The formula in the attachment cannot be recognized as LaTeX code. How do I add new formula data to the original dataset?
With 464e4fc you can combine multiple datasets or generate one combined pkl file:
python dataset/dataset.py --equations dataset1/formulas.txt dataset2/formulas.txt --images dataset1/images dataset2/images --tokenizer dataset/tokenizer.json --out combined.pkl
Also, the attachment does not transfer over to GitHub.
On the same dataset (formulae), training one run after another with different training algorithms in PyTorch, why is the final BLEU result always 0?

BLEU: 0.000, ED: 2.80e+00: 21%|████████████▌ | 80/389 [3:49:17<14:45:37, 171.97s/it]
BLEU: 0.000, ED: 3.26e+00: 21%|████████████▌ | 80/389 [3:31:05<13:35:22, 158.32s/it]
BLEU: 0.000, ED: 3.13e+00: 21%|████████████▌ | 80/389 [3:40:22<14:11:13, 165.29s/it]
BLEU: 0.000, ED: 2.84e+00: 21%|████████████▌ | 80/389 [4:10:24<16:07:12, 187.81s/it]
BLEU: 0.000, ED: 3.19e+00: 21%| ...
What are you showing me? That looks like the eval output. And why does it take so long for one iteration? What's your batch size? For large-scale training you basically need a GPU. I don't know what you are doing, but the model is not learning anything.
If there is already a voc.txt, how do I turn it into a tokenizer.json?
? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ! # \, \/ \: \Big \Bigg \Biggl \Biggr \Bigl \Bigr \Delta \Gamma \Im \L \Lambda \Large \Leftrightarrow \Longleftrightarrow \Longrightarrow \O \Omega \P \Phi \Pi \Psi \Re \Rightarrow \S \Sigma \Theta \Upsilon \Vert \Xi \ _ \acute \aleph \alpha \approx \arccos \arcsin \arctan \arg \ast \atop \b \backslash \bar \begin{array} \begin{cases} \begin{matrix} \begin{picture} \beta \bf \big \bigcap \bigcup \bigg \biggl \biggr \bigl \bigoplus \bigotimes \bigr \bigtriangledown \bigtriangleup \bigwedge \binom \bmod \boldmath \bot \breve \buildrel \bullet \cal \cap \cdot \cdotp \cdots \check \chi \circ \circle \colon \cong \cos \cosh \cot \coth \cup \d \dag \dagger \ddot \ddots \deg \delta \det \diamond \diamondsuit \dim \displaystyle \dot \doteq \dots \downarrow \ell \emptyset \end{array} \end{cases} \end{matrix} \end{picture} \enskip \enspace \epsilon \equiv \eta \exp \fbox \flat \footnotesize \forall \frac \gamma \ge \geq \gg \hat \hbar \hfill \hline \hookrightarrow \hspace \i \imath \in \infty \int \iota \it \jmath \kappa \kern \l \label \lambda \land \langle \large \lbrace \lbrack \ldots \le \left( \left. \left< \left[ \left\langle \left\lbrack \left\vert \left{ \left\ \leftarrow \leftrightarrow \left \leq \lfloor \lim \line \ll \llap \ln \log \longleftrightarrow \longmapsto \longrightarrow \makebox \mapsto \mathbf \mathcal \mathit \mathop \mathrm \mathsf \max \mid \min \mit \mp \mu \nabla \natural \ne \neq \ni \noalign \nonumber \not \nu \o \odot \oint \omega \ominus \oplus \otimes \overbrace \overleftarrow \overline \overrightarrow \parallel \partial \perp \phantom \phi \pi \pm \pounds \prime \prod \propto \protect \psi \put \qquad \quad \raise \raisebox \rangle \rbrace \rbrack \ref \rfloor \rho \right) \right. \right> \right\rangle \right\rbrack \right\vert \right\ \right} \right] \rightarrow \rightharpoonup \right \rlap \sb \scriptscriptstyle \scriptsize \scriptstyle \sec \setlength \sf \sharp \sigma \sim \simeq \sin \sinh \sl \slash \small \smallskip \sp \space \sqrt \stackrel \star \strut \subset \subseteq \sum \sup \supset \tan \tanh \tau \textbf \textrm \textstyle \textup \theta \thinspace \tilde \times \tiny \to \triangle \tt \underbrace \underline \unitlength \uparrow \upsilon \varepsilon \varphi \varpi \varrho \varsigma \vartheta \vdots \vec \vee \vert \vline \vphantom \vspace \wedge \widehat \widetilde \wp \xi \zeta { \ } ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { } ~
What is the content of the pkl file? I created a pkl file on one machine and then put it on another machine for training. The training machine only has the pkl file; there are no corresponding images or LaTeX formula text. Is this correct?
Different tokenizer.json files are generated from different LaTeX formula texts. How do I combine them into one tokenizer.json?
The pkl file only contains the relative paths to the images, but it does save the equations. So you will need to recompile the pkl file on each machine, and you need the images there. To the other questions I can't give answers; I'm using Hugging Face tokenizers, so you'll need to look there for more information.
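If you just need a tokenizer.json built from your own formula text files, a minimal sketch with the Hugging Face tokenizers library might look like this (the vocab size, special tokens, and file path are assumptions, not the project's exact settings):

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = Whitespace()

# Placeholder settings; match them to the project's config.
trainer = BpeTrainer(vocab_size=8000, special_tokens=["[PAD]", "[BOS]", "[EOS]"])
tokenizer.train(["formulas/train.formulas.norm.txt"], trainer)
tokenizer.save("tokenizer.json")

To cover several formula sources with one vocabulary, you could pass all the text files to tokenizer.train in a single call instead of merging separate tokenizer.json files afterwards.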
In your LaTeX-OCR project, how do I use lr_scheduler.CosineAnnealingWarmRestarts in PyTorch for learning rate adjustment (https://pytorch.org/docs/stable/optim.html#optimizer-step-closure)? And how do I use lr_scheduler.ChainedScheduler (https://pytorch.org/docs/stable/optim.html#optimizer-step-closure)?
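For context, this is the generic usage pattern for warm restarts from the PyTorch docs (the model, optimizer, and loss below are placeholders, not the project's training code):

import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = nn.Linear(10, 1)                                   # stand-in model
optimizer = optim.AdamW(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

num_epochs, iters_per_epoch = 5, 100
for epoch in range(num_epochs):
    for i in range(iters_per_epoch):
        optimizer.zero_grad()
        loss = (model(torch.randn(8, 10)) ** 2).mean()     # dummy loss
        loss.backward()
        optimizer.step()
        # fractional epoch argument, as documented for CosineAnnealingWarmRestarts
        scheduler.step(epoch + i / iters_per_epoch)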
Can your project provide an API for easy calling? Like the following: https://github.com/aspnetcs/image-to-latex-main
I've added a similar API now
What does the similar API include? Would you like to describe it in detail? Can these interfaces be exposed and then called, similar to Mathpix?
I don't know about the Mathpix API. There is an API running and you can connect to it via a Streamlit demo, like in https://github.com/kingyiusuen/image-to-latex. You can find more info in the readme.
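If you just want to call a self-hosted HTTP API like that from Python, a generic sketch would look like the following; the URL, port, route, and field name are placeholders, so check the project's readme for the actual ones:

import requests

url = "http://localhost:8502/predict/"          # placeholder endpoint
with open("equation.png", "rb") as f:
    response = requests.post(url, files={"file": f})   # upload the formula image
print(response.json())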
It's a miracle and it's so well done!!! Would you like to make a front end similar to Mathpix for (https://github.com/lukas-blecher/LaTeX-OCR)? See the attachments: screen regions could be captured for formula recognition. How do I extract formulas from a LaTeX file to augment a dataset?
Sorry, I mean: kingyiusuen/image-to-latex: Convert images of LaTeX math equations into LaTeX code. (github.com)
lukas-blecher/LaTeX-OCR: pix2tex: Using a ViT to convert images of equations into LaTeX code. (github.com)
Can you put the above two projects together?
How do I extract formulas from LaTeX sources to augment a dataset?
I don't know what you mean by combining the projects.
You can extract equations from the LaTeX source by using the script arxiv.py in pix2tex.dataset.arxiv.
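If you only need a quick-and-dirty extraction without the full arxiv.py pipeline, a generic standalone sketch (explicitly not the project's script) could look like this:

import re

# Pull display-math environments out of a .tex source file.
MATH_ENVS = r"(?:equation\*?|align\*?|gather\*?|displaymath)"
PATTERN = re.compile(
    r"\\begin\{(" + MATH_ENVS + r")\}(.+?)\\end\{\1\}"   # \begin{env}...\end{env}
    r"|\$\$(.+?)\$\$"                                     # $$...$$
    r"|\\\[(.+?)\\\]",                                    # \[...\]
    re.DOTALL,
)

def extract_formulas(tex_path):
    with open(tex_path, "r", encoding="utf-8", errors="ignore") as f:
        source = f.read()
    formulas = []
    for match in PATTERN.finditer(source):
        body = match.group(2) or match.group(3) or match.group(4)
        formulas.append(" ".join(body.split()))  # collapse whitespace to one line
    return formulas

The extracted formulas would still need the project's normalization step before they match the format expected by the dataset tooling.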
Regarding "combining the projects": what I mean is that it would be best for your project (https://github.com/lukas-blecher/LaTeX-OCR) to also provide an API interface, just like this project (https://github.com/kingyiusuen/image-to-latex), so it is easy to call.
I just found out that your project has already implemented this function.
You are so great!
Can you make a function that uses a key to call the interface? Similar to Mathpix or kedaxunfei/iFLYTEK (https://www.xfyun.cn/doc/words/formula-discern/API.html#%E6%8E%A5%E5%8F%A3%E8%B0%83%E7%94%A8%E6%B5%81%E7%A8%8B), something like the following:
appid xxxxxx
apisecret xxxxxxxxxxxxxxxxxxxx
apikey xxxxxxxxxxxxx
I don't see a reason to implement this functionality. I am not planning to deploy the API. It is meant as a local, self-hosted interaction point.
If you need this, you will have to implement it yourself.
It's hard for me to implement this function myself, because I don't have the ability and I don't have the time, hey.
How do I create a new dataset for testing?