ilius / pyglossary

A tool for converting dictionary files aka glossaries. Mainly to help use our offline glossaries in any Open Source dictionary we like on any modern operating system / device.
GNU General Public License v3.0
2.26k stars, 237 forks

Feature: add web UI #596

Open glowinthedark opened 1 day ago

glowinthedark commented 1 day ago

@ilius: Did you consider adding a web UI fallback as a friendlier option than the CLI in case desktop GUIs are not supported?

Btw, as of Python 3.13 Tix is no longer included with tkinter, apparently because it has become unmaintained and has security issues, so the Tk UI will become unsupported unless you decouple it from Tix.

I've played a bit with streamlit and here is a crude and untested POC for a web UI: https://github.com/glowinthedark/pyglossary_st

if all the other pyglossary modules are already installed then it can be run with:

pip install streamlit
streamlit run pyglossary_st.py

or if you have uv then with:

uv run streamlit run pyglossary_st.py

Let me know your thoughts about it, and whether you would consider a pull request for adding streamlit as yet another UI.

ilius commented 1 day ago

Nice work! Very exciting.

There is one problem though. This currently allows anyone on your local network who knows a file's path to read or overwrite any file your user has access to, and to create new files in any directory! That has serious security implications.

So we have two options:

Once you fix that, you can add a PR.

Another thing - which is not a blocker - is user experience: Mainly, how does the user install streamlit and its dependencies? (there are so many)

Several scenarios:

Until we figure this out, I can't see it act as "the fallback/default UI".

We can continue this later in a discussion page.

glowinthedark commented 1 day ago

To prevent streamlit from binding to 0.0.0.0 there is a config line that needs to be added to ~/.streamlit/config.toml or to .streamlit/config.toml in the project:

[server]
address = '127.0.0.1'

Then it will only be accessible from http://localhost:8501 and not from the entire LAN. The port can also be overridden with:

[server]
port = 8501

As for all the other points you raised, I had the same concerns as you do.

Originally I tried using st.file_uploader, but it has an upload file size limit of 200MB, and dictionaries often are bigger than that. Yet that might be the only way to easily webify pyglossary as a web app which can then be published for free on streamlit's own cloud (examples), and could serve as a limited size demo. I suspect it might even be deployed as a cloudflare worker, and run on cloudflare infrastructure (it's available on free plans), but didn't try.

The number of dependencies that streamlit requires is indeed a lot, and the API model didn't really convince me, as it imposes its own opinionated approach. It was more of an experiment given that I've never used streamlit before, and felt that trying to make something more than just a hello world is the best way. On the other hand, it was also an entry point to open a discussion about what would be an ideal scenario.

Also had a look at Google's mesop, which seems to be a competitor to streamlit, apparently with a more solid API design. It is still quite new, and it's built on top of fastapi, which is a full-blown web framework, although probably the lightest and the best one out there today. Still, that's way too many dependencies for such a simple use case.

Both streamlit and mesop might be overkill, so probably it needs some longer and deeper consideration.

The question I had regarding streamlit was 'is it doable?', and the answer is yes, and it took record time to get it done. The next question would be 'is it a good idea?', and here I have doubts — easy is good, but heavy, opinionated and lock-in to an API/framework is probably not so good, wdyt?

A minimalistic, clean approach with zero dependencies might be Python's built-in dev server, e.g. using the code equivalent of python -m http.server --cgi (note that it binds to all interfaces by default, so it needs --bind 127.0.0.1 to stay localhost-only, see python -m http.server --help). CGI is grandpa's technology 😆 and will surely take longer for a POC and look uglier than streamlit, yet it would probably be a more reasonable approach, although I'm not yet fully convinced how feasible it is and how much effort it will need 🤔. For example, streamlit uses web sockets for fluid UI updates of the progress bar and widget states, whereas with the built-in Python server it would have to be plain HTTP and raw low-level js/css/html.
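To make the zero-dependency idea a bit more concrete, here is a minimal sketch using only the standard library. It is not PyGlossary code: the `/api/status` endpoint, the `make_server` helper and the placeholder HTML are made up for illustration. The only point it tries to show is a server that is explicitly bound to 127.0.0.1 and serves one static page plus one JSON endpoint.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder page; a real UI would ship static HTML/JS files instead.
INDEX_HTML = b"<!doctype html><title>PyGlossary web UI (sketch)</title><p>It works.</p>"


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, content_type, body):
        # Small helper: write status line, headers and body in one go.
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/":
            self._send(200, "text/html; charset=utf-8", INDEX_HTML)
        elif self.path == "/api/status":
            # Hypothetical endpoint that the page's JS could poll via fetch().
            body = json.dumps({"status": "idle"}).encode("utf-8")
            self._send(200, "application/json", body)
        else:
            self._send(404, "text/plain", b"not found")

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; drop this override to get access logs


def make_server(port=0):
    # Binding to 127.0.0.1 keeps the server off the LAN.
    # Port 0 lets the OS pick a free port.
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

Running it would be something like `server = make_server(8000)` followed by `server.serve_forever()`. Anything that accepts file paths from the browser would still need its own checks on top of this; binding to localhost only addresses the LAN exposure.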

What do you think about it?

glowinthedark commented 1 day ago

oops, along with tkinter.tix, cgi is also being removed in python3.13:

Important removals:

  • PEP 594: The remaining 19 “dead batteries” (legacy stdlib modules) have been removed from the standard library: aifc, audioop, cgi, cgitb, chunk, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu and xdrlib.
  • Remove the 2to3 tool and lib2to3 module (deprecated in Python 3.11).
  • Remove the tkinter.tix module (deprecated in Python 3.6).
  • Remove the locale.resetlocale() function.
  • Remove the typing.io and typing.re namespaces.

...on the other hand, CGIHTTPRequestHandler is still there in 3.13, so it would work for now and only the cgi module is gone; note, though, that CGIHTTPRequestHandler and the --cgi flag are deprecated as of 3.13 and scheduled for removal in 3.15.

ilius commented 1 day ago

I wanted to mention this but forgot: we can show a simple Tkinter dialog and ask the user to install streamlit (or whatever else) for the web UI. Not using Tix of course. Hopefully they don't remove tkinter from the Windows installation later!

I would prefer a lightweight and local-first approach of course (and no use of CloudFlare or other cloud services). But as they say, "practicality beats purity".

I'm a back-end developer, and I dislike large js frameworks like react or node. I made a simple dictionary web app using Brython and Go. Have you seen Brython? It's not very popular sadly. And so I'm not sure how long it will be maintained.

BTW, instead of path entry (or alongside an Upload button) we should take a URL to download it from, and that URL can even be a file:// url that points to a local file. The only gotcha is a directory, which can be zipped, and web UI unzips it (then we need to let user enter file name inside zip?). We can worry about that later. Can you give it a try?

I will try to take a look into streamlit.

glowinthedark commented 1 day ago

the tkinter UI fails to load on macOS because there is no easy way to get tkinter.Tix installed on either Intel or Apple Silicon Macs, or at least I didn't manage to do it with either 3.12 or 3.11, which supposedly should still have Tix. Core tkinter itself is available and importable, but forcing Tk with pyglossary --tk throws an error:

python3.12/tkinter/tix.py", line 221, in __init__
    self.tk.eval('package require Tix')
_tkinter.TclError: can't find package Tix

Never heard of Brython, but it looks interesting; in the long run only time will tell where it goes. Python in the browser is still seen as something very exotic, although I've seen a few solid apps that use WASM compiled python, ie pyodide.org. Also really impressed with golang and where it is heading, but it's a different level, way over my old head 😆

A backender too, so trying to avoid heavy UIs although it's not always easily avoidable, especially when work is involved. Thinking of giving a try to raw HTML/js + ootb python standard library, and see how far I can get.

BTW, instead of path entry (or alongside an Upload button) we should take a URL to download it from, and that URL can even be a file:// url that points to a local file.

Don't you think throwing in URLs will be mixing scopes? Browsers are sandboxed, so they intentionally prevent even getting the full path of drag-and-dropped files, allowing only the file name to be read. The file:// protocol is considered a security risk, so even though it can be explicitly whitelisted in some browsers like Firefox and Chrome, it's guaranteed to cause issues. The only safe way to use file:// would be a full-blown electron app, or something smarter like Tauri. And btw, with Tauri pyglossary could be packaged as a desktop app bundle and not be hundreds of MB in size.

Will post an update here if and when I manage to get to something usable with the ootb SimpleHTTPRequestHandler.

ilius commented 1 day ago

I meant just taking a URL as a text entry, and in the backend: only allow a file:// URL if the request is coming from localhost (we can even disable it in config). The point is that it's a URL, not a path. This way people can paste the URL of public glossaries on the internet. We can even cache them and avoid downloading twice.
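A rough sketch of that backend rule, using only the standard library. All names here (`is_source_url_allowed`, `allow_file_urls`, `cache_path_for`) are hypothetical and not part of PyGlossary; the config switch is modelled as a plain keyword argument, and the cache key is simply a hash of the URL.

```python
import hashlib
import ipaddress
from pathlib import Path
from urllib.parse import urlsplit

# Schemes accepted from any client (assumption: plain web downloads only).
ALLOWED_REMOTE_SCHEMES = {"http", "https"}


def is_loopback(client_host):
    """True if the client address is a loopback IP such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(client_host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): treat as non-local.
        return False


def is_source_url_allowed(url, client_host, allow_file_urls=True):
    """Decide whether the backend should fetch `url` for this client."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in ALLOWED_REMOTE_SCHEMES:
        return True
    if scheme == "file":
        # file:// only for requests from the same machine, and only if enabled.
        return allow_file_urls and is_loopback(client_host)
    # Bare paths (empty scheme) and any other scheme are rejected.
    return False


def cache_path_for(url, cache_dir):
    """Map a URL to a stable file name so the same URL is downloaded once."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / digest
```

In an `http.server` handler the client address would come from `self.client_address[0]`. This sketch does not filter http(s) URLs that point at private or internal addresses, which a publicly hosted version would probably want to do as well.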

Like how Google Reverse Image search allows giving a URL as well as uploading an image.

ilius commented 1 day ago

If you create a website from this, adding a checkbox "This glossary can be publicly distributed" would be handy in creating a public collection! Even allowing to specify the license (which may or may not exist in metadata) would be nice!

ilius commented 1 day ago

Also take a look at https://github.com/Crissium/SilverDict

Wouldn't it be nice to be able to integrate PyGlossary into it?

glowinthedark commented 23 hours ago

impressive! goldendict new-new generation! the cpp version would probably be hard to beat in terms of full text indexing, search speed over multiple dictionaries, and number of formats, but just the fact that there is interest in the topic, and people are putting time and effort into it is already a lot, especially in these times when the 'big goo' and 🤖🤖🤖 are assimilating the life forms like the borg. On mac, SilverDict required building some whl wheels, so users without xcode or the cli build tools will be left out. Surprisingly it does actually work, although it appears to be in early alpha stage.

thanks to pyglossary I've converted all the dictionaries I need from mdx/dsl to .slob and use the aar2-webui (java-based) + the android version; so far haven't seen anything easier/better, and it works decently.

ilius commented 21 hours ago

A minimalistic, clean approach with zero dependencies might be Python's built-in dev server, e.g. using the code equivalent of python -m http.server --cgi (note that it binds to all interfaces by default, so it needs --bind 127.0.0.1 to stay localhost-only, see python -m http.server --help). CGI is grandpa's technology 😆 and will surely take longer for a POC and look uglier than streamlit, yet it would probably be a more reasonable approach, although I'm not yet fully convinced how feasible it is and how much effort it will need 🤔. For example, streamlit uses web sockets for fluid UI updates of the progress bar and widget states, whereas with the built-in Python server it would have to be plain HTTP and raw low-level js/css/html.

CGI starts a new process for each request! I was lucky enough to never have used it I guess. It's definitely not for 2024! Mesop has the word AI in its 6-word summary! So that's a No from me! Streamlit is not terribly large. Streamlit, Django, Flask, web.py... they are all fine by me.

I want to release 5.0.0 soon (maybe in 1 or 2 weeks). You think we should squeeze Streamlit web UI in it?

glowinthedark commented 9 hours ago

actually, there is no need for CGI: I managed to avoid it completely and just use static HTML with js that pulls data from http endpoints via ajax. No overhead, tight, self-contained, and no need to mix python with raw html. Also learned about the existence of SSE (server-sent events), which is simpler than websockets and allows pushing state to the client over plain HTTP without needing a dedicated ws:// protocol
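For readers who have not used SSE, here is a minimal sketch of what the server side can look like with the standard library only. It is not the code from the POC: the `/events` path, the `progress` event name and the `fake_progress` generator are assumptions made up for illustration. The wire format itself (optional `event:` line, one `data:` line per line of payload, blank line as terminator) is what the SSE spec defines.

```python
import json
from http.server import BaseHTTPRequestHandler


def format_sse(data, event=None):
    """Encode one server-sent event as bytes."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line marks the end of the event.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


def fake_progress(steps=5):
    """Hypothetical stand-in for a real conversion progress source."""
    for i in range(1, steps + 1):
        yield {"percent": i * 100 // steps}


class ProgressHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/events":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for update in fake_progress():
            self.wfile.write(format_sse(json.dumps(update), event="progress"))
            self.wfile.flush()  # push each event out immediately
```

On the browser side this pairs with `new EventSource("/events")` and an event listener for `progress`. Since `EventSource` reconnects automatically when the connection closes, a real implementation would probably send a final "done" event so the client can call `close()`.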