Open glowinthedark opened 2 days ago
Nice work! Very exciting.
There is one problem though. This currently allows anyone in your local network to read or overwrite any file (that your user has access to) knowing its path, and create new files in any directory! That has serious security implications.
So we have two options:
Replace both path entries with Upload and Download buttons (use cacheDir to store temp files)
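A minimal sketch of what storing an upload under a cache directory could look like, so that a client-supplied name can never point outside it. The names `DEFAULT_CACHE_DIR`, `safe_upload_path` and `save_upload` are made up for illustration and are not PyGlossary's actual API; the real cacheDir location is an assumption here.

```python
import tempfile
from pathlib import Path

# Hypothetical location; PyGlossary's real cacheDir may live elsewhere.
DEFAULT_CACHE_DIR = Path(tempfile.gettempdir()) / "pyglossary-web-cache"


def safe_upload_path(cache_dir: Path, client_filename: str) -> Path:
    """Map a client-supplied file name to a path inside cache_dir."""
    # Keep only the last path component, so "../../etc/passwd" becomes "passwd".
    name = Path(client_filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable file name: {client_filename!r}")
    cache_root = cache_dir.resolve()
    target = (cache_root / name).resolve()
    # Defence in depth: refuse anything that still escapes the cache
    # directory (for example through a symlink planted inside it).
    if not target.is_relative_to(cache_root):
        raise ValueError(f"path escapes cache directory: {client_filename!r}")
    return target


def save_upload(cache_dir: Path, client_filename: str, data: bytes) -> Path:
    """Write uploaded bytes into cache_dir and return the stored path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = safe_upload_path(cache_dir, client_filename)
    target.write_bytes(data)
    return target
```

With this shape the web UI never receives or returns an absolute path from the user; it only hands out files it has placed in the cache directory itself.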
Once you fix that, you can add a PR.
Another thing - which is not a blocker - is user experience: Mainly, how does the user install streamlit and its dependencies? (there are so many)
Several scenarios:
User reads GitHub / README, copies a few commands into command line to install dependencies then runs PyGlossary (clicks on icon or runs the command). This is fine but not great for users struggling with command line.
User runs pyglossary in command line, it does not find Gtk, Tk, prompt toolkit (for interactive cmd) or streamlit, then instead of showing command line usage, it asks if you want to install dependencies for the web version. I don't love this.
User clicks on PyGlossary icon, it does not find any of the above modules, then it starts downloading dependencies for the web version. This is both invasive and makes user think app is slow.
We publish a Windows and Mac executable that ship with streamlit and its dependencies for every PyGlossary release. I don't want to do that, but if you volunteer to do it in a new repo, I can add link to it.
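For the scenarios above, checking which UI backends are importable can be done without actually importing them. This is only a sketch: the preference order, the UI names and the function names are assumptions, not how PyGlossary currently selects its UI.

```python
from importlib.util import find_spec

# Hypothetical preference order: (UI name, top-level module it needs).
UI_CANDIDATES = [
    ("gtk", "gi"),
    ("tk", "tkinter"),
    ("cmd_interactive", "prompt_toolkit"),
    ("web", "streamlit"),
]


def available_uis(candidates=UI_CANDIDATES):
    """Return the UI names whose required module can be found."""
    # find_spec() only locates a top-level module; it does not import it.
    return [ui for ui, module in candidates if find_spec(module) is not None]


def pick_ui(candidates=UI_CANDIDATES, fallback="cmd"):
    """Return the first available UI, or the plain command line as fallback."""
    found = available_uis(candidates)
    return found[0] if found else fallback
```

Whether a missing module should then trigger an install prompt, a dialog, or just the command-line usage text is the open question discussed above; the sketch only covers detection.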
Until we figure this out, I can't see it act as "the fallback/default UI".
We can continue this later in a discussion page.
To prevent streamlit from binding to 0.0.0.0, there is a config line that needs to be added to ~/.streamlit/config.toml or to .streamlit/config.toml in the project:
[server]
address = '127.0.0.1'
Then it will only be accessible from http://localhost:8501 and not from the rest of the LAN. The port can also be overridden in the same [server] section:
port = 8501
As for all the other points you raised, in fact, I had the same concerns as you do.
Originally I tried using st.file_uploader, but it has an upload file size limit of 200MB, and dictionaries often are bigger than that. Yet that might be the only way to easily webify pyglossary as a web app which can then be published for free on streamlit's own cloud (examples), and could serve as a limited size demo. I suspect it might even be deployed as a cloudflare worker, and run on cloudflare infrastructure (it's available on free plans), but didn't try.
The number of dependencies that streamlit requires is indeed a lot, and the API model didn't really convince me, as it imposes its own opinionated approach. It was more of an experiment given that I've never used streamlit before, and felt that trying to make something more than just a hello world is the best way. On the other hand, it was also an entry point to open a discussion about what would be an ideal scenario.
Also had a look at Google's mesop, which seems to be a competitor to streamlit, apparently with a more solid API design. It is still quite new, and it's built on top of fastapi, which is a full-blown web framework, although it is probably the lightest and the best one out there today. Still.. that's way too many dependencies for such a simple use case.
Both streamlit and mesop might be overkill, so probably it needs some longer and deeper consideration.
The question I had regarding streamlit was 'is it doable?', and the answer is yes, and it took record time to get it done. The next question would be 'is it a good idea?', and here I have doubts — easy is good, but heavy, opinionated and lock-in to an API/framework is probably not so good, wdyt?
A minimalistic clean approach with zero dependencies might be python's built-in dev server, e.g. using the code equivalent of python -m http.server --cgi (note that it binds to all interfaces by default, so it should be restricted to localhost with --bind 127.0.0.1 and exposed to the LAN only if needed, see python -m http.server --help). CGI is grandpa's technology :laughing:, and will sure take longer time for a POC and look uglier than streamlit, yet would probably be a more reasonable approach, although not yet fully convinced how feasible, and how much effort will be needed :thinking:. For example, streamlit uses web sockets for having fluid UI updates for the progress bar and to update widget states, and with the built-in python server plain HTTP will have to be used, and raw low-level js/css/html.
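To make the zero-dependency idea concrete, here is a rough sketch of a stdlib-only server that binds explicitly to 127.0.0.1 and serves one JSON endpoint. The `/api/status` path, the payload and the helper names are invented for illustration; nothing here is existing PyGlossary code.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical endpoint; a real UI would expose conversion state here.
        if self.path == "/api/status":
            body = json.dumps({"state": "idle"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "unknown endpoint")

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start the server on the loopback interface in a background thread."""
    # Binding to 127.0.0.1 keeps it off the LAN; port 0 lets the OS pick a free port.
    server = ThreadingHTTPServer(("127.0.0.1", port), ApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(server):
    """Query the status endpoint the way the page's JS would via ajax."""
    host, port = server.server_address[:2]
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/api/status")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

The static HTML/JS side is left out; it would only need to fetch such endpoints and render the result.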
What do you think about it?
oops, along with tkinter.tix, cgi is also being removed in python3.13:
Important removals:
- PEP 594: The remaining 19 “dead batteries” (legacy stdlib modules) have been removed from the standard library: aifc, audioop, cgi, cgitb, chunk, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu and xdrlib.
- Remove the 2to3 tool and lib2to3 module (deprecated in Python 3.11).
- Remove the tkinter.tix module (deprecated in Python 3.6).
- Remove the locale.resetlocale() function.
- Remove the typing.io and typing.re namespaces.
- Remove chained classmethod descriptors.
...on the other hand, CGIHTTPRequestHandler is not being deprecated and is still there, and apparently not planned for removal, so it's fine, only the cgi module is being removed.
I wanted to mention this but forgot: we can show a simple Tkinter dialog and ask the user to install streamlit (or whatever else) for the web ui. Not using tix of course. Hopefully they don't remove tkinter from the Windows installation later!
I would prefer a lightweight and local-first approach of course (and no use of CloudFlare or other cloud services). But as they say, "practicality beats purity".
I'm a back-end developer, and I dislike large js frameworks like react or node. I made a simple dictionary web app using Brython and Go. Have you seen Brython? It's not very popular sadly, so I'm not sure how long it will be maintained.
BTW, instead of path entry (or alongside an Upload button) we should take a URL to download it from, and that URL can even be a file:// url that points to a local file.
The only gotcha is a directory, which can be zipped, and web UI unzips it (then we need to let user enter file name inside zip?). We can worry about that later.
Can you give it a try?
I will try to take a look into streamlit.
the tkinter ui fails loading on macos because there is no easy way to get tkinter.Tix installed on either intel or apple silicon macs, or at least I didn't manage to do it with either 3.12 or 3.11 which supposedly should still have Tix; core tkinter itself is available and importable, and forcing tk with pyglossary --tk throws an error:
python3.12/tkinter/tix.py", line 221, in __init__
self.tk.eval('package require Tix')
_tkinter.TclError: can't find package Tix
Never heard of Brython, but it looks interesting; in the long run only time will tell where it goes. Python in the browser is still seen as something very exotic, although I've seen a few solid apps that use WASM-compiled python, i.e. pyodide.org. Also really impressed with golang and where it is heading, but it's a different level, way over my old head :laughing:
A backender too, so trying to avoid heavy UIs although it's not always easily avoidable, especially when work is involved. Thinking of giving a try to raw HTML/js + ootb python standard library, and see how far I can get.
BTW, instead of path entry (or alongside an Upload button) we should take a URL to download it from, and that URL can even be a file:// url that points to a local file.
Don't you think throwing in URLs will be mixing scopes? Browsers are sandboxed, so they intentionally prevent even getting the full path to drag-and-dropped files, allowing only the file name to be read. The file:// protocol is considered a security risk, so even though it can explicitly be whitelisted in some browsers like ff and chrome, it's guaranteed to cause issues. The only safe way to use file:// would be a full-blown electron app, or something smarter like Tauri. And btw, with Tauri pyglossary could be packaged as a desktop app bundle and not be hundreds of MB in size.
Will post an update here when and if I manage to get to something usable with the ootb SimpleHTTPRequestHandler.
I meant just taking a URL as a text entry, and in the backend: only allow a file url if the request is coming from localhost (we can even disable it in config). The point is that it's a URL, not a path. This way people can paste the url of public glossaries on the internet. We can even cache them and avoid downloading twice.
Like how Google Reverse Image search allows giving a URL as well as uploading an image.
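A rough sketch of that backend rule: http(s) URLs are accepted from any client, while file:// URLs are accepted only when the request comes from a loopback address and the feature is enabled. The function names and the `allow_file_urls` switch are assumptions standing in for whatever config option would be used.

```python
import ipaddress
from urllib.parse import urlparse

# Schemes that may be fetched regardless of where the request comes from.
REMOTE_SCHEMES = {"http", "https"}


def is_loopback(client_ip: str) -> bool:
    """True if client_ip parses as a loopback address (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(client_ip).is_loopback
    except ValueError:
        return False  # not an IP address at all


def is_url_allowed(url: str, client_ip: str, allow_file_urls: bool = True) -> bool:
    """Decide whether the backend should read the glossary from this URL."""
    scheme = urlparse(url).scheme.lower()
    if scheme in REMOTE_SCHEMES:
        return True
    if scheme == "file":
        # Local files only for local clients, and only if enabled in config.
        return allow_file_urls and is_loopback(client_ip)
    return False  # anything else (ftp, data, bare paths, ...) is rejected
```

For example, `is_url_allowed("file:///home/u/g.txt", "192.168.1.5")` is rejected while the same URL from `127.0.0.1` is accepted. If the app were ever hosted publicly, the http(s) branch would also need limits on which hosts it may fetch from.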
If you create a website from this, adding a checkbox "This glossary can be publicly distributed" would be handy in creating a public collection! Even allowing to specify the license (which may or may not exist in metadata) would be nice!
Also take a look at https://github.com/Crissium/SilverDict
Wouldn't it be nice to be able to integrate PyGlossary into it?
impressive! goldendict new-new generation! the cpp version would probably be hard to beat in terms of full text indexing, search speed over multiple dictionaries, and number of formats, but just the fact that there is interest in the topic, and people are putting time and effort into it is already a lot, especially in these times when the 'big goo' and :robot::robot::robot: are assimilating the life forms like the borg; on mac SilverDict required building some whl wheels so users without xcode or the cli build tools will be left out; surprisingly it does actually work, although appears to be in early alpha stage;
thanks to pyglossary I've converted all the dictionaries I need from mdx/dsl to .slob and use the aar2-webui (java-based) + the android version; so far haven't seen anything easier/better, and it works decently.
A minimalistic clean approach with zero dependencies might be python's built-in dev server, e.g. using the code equivalent of python -m http.server --cgi (note that it binds to all interfaces by default, so it should be restricted to localhost with --bind 127.0.0.1 and exposed to the LAN only if needed, see python -m http.server --help). CGI is grandpa's technology 😆, and will sure take longer time for a POC and look uglier than streamlit, yet would probably be a more reasonable approach, although not yet fully convinced how feasible, and how much effort will be needed 🤔. For example, streamlit uses web sockets for having fluid UI updates for the progress bar and to update widget states, and with the built-in python server plain HTTP will have to be used, and raw low-level js/css/html.
CGI starts a new process for each request! I was lucky enough to never have used it I guess. It's definitely not for 2024! Mesop has the word AI in its 6-word summary! So that's a No from me! Streamlit is not terribly large. Streamlit, Django, Flask, web.py... they are all fine by me.
I want to release 5.0.0 soon (maybe in 1 or 2 weeks). You think we should squeeze Streamlit web UI in it?
actually, there is no need for CGI, managed to completely avoid the need for it and just use static HTML with js that pulls data from http endpoints via ajax; no overhead, tight, self-contained, and no need to mix python with raw html, also learned about the existence of SSE which is simpler than websockets and allows pushing state to the client via HTTP without needing a dedicated ws:// protocol
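For reference, a minimal sketch of how SSE progress updates can be pushed from the stdlib server. The `/events` path, the `progress` event name and the fake progress steps are placeholders; a real handler would report the state of an actual conversion.

```python
import json
import threading
import time
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def format_sse(data: dict, event: str = "progress") -> bytes:
    """Encode one Server-Sent Events message (terminated by a blank line)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n".encode("utf-8")


class ProgressHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/events":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        # Stand-in for a real conversion: emit a few fake progress steps.
        for percent in (0, 50, 100):
            self.wfile.write(format_sse({"percent": percent}))
            self.wfile.flush()
            time.sleep(0.01)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def read_events(host, port):
    """Read the whole stream and return the decoded data payloads."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/events")
        raw = conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
    return [
        json.loads(line[len("data: "):])
        for line in raw.splitlines()
        if line.startswith("data: ")
    ]
```

In the browser, the counterpart is an `EventSource` pointed at the same endpoint with a listener for the `progress` event, which then updates the progress bar.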
@ilius: Did you consider adding a webui fallback as a friendlier option than the cli interface in case desktop guis are not supported?
Btw, as of python3.13 Tix is apparently no longer included with tkinter, since it has become unmaintained and has security issues, so it looks like the tk UI will become unsupported unless you decouple it from Tix.
I've played a bit with streamlit and here is a crude and untested POC for a web UI: https://github.com/glowinthedark/pyglossary_st
if all the other pyglossary modules are already installed then it can be run with:
or if you have uv then with:
Let me know your thoughts about it, and whether you would consider a pull request for adding streamlit as yet another UI.