ilius / pyglossary

A tool for converting dictionary files aka glossaries. Mainly to help use our offline glossaries in any Open Source dictionary we like on any modern operating system / device.
GNU General Public License v3.0

Feature: add web UI #596

Open glowinthedark opened 3 days ago

glowinthedark commented 3 days ago

@ilius: Did you consider adding a webui fallback as a friendlier option than the cli interface in case desktop guis are not supported?

Btw, as of Python 3.13 Tix is no longer included with tkinter, apparently because it has become unmaintained and has security issues, so it looks like the tk UI will become unsupported unless you decouple it from Tix.

I've played a bit with streamlit and here is a crude and untested POC for a web UI: https://github.com/glowinthedark/pyglossary_st

if all the other pyglossary modules are already installed then it can be run with:

pip install streamlit
streamlit run pyglossary_st.py

or if you have uv then with:

uv run streamlit run pyglossary_st.py

Let me know your thoughts about it, and whether you would consider a pull request for adding streamlit as yet another UI.

ilius commented 3 days ago

Nice work! Very exciting.

There is one problem though. This currently allows anyone on your local network to read or overwrite any file (that your user has access to) if they know its path, and to create new files in any directory! That has serious security implications.

So we have two options:

Once you fix that, you can add a PR.

Another thing - which is not a blocker - is user experience: Mainly, how does the user install streamlit and its dependencies? (there are so many)

Several scenarios:

Until we figure this out, I can't see it act as "the fallback/default UI".

We can continue this later in a discussion page.

glowinthedark commented 3 days ago

To prevent streamlit from binding to 0.0.0.0 there is a config line that needs to be added to ~/.streamlit/config.toml or to .streamlit/config.toml in the project:

[server]
address = '127.0.0.1'

Then it will only be accessible from http://localhost:8501 and not from the entire LAN. The port can also be overridden with:

[server]
port = 8501

As for all the other points you raised, in fact, I had the same concerns as you do.

Originally I tried using st.file_uploader, but it has an upload file size limit of 200MB, and dictionaries often are bigger than that. Yet that might be the only way to easily webify pyglossary as a web app which can then be published for free on streamlit's own cloud (examples), and could serve as a limited size demo. I suspect it might even be deployed as a cloudflare worker, and run on cloudflare infrastructure (it's available on free plans), but didn't try.

The number of dependencies that streamlit requires is indeed a lot, and the API model didn't really convince me, as it imposes its own opinionated approach. It was more of an experiment given that I've never used streamlit before, and felt that trying to make something more than just a hello world is the best way. On the other hand, it was also an entry point to open a discussion about what would be an ideal scenario.

Also had a look at Google's mesop, which seems to be a competitor to streamlit, apparently with a more solid API design. It is still quite new, and it's built on top of fastapi, which is a full-blown web framework, although it is probably the lightest and the best one out there today. Still, that's way too many dependencies for such a simple use case.

Both streamlit and mesop might be overkill, so probably it needs some longer and deeper consideration.

The question I had regarding streamlit was 'is it doable?', and the answer is yes, and it took record time to get it done. The next question would be 'is it a good idea?', and here I have doubts — easy is good, but heavy, opinionated and lock-in to an API/framework is probably not so good, wdyt?

A minimalistic, clean approach with zero dependencies might be Python's built-in dev server, e.g. using the code equivalent of python -m http.server --cgi (it binds to all interfaces by default, but can be restricted to localhost with --bind 127.0.0.1, see python -m http.server --help). CGI is grandpa's technology 😆, and it will surely take longer for a POC and look uglier than streamlit, yet it would probably be a more reasonable approach, although I'm not yet fully convinced how feasible it is and how much effort will be needed 🤔. For example, streamlit uses web sockets for fluid UI updates of the progress bar and widget states, whereas with the built-in Python server plain HTTP would have to be used, along with raw low-level js/css/html.
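A minimal sketch of the zero-dependency idea, assuming the standard library's http.server with an explicit bind to the loopback address. The endpoint name /api/ping and the helper make_server are hypothetical, not taken from any existing code:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ApiHandler(BaseHTTPRequestHandler):
    """Serves one hypothetical JSON endpoint; everything else is a 404."""

    def do_GET(self):
        if self.path != "/api/ping":  # hypothetical endpoint name
            self.send_error(404, "Not Found")
            return
        payload = {"ok": True, "client": self.client_address[0]}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove this override to see request logs


def make_server(host="127.0.0.1", port=8000):
    # Binding to the loopback address keeps the server off the LAN.
    # port=0 asks the OS for any free port.
    return ThreadingHTTPServer((host, port), ApiHandler)
```

Calling make_server().serve_forever() and opening http://127.0.0.1:8000/api/ping should return the JSON reply; static HTML/JS pages could then call such endpoints via ajax.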

What do you think about it?

glowinthedark commented 3 days ago

oops, along with tkinter.tix, cgi is also being removed in python3.13:

Important removals:

  • PEP 594: The remaining 19 “dead batteries” (legacy stdlib modules) have been removed from the standard library: aifc, audioop, cgi, cgitb, chunk, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu and xdrlib.
  • Remove the 2to3 tool and lib2to3 module (deprecated in Python 3.11).
  • Remove the tkinter.tix module (deprecated in Python 3.6).
  • Remove the locale.resetlocale() function.
  • Remove the typing.io and typing.re namespaces.

...on the other hand, CGIHTTPRequestHandler is not being deprecated and is still there, and apparently not planned for removal, so it's fine, only the cgi module is being removed.

ilius commented 3 days ago

I wanted to mention this but forgot: we can show a simple Tkinter dialog and ask the user to install streamlit (or whatever else) for the web ui. Not using tix of course. Hopefully they don't remove tkinter from the windows installation later!

I would prefer a lightweight and local-first approach of course (and no use of CloudFlare or other cloud services). But as they say, "practicality beats purity".

I'm a back-end developer, and I dislike large js frameworks like react or node. I made a simple dictionary web app using Brython and Go. Have you seen Brython? It's not very popular, sadly, so I'm not sure how long it will be maintained.

BTW, instead of path entry (or alongside an Upload button) we should take a URL to download it from, and that URL can even be a file:// url that points to a local file. The only gotcha is a directory, which can be zipped, and web UI unzips it (then we need to let user enter file name inside zip?). We can worry about that later. Can you give it a try?

I will try to take a look into streamlit.

glowinthedark commented 3 days ago

the tkinter ui fails to load on macos because there is no easy way to get tkinter.tix installed on either intel or apple silicon macs, or at least I didn't manage to do it with either 3.12 or 3.11, which supposedly should still have Tix. Core tkinter itself is available and importable, but forcing tk with pyglossary --tk throws an error:

python3.12/tkinter/tix.py", line 221, in __init__
    self.tk.eval('package require Tix')
_tkinter.TclError: can't find package Tix

Never heard of Brython, but looks interesting, in the long run only time will tell where it goes. Python in the browser is still seen as something very exotic, although I've seen a few solid apps that use WASM compiled python, ie pyodide.org. Also really impressed with golang and where it is heading to but it's a different level, way over my old head :laughing:

A backender too, so trying to avoid heavy UIs although it's not always easily avoidable, especially when work is involved. Thinking of giving a try to raw HTML/js + ootb python standard library, and see how far I can get.

> BTW, instead of path entry (or alongside an Upload button) we should take a URL to download it from, and that URL can even be a file:// url that points to a local file.

Don't you think throwing in URLs will be mixing scopes? Browsers are sandboxed, so they intentionally prevent even getting the full path to drag-and-dropped files, allowing only the file name to be read. The file:// protocol is considered a security risk, so even though it can be explicitly whitelisted in some browsers like ff and chrome, it's guaranteed to cause issues. The only safe way to use file:// would be a full-blown electron app, or something smarter like Tauri. And btw, with Tauri pyglossary could be packaged as a desktop app bundle and not be hundreds of MB in size.

Will post an update here if and when I manage to get to something usable with the ootb SimpleHTTPRequestHandler.

ilius commented 3 days ago

I meant just taking a URL as text entry, and in backend: only allow file url if the request is coming from localhost (can even disable it in config). The point is that it's URL, not path. This way people can paste url of public glossaries on internet. We can even cache them and avoid downloading twice.

Like how Google Reverse Image search allows giving a URL as well as uploading an image.
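The rule described above (any client may submit an http(s) URL, but file:// URLs are honoured only for requests coming from localhost, with a config switch to disable them) could be sketched as follows. The function names and the allow_file_urls flag are hypothetical, chosen only for illustration:

```python
import ipaddress
from urllib.parse import urlsplit

REMOTE_SCHEMES = {"http", "https"}


def is_loopback(client_ip: str) -> bool:
    """True if the request came from this machine (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(client_ip).is_loopback
    except ValueError:
        return False  # not a valid IP address at all


def is_url_allowed(url: str, client_ip: str, allow_file_urls: bool = True) -> bool:
    """Accept http(s) URLs from anyone; file:// only from local clients.

    allow_file_urls stands in for the config switch mentioned above.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme in REMOTE_SCHEMES:
        return True
    if scheme == "file":
        return allow_file_urls and is_loopback(client_ip)
    return False  # bare paths and any other scheme are rejected
```

With http.server, the client_ip argument would come from self.client_address[0] inside the request handler.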

ilius commented 3 days ago

If you create a website from this, adding a checkbox "This glossary can be publicly distributed" would be handy in creating a public collection! Even allowing to specify the license (which may or may not exist in metadata) would be nice!

ilius commented 3 days ago

Also take a look at https://github.com/Crissium/SilverDict

Wouldn't it be nice to be able to integrate PyGlossary into it?

glowinthedark commented 2 days ago

impressive! goldendict new-new generation! the cpp version would probably be hard to beat in terms of full text indexing, search speed over multiple dictionaries, and number of formats, but just the fact that there is interest in the topic, and people are putting time and effort into it is already a lot, especially in these times when the 'big goo' and :robot::robot::robot: are assimilating the life forms like the borg; on mac SilverDict required building some whl wheels so users without xcode or the cli build tools will be left out; surprisingly it does actually work, although appears to be in early alpha stage;

thanks to pyglossary I've converted all the dictionaries I need from mdx/dsl to .slob and use the aar2-webui (java-based) + the android version; so far haven't seen anything easier/better, and it works decently.

ilius commented 2 days ago

> A minimalistic, clean approach with zero dependencies might be Python's built-in dev server, e.g. using the code equivalent of python -m http.server --cgi (it binds to all interfaces by default, but can be restricted to localhost with --bind 127.0.0.1, see python -m http.server --help). CGI is grandpa's technology 😆, and it will surely take longer for a POC and look uglier than streamlit, yet it would probably be a more reasonable approach, although I'm not yet fully convinced how feasible it is and how much effort will be needed 🤔. For example, streamlit uses web sockets for fluid UI updates of the progress bar and widget states, whereas with the built-in Python server plain HTTP would have to be used, along with raw low-level js/css/html.

CGI starts a new process for each request! I was lucky enough to never have used it, I guess. It's definitely not for 2024! Mesop has the word AI in its 6-word summary! So that's a No from me! Streamlit is not terribly large. Streamlit, Django, Flask, web.py... they are all fine by me.

I want to release 5.0.0 soon (maybe in 1 or 2 weeks). You think we should squeeze Streamlit web UI in it?

glowinthedark commented 2 days ago

actually, there is no need for CGI: I managed to avoid it completely and just use static HTML with js that pulls data from http endpoints via ajax; no overhead, tight, self-contained, and no need to mix python with raw html. I also learned about the existence of SSE, which is simpler than websockets and allows pushing state to the client via HTTP without needing a dedicated ws:// protocol

glowinthedark commented 4 hours ago

@ilius: please give it a try, and let me know what you think; it's a POC for pure HTML/JS/CSS with zero dependencies using the standard library http.server:

there are a few hundred lines borrowed from a minimalistic websockets implementation, which had to be tweaked to allow running both the websockets and regular HTTP server on the same instance and on the same port; usually two server instances are needed

the code might need some cleanup and reorganization; I didn't put too much thought into structuring it well, as the main goal was to make a working prototype. So it's a bit of an all-in-one spaghetti, and there are also some leftover classes that might not be required anymore and need to be removed

what works ✅ :

what does not work ❌ :

Please let me know if there are any specific issues you think might need to be addressed before making a pull request.

Btw, python-lzo and pyicu have always been a pain to get compiled on all OS's I've tried, so I've added my own WHL versions (built with github actions) to make installation easier, e.g. with uv pip or pip:

pip install --extra-index-url https://glowinthedark.github.io/python-lzo/ python-lzo
pip install --extra-index-url https://glowinthedark.github.io/pyicu-build pyicu

should also work with the tweaked requirements.txt which should be able to pick up a suitable WHL and skip compiling from source:

pip install -r requirements.txt

And a side note for the future, in case it will ever come to it — the need for a 'native' websocket implementation in 700+ lines of code could have been removed by using something like aiohttp, but the goal was to reduce the number of dependencies to zero 😆

with aiohttp the web+websockets server could have been reduced to just a few lines:

#!/usr/bin/env python3
# /// script
# dependencies = [
#     "aiohttp",
# ]
# ///

import aiohttp
from aiohttp import web
import asyncio
from aiohttp import WSMessage

async def http_handler(request):
    return web.Response(text='Hello, HTTP GET METHOD world')

async def websocket_handler(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)

    msg: WSMessage
    async for msg in ws:
        print(msg)
        if msg.type == aiohttp.WSMsgType.TEXT:
            if msg.data == 'close':
                await ws.close()
            else:
                await ws.send_str(f'REPLY: "{msg.data.upper()}"')
        elif msg.type == aiohttp.WSMsgType.ERROR:
            print('ws connection closed with exception %s' % ws.exception())

    return ws

def create_runner():
    app = web.Application()
    app.add_routes([
        web.get('/',   http_handler),
        web.get('/ws', websocket_handler),
    ])
    return web.AppRunner(app)

async def start_server(host="127.0.0.1", port=9001):
    runner = create_runner()
    await runner.setup()
    site = web.TCPSite(runner, host, port)
    await site.start()

async def main():
    await start_server()
    # Serve until interrupted (e.g. Ctrl+C).
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())

only including it here for reference. Right now I'd say it's overkill to add an additional full-blown network stack on top of what's already in the standard lib, but who knows how things might change in the future. In any case, the currently used code from python-websocket-server seems to work well with pyglossary, and I didn't find any issues with it, apart from having to refactor the http header parsing a bit in order to have a 2-in-1 server. Websocket mode is detected by the presence of the special upgrade header and the request path of /ws.
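The detection rule just described (an Upgrade header plus the /ws path) might look roughly like this. It is a sketch and not the code from python-websocket-server; both function names are hypothetical. The accept-key computation follows the opening handshake defined in RFC 6455:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def is_websocket_request(path: str, headers: dict, ws_path: str = "/ws") -> bool:
    """True when the path is ws_path and the Upgrade header asks for websocket."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    upgrade = lowered.get("upgrade", "").strip().lower()
    return path == ws_path and upgrade == "websocket"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    raw = (sec_websocket_key + WS_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
```

Requests that fail the check can be handled as ordinary HTTP, which is what lets a single server on a single port serve both.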

ilius commented 2 hours ago

Nice!

Not a fan of pico.pumpkin css (especially light orange on black, which hurts my sensitive eyes). I'd say we don't need a CSS. Just make the word "Convert" much bigger (like 2x) in the button.

Also this:

> I meant just taking a URL as text entry, and in backend: only allow file url if the request is coming from localhost (can even disable it in config). The point is that it's URL, not path. This way people can paste url of public glossaries on internet. We can even cache them and avoid downloading twice.
>
> Like how Google Reverse Image search allows giving a URL as well as uploading an image.

And run ruff format and ruff check --fix with the ruff config that we are using: https://github.com/ilius/pyglossary/blob/master/pyproject.toml#L9

Then you can integrate it with PyGlossary. Just use a directory for pyglossary.ui.ui_web package (unlike existing ui_ modules)

glowinthedark commented 2 hours ago

> Not a fan of pico.pumpkin.css

which shade do you think would be more neutral and easy on the eyes? sand for example — https://picocss.com/docs/version-picker/sand Update: replaced with zinc

no css would mean styling it by hand, and that's gonna be ugly; from picocss the progress bar for example comes for free, as well as the layout, typography and extra goodies

> And run ruff format and ruff check --fix with the ruff config that we are using: https://github.com/ilius/pyglossary/blob/master/pyproject.toml#L9

ok, thanks! will do. I will also have to review all the code more carefully and remove redundant stuff, and do a bit more testing. So far I didn't see any specific issues, but maybe I also need to check non-ascii file names etc., as it might be that something somewhere needs to be escaped or unescaped.
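On the escaping point, one possible approach (an assumption, not something the current code is known to do) is to percent-encode file names with urllib.parse before they travel in a URL and to decode them on the server side. The helper names are hypothetical:

```python
from urllib.parse import quote, unquote


def encode_filename(name: str) -> str:
    """Percent-encode a file name as UTF-8 so it is safe in a URL path or query."""
    # safe="" also encodes "/" so a name cannot smuggle in extra path segments.
    return quote(name, safe="")


def decode_filename(encoded: str) -> str:
    """Reverse encode_filename()."""
    return unquote(encoded)
```

A round trip such as decode_filename(encode_filename("словарь (v2).slob")) should give back the original name unchanged.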

> I meant just taking a URL as text entry, and in backend: only allow file url if the request is coming from localhost (can even disable it in config). The point is that it's URL, not path. This way people can paste url of public glossaries on internet. We can even cache them and avoid downloading twice.

Aha, I think I see what you mean, sorry if I'm too slow.. 🐌 😄 Right now there is nothing preventing users from pasting a URL instead of a local path. After all, the UI doesn't do anything other than forward the parameters to Glossary.convert(ConvertArgs(data)). The client will only work on localhost and be inaccessible from other LAN/WAN hosts because the socket is bound to 127.0.0.1 https://github.com/glowinthedark/pyglossary_web/blob/master/server_ws_http.py#L733

To make it visible on LAN/WAN users will need to manually modify it in the code to 0.0.0.0 or to the effective IP address.

As it is configured now in the server code self.client_address[0] will always return 127.0.0.1.

ilius commented 1 hour ago

Then please find a css with semi-dark green (like GitHub's Comment button) as accent color. Also make the Convert button not colored, and make its text bigger instead (maybe bold as well).