ATF files, bad token forgiveness

wanderingstan commented 3 years ago

Hi Tom!

I forked the repo and started on some features for myself, as an amateur interested in recovering the actual cuneiform from transliterations, per this reddit thread: https://www.reddit.com/r/asklinguistics/comments/jvnaz0/linebylinesidebyside_translation_of_epic_of/

My fork is here: https://github.com/wanderingstan/cuneifyplus

With demo server here: https://cuneify.herokuapp.com/

Biggest changes:

* Replace unrecognized tokens with a string instead of erroring out
* Parse ATF-formatted files intelligently
* In the UI, show results on the same page as the input form (no need to "go back")
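For anyone unfamiliar with ATF, here is a rough sketch of what "parsing intelligently" could involve. It is not the code from the fork. It relies only on common ATF conventions: lines starting with `&`, `@`, `$` or `#` carry metadata, structure, state notes and comments, while text lines begin with a line label such as `1.` or `2'.`. The function name, the regex and the sample tablet (including its ID) are made up for illustration.

```python
import re
from typing import List, Tuple

# A text line is a label ending in "." (e.g. "1." or "2'."), then the transliteration.
TEXT_LINE = re.compile(r"^(?P<label>\S+\.)\s+(?P<text>.+)$")


def extract_transliteration(atf: str) -> List[Tuple[str, str]]:
    """Return (label, transliteration) pairs, skipping non-text ATF lines."""
    lines = []
    for raw in atf.splitlines():
        line = raw.strip()
        # Skip blanks and lines that are not transliteration:
        # & = tablet header, @ = structure, $ = state, # = comment/protocol.
        if not line or line[0] in "&@$#":
            continue
        match = TEXT_LINE.match(line)
        if match:
            lines.append((match["label"], match["text"]))
    return lines


# Hypothetical sample input, not a real tablet.
SAMPLE = """&P000001 = Example tablet
#atf: lang akk
@obverse
1. a-na be-li2-ia
$ rest broken
2'. um-ma
"""

for label, text in extract_transliteration(SAMPLE):
    print(label, text)
```

Only the text lines would then be passed on to the cuneifier; everything else could be echoed through unchanged or dropped.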

I was about to open a pull request when I saw you've been making changes too! I'd be interested to collaborate. If you're open to it, let me know what you think of my changes and I can merge in your latest changes.

tpgillam commented 3 years ago

Hi Stan!

So I should begin by saying that my knowledge of Babylonian is essentially non-existent. I wrote this little website to help a friend who is an academic in this field (Martin Worthington), and the features implemented were effectively the minimal set required for his use :-) [Originally to assist with teaching, then more recently for a book he has been working on]

One thing that I'm wondering is if you need a web interface at all? I had actually thought about refactoring out the main cuneification code into a library and putting it on pypi. I hadn't done so primarily because I thought there wouldn't be interest... but given your message maybe there is!

When it comes to the web interface... I would much rather have used something like Flask than mess around with doing everything manually here, but I was heavily constrained by the environment I was asked to run it in. So I'm nervous about making any changes to the existing interface (beyond maintenance), as it already seems inherently quite fragile. But if an improved one (hosted elsewhere) were useful then perhaps we should start over? In particular, if you are even slightly proficient at web development then you'll do a better job than me in this regard!

Sorry I realise I didn't exactly answer your questions directly.... :-)

Tom

tpgillam commented 3 years ago

Maybe just a couple of more specific thoughts:

* Replace unrecognized token with string instead of erroring out

At a low level I think this should definitely throw. Maybe at a higher level of abstraction (e.g. in the context of rendering a whole block) something like this could be done. (I haven't had a chance to look at your change in detail, sorry)

* Parse ATF-formatted files intelligently

Sounds very useful!

* In UI, show results on same page as input form (no need to "go back")

Also sounds v. handy, though depending on the scope of the change I possibly wouldn't want to merge it into this specific repo, which is effectively the feeder for the slightly awkward-to-maintain site as mentioned above.

tpgillam commented 3 years ago

And finally, thanks very much for your interest and reaching out! :-)

rillian commented 3 years ago

Hey, thanks for putting up a new instance of cuneify! I do like the UI change.

For what it's worth, I think making the cuneify part available as a package on PyPI is a great idea. A few years ago I built a little demo (now bitrotted) to use the Scaife reader interface on ATF data from the CDLI. I always wanted to add tabs with a cuneified version of the text and tablet images. Having a library version of the service available would really help with that.

If you're starting over, it would be nice to do the conversion interactively, and perhaps client-side for faster response and offline use. I started a little web-app when I was first studying Akkadian, but got lazy about typing in the sign list. Patching in the oracc sign list would make it a lot more useful!

Other possibly helpful resources:

* Classical Language Toolkit has some support for Akkadian stemming and noun declension.
* There's a machine learning model for going the other way: signs to transliteration. Available in the akkadian Python module.

wanderingstan commented 3 years ago

Nice to "meet" you @tpgillam and @rillian !

I submitted a PR (#3) with my proposed changes. I think it should not run into any of your legitimate worries.

> Replace unrecognized token with string instead of erroring out

Yes, it still throws at the low level; only the UI behaviour changes, as described in issue #2.
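To make the split concrete, here is a minimal sketch of the idea: a strict low-level lookup that raises, wrapped by a forgiving higher-level renderer. This is not the code from the PR. All names (`UnrecognisedTokenError`, `cuneify_token`, `cuneify_line_forgiving`), the two-entry sign table, the `[token]` placeholder format and the hyphen/space tokenisation are assumptions made for illustration.

```python
class UnrecognisedTokenError(KeyError):
    """Raised when a token has no known sign (hypothetical name)."""


# Toy sign table for illustration only; the real project uses a far larger one.
SIGNS = {
    "an": "\U0001202D",  # CUNEIFORM SIGN AN
    "a": "\U00012000",   # CUNEIFORM SIGN A
}


def cuneify_token(token: str) -> str:
    """Strict low-level lookup: raises on an unknown token."""
    try:
        return SIGNS[token]
    except KeyError:
        raise UnrecognisedTokenError(token) from None


def cuneify_line_forgiving(line: str, placeholder: str = "[{}]") -> str:
    """Higher-level rendering: show a visible marker instead of failing."""
    signs = []
    # Assumed tokenisation: split on hyphens and whitespace.
    for token in line.replace("-", " ").split():
        try:
            signs.append(cuneify_token(token))
        except UnrecognisedTokenError:
            signs.append(placeholder.format(token))
    return "".join(signs)


print(cuneify_line_forgiving("an-xyz a"))
```

Library callers still get an exception from `cuneify_token`, while a UI can call the forgiving wrapper and leave the unknown token visible in the output.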

> Also sounds v. handy, though depending on the scope of the change I possibly wouldn't want to merge it into this specific repo, which is effectively the feeder for the slightly awkward-to-maintain site as mentioned above.

I strove to keep everything backwards compatible, so that things should still work on the "slightly awkward-to-maintain site" :).

> One thing that I'm wondering is if you need a web interface at all? I had actually thought about refactoring out the main cuneification code into a library and putting it on pypi. I hadn't done so primarily because I thought there wouldn't be interest... but given your message maybe there is!

I don't think it's too bad to have such a minimal web UI alongside the core cuneification code. As I see it, a decent scope for the project is:

1. At heart, a library for converting transliterations into cuneiform (with related functionality like symbol lists)
2. A minimal CLI tool for applying this library to files
3. A minimal web tool for applying this library to web-entered text

That said, we could split off (1) into its own PyPI project, and then have the CLI and web UIs as separate projects, but that seems a bit much for something this small. It would just make it more of a headache when one wanted to, for example, add a feature to the core library; it could mean landing PRs across 3 repos!

But in general, I'm happy to help. My only goal here was to enjoy the visceral feeling of seeing the actual visual symbols that were inscribed in clay all those many years ago!

rillian commented 3 years ago

> That said, we could split off (1) into its own PyPI project, and then have the CLI and web UIs as separate projects, but that seems a bit much for something this small. It would just make it more of a headache when one wanted to, for example, add a feature to the core library; it could mean landing PRs across 3 repos!

No, I would recommend implementing all three of those things within the current repository. But (1) can be published on pypi without including the other two, and without necessitating a separate repository. That would make it easier for other projects to incorporate the code without diverting too much effort from maintenance here.

tpgillam commented 3 years ago

Hi both - just a very quick reply for now, hopefully I'll get some more time at the weekend.

@wanderingstan definitely agree on not overengineering if there isn't demand. And thank you for the pull request; I'll take a look soon!

@rillian I think IF there is sufficient demand to make a pypi package then I'd strongly be in favour of putting it in a separate repo. And quite probably that repo shouldn't be owned by me, although I'll happily chip in to help set it up initially :-)

jonknowles commented 3 years ago

Hello to @tpgillam @rillian and @wanderingstan. I encountered this project a few days ago and found it really interesting, so I took a crack at porting it to JS and setting up an editing loop that can be run on keypress.

I set up a repo over here (I didn't directly fork this one, since mine doesn't actually share code).

I can't vouch for the correctness of it (I have very little subject matter knowledge in this area), but my first attempt appears to be generating somewhat valid output. I basically just transformed the dictionary pickle file into a JSON object, and am using it to do lookups on my tokens.
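In case it helps anyone doing the same, the conversion step can be sketched roughly as below. This is not the exact script I used. It assumes the pickle holds a plain dict mapping string tokens to cuneiform strings, which may not match the real file's structure, and the function name and paths are placeholders.

```python
import json
import pickle
from pathlib import Path


def pickle_to_json(pickle_path: Path, json_path: Path) -> None:
    """Dump a pickled token -> sign mapping as UTF-8 JSON."""
    # Unpickling can run arbitrary code, so only load files you trust.
    with open(pickle_path, "rb") as f:
        sign_map = pickle.load(f)

    # ensure_ascii=False keeps the cuneiform characters readable in the file
    # instead of escaping them as \uXXXX surrogate pairs.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(sign_map, f, ensure_ascii=False, indent=2, sort_keys=True)
```

The resulting JSON file can then be loaded in the browser and used as a plain lookup table. If the real pickle contains non-string keys or custom objects, `json.dump` will raise and the data would need reshaping first.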

I do think I am probably moving in the opposite direction from the rest of you, since I like web editors and REPLs (the babel.js one is what I modeled this after), and I thought something that gave rapid feedback would be helpful for students or people trying to compose something quickly. But since my code is downstream of this one (i.e. I tried to just consume the pickle file as my source of truth), it might be possible to collaborate on interfaces or tests, etc. If any of you are interested in that, let me know. I will definitely keep an eye out for @wanderingstan's PR, since it looks like there's some interesting stuff in there that I might make use of once it's finalized.

tpgillam commented 3 years ago

I'm just going to pull in @worhtinm, who is a subject expert, and for whom I wrote the original version some years ago :-) Perhaps he will have some thoughts on the improvements above, and the sorts of features that he would find useful!

@wanderingstan, unfortunately there's an issue with the server that is still unresolved (and outside of my control), so I'm reticent to merge your request until I'm actually able to deploy and test it in-place easily...

worhtinm commented 3 years ago

Dear all,

Hello, and thank you to Tom for bringing me in. I am no programmer, and the above is sadly mostly Double-Dutch to me.

From a user point of view, the advantage of the web interface is its foolproofness: you don't need to install anything, you can even just do a screen grab of the output.

@rillian and @wanderingstan, happy to comment on anything further, but please explain it to me as you would to a very slow three-year-old :)

rillian commented 3 years ago

Hi @worhtinm. No disagreement on the web interface, it's very valuable as it is. The library idea is just about making it easier for other developers to support cuneiform text.

rillian commented 3 years ago

> unfortunately there's an issue with the server that is still unresolved (and outside of my control), so I'm reticent to merge your request until I'm actually able to deploy and test it in-place easily...

If you want to encourage contributions, one way to handle this is to maintain a separate git branch for things which have been tested for deployment to the server. More experimental work can happen on a parallel branch. That way your repository can remain a focal point for new work where it can be reviewed by experts in the various components.

worhtinm commented 3 years ago

@rillian Many thanks for your interest. Given server-admin complexities at our end, it may be difficult to incorporate third-party suggestions. But if anyone wants to create a replica of the site on a server of their own, and then modify it, they are welcome to do so!

tpgillam / cuneifyplus

ATF files, bad token forgiveness #1