translate / amagama

Web service for implementing a large-scale translation memory
http://amagama.translatehouse.org
GNU General Public License v3.0

Amagama cleanups #12

Closed: unho closed this 10 years ago

dwaynebailey commented 10 years ago

There are some things that need fixing before this can land; see comments.

dwaynebailey commented 10 years ago

gtm

julen commented 10 years ago

So instead of giving a chance to improve on the web UI, let's remove it altogether?

unho commented 10 years ago

We are just replacing it. Replacement in progress.

dwaynebailey commented 10 years ago

@unho might be nice to actually explain what you are replacing it with, and why, to those kind enough to review your changes.

unho commented 10 years ago

@dwaynebailey Sure. Sorry.

@julen We are looking to replace the old web UI, which was simple, with a much more complex UI.

As a first step we removed the old web UI, so amaGama just serves a JSON API plus static HTML (with CSS and JS) that queries that API and displays the results. This way we keep the amaGama server simpler, and the separation between API and web UI is much clearer (Python vs JavaScript). As a side effect, this might reduce both the load on the server and the bandwidth usage, since browsers can cache the static resources. It might also make it easier to tweak the results page to allow more complex handling of the results.
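For illustration, here is a minimal Python sketch of the request/response shape such a split implies. It assumes a lookup endpoint of the form `/tmserver/<source_lang>/<target_lang>/unit/<text>` that returns a JSON list of objects with `source`, `target` and `quality` fields; the path, the field names and the helper names are assumptions, not a confirmed description of the amaGama API. In the new setup this logic would live in the browser-side JavaScript; Python is used only for the sketch.

```python
import json
from urllib.parse import quote

# Assumed deployment URL and path layout; check the real API before use.
BASE_URL = "http://amagama.translatehouse.org"


def build_lookup_url(source_lang: str, target_lang: str, text: str) -> str:
    """Build a TM lookup URL, percent-encoding the text (including any '/')."""
    encoded = quote(text, safe="")
    return f"{BASE_URL}/tmserver/{source_lang}/{target_lang}/unit/{encoded}"


def parse_results(body: str) -> list[tuple[str, str, float]]:
    """Decode a JSON response body into (source, target, quality) tuples, best first."""
    results = json.loads(body)
    ranked = sorted(results, key=lambda r: r["quality"], reverse=True)
    return [(r["source"], r["target"], float(r["quality"])) for r in ranked]
```

For example, `build_lookup_url("en", "af", "Open file")` ends in `/tmserver/en/af/unit/Open%20file`. Because the server only returns JSON, rendering, sorting and any extra post-processing of the results can happen entirely on the client.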

The second step is to allow more granular searches (by file, project, date, or regex), so we will need to adjust the API and the client to deal with them. This requires changes in the database as well.
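More granularity mostly means that each stored unit carries extra metadata (project, file, date) and that queries can filter on it. A minimal in-memory sketch of such filtering follows; the `TMEntry` fields and the `filter_entries` parameters are hypothetical, and a real implementation would push these conditions into the database query rather than filtering in Python.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional


@dataclass
class TMEntry:
    # Hypothetical record layout: the metadata the database would need to store.
    source: str
    target: str
    project: str
    path: str
    updated: date


def filter_entries(
    entries: Iterable[TMEntry],
    project: Optional[str] = None,
    path: Optional[str] = None,
    since: Optional[date] = None,
    until: Optional[date] = None,
    pattern: Optional[str] = None,
) -> Iterator[TMEntry]:
    """Yield entries matching every filter that is set; unset filters are ignored."""
    regex = re.compile(pattern) if pattern else None
    for entry in entries:
        if project is not None and entry.project != project:
            continue
        if path is not None and entry.path != path:
            continue
        if since is not None and entry.updated < since:
            continue
        if until is not None and entry.updated > until:
            continue
        # Regex searches are matched against the source text.
        if regex is not None and not regex.search(entry.source):
            continue
        yield entry
```

Keeping every filter optional lets the existing plain lookup stay a special case of the more granular one.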

The third step is to expand the amaGama API to allow other queries, such as word queries and stemming, so we can provide a replacement for the now-defunct Open-Tran.
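Word queries with stemming are typically served from an inverted index keyed by word stems rather than by whole strings. The sketch below assumes that approach; `toy_stem`, `WordIndex` and the suffix list are hypothetical, and the stemmer is deliberately crude. A real deployment would use a proper per-language stemmer (for example, the Snowball stemmers) or a search engine's analyzers.

```python
import re
from collections import defaultdict

SUFFIXES = ("ing", "ed", "s")  # toy English suffix list, far cruder than a real stemmer


def toy_stem(word: str) -> str:
    """Lowercase and strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text)


class WordIndex:
    """Map each stem to the ids of the units whose source text contains it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._units: list[tuple[str, str]] = []

    def add(self, source: str, target: str) -> None:
        unit_id = len(self._units)
        self._units.append((source, target))
        for token in tokenize(source):
            self._postings[toy_stem(token)].add(unit_id)

    def search(self, word: str) -> list[tuple[str, str]]:
        # .get() avoids inserting empty entries into the defaultdict.
        ids = self._postings.get(toy_stem(word), set())
        return [self._units[i] for i in sorted(ids)]
```

With this index, a query for "opened" finds units containing "Open" or "Opening", which a whole-string similarity lookup would not.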

It is still not clear whether the second and third steps will look exactly like that, but the first one is definitely as explained. Some notes on the ideas are in https://docs.google.com/document/d/1M13a-7AI2PoNf9GIeU3c_f__0w3ngwsIPxCaSnpr3Dk

julen commented 10 years ago

Thanks for sharing. Note that people on the mailing list might be interested in hearing about this, and maybe in providing feedback as well.

You might also be interested in having a look at http://recursos.softcatala.org/.

Re. moving the UI to the client, I see the reasoning behind it, but just let me add that search result URLs should be linkable, please!
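One way to keep results linkable when searching happens on the client is to store the whole search state in the URL query string and rebuild the search from it on page load. A minimal sketch of that round trip, assuming hypothetical parameter names (`q`, `source`, `target`) and a placeholder base URL; the browser client would do the equivalent in JavaScript.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_to_url(base: str, params: dict[str, str]) -> str:
    """Encode non-empty search parameters; sorting keys makes equal searches share a URL."""
    query = urlencode(sorted((k, v) for k, v in params.items() if v))
    return f"{base}?{query}" if query else base


def url_to_search(url: str) -> dict[str, str]:
    """Recover the search parameters from a shared URL (first value per key)."""
    query = urlsplit(url).query
    return {k: v[0] for k, v in parse_qs(query).items()}
```

For example, `search_to_url("https://example.org/search", {"q": "Open file", "source": "en", "target": "af"})` gives `https://example.org/search?q=Open+file&source=en&target=af`, and passing that URL to `url_to_search` restores the original parameters.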

unho commented 10 years ago

@julen Thanks for your comments, and thanks for the link too; I was not aware of that one. I have other similar resources in mind as well, along with Proxecto Trasno's ideas on combining TM with terminology.

Saved your idea about linking results. I am not sure how to implement this with the new approach for searches.

About the mailing list discussion, we might need to think about it, because @friedelwolff complains a lot about it, with very good reasons, I must say.

iafan commented 10 years ago

Julen pointed me to this thread. Coincidentally, today we updated our local TM server (which uses an Amagama-like JSON interface and is plugged into our own translation server), and it is now based on Elasticsearch.

Previously we did this the hard way: we populated the TM database using our own scripts, calculating similarity with the Levenshtein distance algorithm. Initially populating the DB with ~1.5M strings took about 12 hours.
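For reference, the Levenshtein-based scoring mentioned above can be sketched as follows. Normalising the distance into a 0-100 percentage by the length of the longer string is an assumption (one common choice), not necessarily what those scripts did. Each comparison costs O(len(a) * len(b)), which suggests why computing it across a large corpus with custom scripts is slow.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; iterate over the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein(a, b)) / longest
```

For example, "file" and "files" are one insertion apart, so `similarity("file", "files")` is 80.0.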

We decided to experiment with Elasticsearch, which has a very clean RESTful API, and it turned out that within a single day we had a completely new, production-ready system: it populates the index directly from the Pootle database, and a backend script queries ES and returns the results in an Amagama-compatible format. The results are impressive, to say the least: the full translation database is loaded into ES in ~3 minutes, the content becomes searchable immediately, and queries return JSON documents already ranked by similarity.
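A backend script like the one described boils down to two steps: send a similarity query to ES and reshape the hits. A minimal sketch, assuming documents indexed with `source` and `target` fields and an Amagama-style result made of `source`/`target`/`quality` objects. The `match` query with `fuzziness` is one plausible choice, not the query that script actually used, and scaling `_score` against the best hit is only a stand-in for a similarity percentage, because ES relevance scores are not edit distances. The HTTP call itself (POSTing the body to the index's `_search` endpoint) is omitted to keep the sketch self-contained.

```python
def build_es_query(text: str, size: int = 5) -> dict:
    """One possible ES request body: a fuzzy full-text match on an assumed 'source' field."""
    return {
        "size": size,
        "query": {"match": {"source": {"query": text, "fuzziness": "AUTO"}}},
    }


def es_hits_to_amagama(response: dict) -> list[dict]:
    """Convert an ES search response into a list of {source, target, quality} dicts."""
    hits = response["hits"]["hits"]  # ES returns hits sorted by _score by default
    if not hits:
        return []
    top = max(hit["_score"] for hit in hits) or 1.0  # avoid dividing by zero
    results = []
    for hit in hits:
        doc = hit["_source"]
        results.append({
            "source": doc["source"],
            "target": doc["target"],
            # Relative to the best hit; not an edit-distance percentage.
            "quality": round(100.0 * hit["_score"] / top, 1),
        })
    return results
```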

So I would strongly advise trying out Elasticsearch for both Amagama and the local TM (though I know you already have the local TM implemented another way). Setting up ES only takes 5 minutes or so.

Our next plan is to make our TM server part of Pootle itself, so that we can update the index as soon as a unit is saved (currently we simply pull new translations from Pootle every 15 seconds).
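The difference between the current polling and a save-time update can be sketched as below. `fetch_changed_since`, `index_unit` and the `mtime` field are hypothetical stand-ins for whatever reads changed units from Pootle's database and writes them to the ES index; only the 15-second interval comes from the comment above.

```python
import time
from typing import Callable, Iterable

Unit = dict  # hypothetical: {"id": ..., "source": ..., "target": ..., "mtime": int}

POLL_INTERVAL_SECONDS = 15  # the polling interval mentioned above


def sync_once(
    fetch_changed_since: Callable[[int], Iterable[Unit]],
    index_unit: Callable[[Unit], None],
    last_sync: int,
) -> int:
    """Pull-based: index every unit changed since last_sync and return the new checkpoint."""
    newest = last_sync
    for unit in fetch_changed_since(last_sync):
        index_unit(unit)
        newest = max(newest, unit["mtime"])
    return newest


def poll_forever(fetch_changed_since, index_unit, last_sync: int = 0) -> None:
    """Run sync_once in a loop; new translations appear with up to ~15 s delay."""
    while True:
        last_sync = sync_once(fetch_changed_since, index_unit, last_sync)
        time.sleep(POLL_INTERVAL_SECONDS)


def on_unit_saved(unit: Unit, index_unit: Callable[[Unit], None]) -> None:
    """Push-based: call from the save path so the index is updated immediately."""
    index_unit(unit)
```

Besides removing the delay, the save hook avoids a subtle polling pitfall: if `fetch_changed_since` selects units strictly newer than the checkpoint, a unit saved later with the same timestamp as the last one seen could be skipped.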

unho commented 10 years ago

The removal of the web UI was reverted in 1d65e898, so that it can be replaced gradually with the new web UI, as requested.

dwaynebailey commented 10 years ago

@iafan thanks for the heads-up; we've now started looking at ES as an alternative option for both.

iafan commented 10 years ago

@dwaynebailey if you need our code for inspiration (which we will integrate into an open-source branch anyway), feel free to ask.

unho commented 10 years ago

@iafan An open-source branch of which repository? We would be grateful if you shared your code.

iafan commented 10 years ago

I mean this code, once rewritten in Python, will be a part of https://github.com/evernote/pootle/

I'll zip the files and send them to you via email.

dwaynebailey commented 10 years ago

@iafan perfect, much appreciated.