wet-boew / wet-boew-drupal

Drupal variant of the Web Experience Toolkit (WET)

Wiki documentation, Integration of Drupal with Multitrans #804

Closed joejoseph00 closed 11 years ago

joejoseph00 commented 11 years ago

@StephenOTT, I've been working on the wiki translation. We're currently in the process of implementing MultiTrans by MultiCorpora, and I'll try to use it, but it takes a while to build up a contextual text base. If I had some technical documents that were perfectly translated, I could use them to build a text base that would help in translating this documentation. Our technical documentation is mostly English-only, but some bigger guys like Stats might have a decent amount of documentation done in both languages that we could use for a text base. Meanwhile, I have contacted MultiCorpora to see if they offer a sample text base for technical development documentation; it might speed things up if they can provide us with a MultiTrans-compatible text base.

Is anyone else using MultiTrans who would like to share a text base suitable for the wiki documents (technical documentation for developers, administrators, and power users)? Seeing as there are some government heavyweights around here, I thought I should ask some of the bigger fish whether they have a good text base they could share with us. I can use it to help finish off the wiki documentation. Perhaps TBS and the WxT guys might have something.

Also, seeing as the government is big into MultiTrans, it might be helpful if we could eventually integrate MultiTrans with Drupal. This could be another huge selling point, and we could upstage Interwoven in this area.

joejoseph00 commented 11 years ago

Although MultiTrans is closed source, MultiCorpora is a Canadian company based in the NCR, so we could even go talk to some of their developers and see how to hook this into Drupal, or maybe someone like Stats might want to sponsor this.

sylus commented 11 years ago

This is ultimately where Stats would like to go, leveraging the awesome work done here: http://drupal.org/project/tmgmt

joejoseph00 commented 11 years ago

MultiCorpora has a web services interface and documentation on integrating a CMS with its system: http://www.multicorpora.com/en/products/options-and-add-ons/cms-connector/

joejoseph00 commented 11 years ago

I checked out that tmgmt project. Awesome, it's so nice that someone already had this idea. We can get out of the gate running.

joejoseph00 commented 11 years ago

I'm going to get in contact with Transport Canada; apparently they're using MultiTrans. We are just in the process of setting it up. We exported 65,000 nodes from Drupal into English and French HTML files so they can be imported into MultiTrans. PHP with mysqli is able to query the Drupal (MySQL) database, load 65,000 nodes, and generate 65,000 files (270 MB of data) in about 13 seconds! The script isn't even a page long.
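The export script itself isn't posted. As a rough illustration of the same approach, here is a minimal Python sketch (not the original PHP/mysqli script) that assumes a simplified Drupal 7-style schema: a `node` table with `nid`, `language` and `title`, and a `field_data_body` table keyed by `entity_id`. An in-memory SQLite database stands in for MySQL so the example runs on its own; against a real site you would open the connection with a MySQL DB-API driver and run the query through a cursor. The `<nid>_<language>.html` file naming is also an assumption.

```python
import html
import sqlite3
import tempfile
from pathlib import Path

# Simplified, assumed stand-in for the relevant Drupal 7 tables.
SCHEMA = """
CREATE TABLE node (nid INTEGER PRIMARY KEY, language TEXT, title TEXT);
CREATE TABLE field_data_body (entity_id INTEGER, body_value TEXT);
"""


def export_nodes(conn, out_dir):
    """Write one HTML file per node, named <nid>_<language>.html; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = conn.execute(
        "SELECT n.nid, n.language, n.title, b.body_value "
        "FROM node n LEFT JOIN field_data_body b ON b.entity_id = n.nid "
        "ORDER BY n.nid"
    )
    written = []
    for nid, lang, title, body in rows:
        # Titles are plain text and need escaping; body_value is already HTML.
        doc = (
            f'<!DOCTYPE html>\n<html lang="{lang}">\n<head><meta charset="utf-8">'
            f"<title>{html.escape(title)}</title></head>\n"
            f"<body>\n{body or ''}\n</body>\n</html>\n"
        )
        path = out / f"{nid}_{lang}.html"
        path.write_text(doc, encoding="utf-8")
        written.append(path)
    return written


def demo_connection():
    """In-memory SQLite database with one English node and its French translation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?)",
        [(1, "en", "Road safety"), (2, "fr", "Sécurité routière")],
    )
    conn.executemany(
        "INSERT INTO field_data_body VALUES (?, ?)",
        [(1, "<p>Buckle up.</p>"), (2, "<p>Bouclez votre ceinture.</p>")],
    )
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_nodes(demo_connection(), tmp):
            print(path.name)
```

Most of the speed comes from streaming rows straight from one query into files, with no per-node loading through the CMS API.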

joejoseph00 commented 11 years ago

Ideally, we would leverage the content inside the CMS as a text base for AI-based translation, similar to the way MultiTrans uses text bases of high-quality translations organized by taxonomy: each content type would be a text base that the AI logic in the translation engine refers to. If conceived correctly, this would eliminate the need to export data from Drupal into a system like MultiTrans. Instead, we would leverage the engine of something like MultiTrans (or even MultiTrans itself), using the content types and taxonomy inside the Drupal (CMS) database as a way to prioritize a text base for high-scoring, high-quality translation. The more content there is, the better the engine works (provided that the content is of high quality).
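MultiTrans's engine is closed source, so the following is not how it works. It's a minimal Python sketch of the idea above under stated assumptions: each Drupal content type's existing bilingual content forms its own text base, the text base for the document's own content type is searched first, and `difflib.SequenceMatcher.ratio()` stands in for a real fuzzy-match score. `TEXT_BASES`, `suggest()` and the 0.75 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical text bases: one per Drupal content type, each a list of
# (English source, French translation) pairs taken from existing bilingual nodes.
TEXT_BASES = {
    "news": [
        ("The minister announced new funding.",
         "Le ministre a annoncé un nouveau financement."),
    ],
    "regulation": [
        ("The operator shall keep records.",
         "L'exploitant doit tenir des registres."),
    ],
}


def similarity(a, b):
    """Fuzzy-match score in [0, 1]; difflib stands in for a real TM matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(source, content_type, threshold=0.75):
    """Return (score, translation, text_base) for the best match, or None.

    The text base for the document's own content type is searched first;
    other text bases are used only if it yields nothing above the threshold.
    """
    ordered = [content_type] + [t for t in TEXT_BASES if t != content_type]
    for base in ordered:
        best = None
        for src, tgt in TEXT_BASES.get(base, []):
            score = similarity(source, src)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, tgt, base)
        if best is not None:
            return best
    return None


if __name__ == "__main__":
    print(suggest("The operator shall keep records.", "regulation"))
    print(suggest("The minister announced new funds.", "regulation"))
```

For example, `suggest("The minister announced new funds.", "regulation")` finds nothing close enough in the `regulation` text base and falls back to the fuzzy match in `news`.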

joejoseph00 commented 11 years ago

Looks like the tmgmt project is really on the right track:

All Drupal text elements can be used as source for translation:

Nodes
Entities
I18n Strings (Menu, Terms etc.)

A plugin architecture allows for the introduction of additional text sources (internal and external). For details, see Sources Architecture.
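TMGMT itself is PHP, and its real source plugin API is not reproduced here. As a language-neutral illustration of why a plugin architecture makes new sources cheap to add, here is a hypothetical Python registry: each source is a function registered under a name, and the collector never changes when a new source (for example an external FAQ system) is added. `TextItem`, `source_plugin()` and the source names are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class TextItem:
    """One translatable piece of text and where it came from."""
    source: str   # plugin name, e.g. "node"
    item_id: str  # identifier within that source
    text: str


# Hypothetical registry: plugin name -> function yielding translatable items.
SOURCES: Dict[str, Callable[[], Iterable[TextItem]]] = {}


def source_plugin(name):
    """Decorator that registers a text source under `name`."""
    def register(func):
        SOURCES[name] = func
        return func
    return register


@source_plugin("node")
def node_source():
    # A real plugin would load nodes from the CMS; this is a fixture.
    yield TextItem("node", "1", "Road safety")


@source_plugin("i18n_string")
def i18n_string_source():
    yield TextItem("i18n_string", "menu:main:home", "Home")


def collect(names=None) -> List[TextItem]:
    """Gather items from the named sources (all registered sources by default)."""
    names = list(SOURCES) if names is None else names
    return [item for name in names for item in SOURCES[name]()]


# Adding an external source needs no change to collect():
@source_plugin("external_faq")
def external_faq_source():
    yield TextItem("external_faq", "faq-7", "How do I renew my licence?")
```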

joejoseph00 commented 11 years ago

MultiTrans Prism is the ideal technological counterpart for your content management system (CMS). Adding Prism Flow’s translation workflow in combination with MultiCorpora’s Advanced Leveraging Translation Memory (ALTM) converts your CMS into a highly optimized global management system (GMS) that will save you time and money by recycling and routing previously translated content through an automated, fully configurable translation process.

MultiTrans Prism automation will greatly enhance the functional benefits of your CMS; plus, interfacing through Web Services is based on a universal, documented standard, therefore it is quick and inexpensive to implement.

[Image: MultiTrans illustration]

A translation can be triggered in one of two ways:

CMS Push (NEW!): When your CMS contains a document that reaches a state of readiness for translation, an event can be triggered by the CMS that calls the MultiTrans Prism Web Services which, in turn, immediately start the automation. In this way, the Web Services respond to your CMS, acting as a virtual project manager that inserts the document into an automated workflow. The workflow is configurable, possibly containing tasks such as creation of decision-support analyses and a pretranslated document package, assignment of human resources to execute any hands-on translation activities and, finally, pushing the finished multilingual content back to the CMS while also updating the translation memory and terminology databases.

MultiTrans Prism Polling: An application polls the CMS at regular intervals. If it finds a document that is ready for translation, it pulls that document from the repository and triggers execution of linguistic functions such as document analysis and pretranslation, subsequently making the results available for further human interaction. After the translation is finished, relevant TextBases and TermBases may be automatically updated. Current integrations with Documentum and eDocs function in this manner.
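To make the difference between the two triggers concrete, here is a toy Python sketch. Everything in it (`Cms`, `submit()`, the state names) is hypothetical and stands in for a real CMS and for the MultiTrans Prism Web Services, whose actual API is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    state: str = "draft"  # "draft" -> "ready_for_translation" -> "in_translation"


@dataclass
class Cms:
    """Toy CMS; every listener is called whenever a document changes state."""
    documents: Dict[str, Document] = field(default_factory=dict)
    listeners: List[Callable[[Document], None]] = field(default_factory=list)

    def set_state(self, doc_id: str, state: str) -> None:
        doc = self.documents[doc_id]
        doc.state = state
        for listener in self.listeners:
            listener(doc)


submitted: List[str] = []  # stands in for the translation system's job queue


def submit(doc: Document) -> None:
    """Hypothetical call into the translation system's web service."""
    doc.state = "in_translation"
    submitted.append(doc.doc_id)


# Push: the CMS fires an event as soon as a document becomes ready.
def push_listener(doc: Document) -> None:
    if doc.state == "ready_for_translation":
        submit(doc)


# Polling: an external job scans the CMS. One pass is shown; a real poller
# would call this from a loop with time.sleep() between passes.
def poll_once(cms: Cms) -> int:
    ready = [d for d in cms.documents.values() if d.state == "ready_for_translation"]
    for doc in ready:
        submit(doc)
    return len(ready)
```

With push, `set_state("a", "ready_for_translation")` submits the document immediately; with polling, the same state change does nothing until the next `poll_once()` pass picks it up.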

How it works

MultiTrans Prism Analysis: The Analysis integration automatically compares new content to existing TextBases and TermBases, identifying full, fuzzy and sub-segment matches. This API also counts exact and fuzzy internal repetitions within a document or set of documents, and extracts the most repetitive terminology to allow for standardization before translation.
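A heavily simplified Python sketch of such an analysis report, assuming the TextBase is just a list of source-language segments and using `difflib` similarity as a stand-in fuzzy score. Unlike the product described above, it only counts exact internal repetitions and does no sub-segment or terminology extraction; `analyse()` and the 0.75 threshold are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def best_score(segment, text_base):
    """Highest difflib similarity between `segment` and any TextBase segment."""
    return max(
        (SequenceMatcher(None, segment, src).ratio() for src in text_base),
        default=0.0,
    )


def analyse(segments, text_base, fuzzy_threshold=0.75):
    """Count exact, fuzzy and unmatched segments, plus internal repetitions.

    A segment that repeats an earlier segment of the same document is counted
    as a repetition instead of being matched against the TextBase again.
    """
    tm = set(text_base)
    report = Counter(exact=0, fuzzy=0, no_match=0, repetition=0)
    seen = set()
    for seg in segments:
        if seg in seen:
            report["repetition"] += 1
            continue
        seen.add(seg)
        if seg in tm:
            report["exact"] += 1
        elif best_score(seg, text_base) >= fuzzy_threshold:
            report["fuzzy"] += 1
        else:
            report["no_match"] += 1
    return dict(report)
```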

MultiTrans Prism Pretranslation: The Pretranslation integration enables the automatic identification or replacement of exact or fuzzy segments, as well as sub-segments and terminology from existing TextBases and TermBases. Because the pretranslation process draws upon content from the CMS, it ensures that your documents’ tagged format structures, including DTP, will be optimally maintained. If your CMS supports it, your translation in one or several target languages can also be automatically checked back into the CMS.
If desired, the automated pretranslation can also be performed with various machine translation engines for pretranslation of unmatched segments, reducing human interaction to post-editing only.

MultiTrans Prism TextBase Building and Updating: The TextBase Builder integration automatically maintains your TMs with the latest reviewed translations. Multilingual file formats such as XLIFF, TMX and Translation RTF can also be added to the TextBases, increasing the alignment accuracy even further. This automation will ensure that your ever-growing repository of multilingual assets remains up-to-date and ready for reuse.

Linguistic benefits

CMS Integration

Convert your CMS to a Global Management System with MultiTrans Prism Web Services

Why integrate an automated translation solution to your CMS?

MultiTrans Prism includes all the components needed for seamless, automated TMS integration with your CMS as well as with your external project participants. There are modules for project management, translation memory, terminology management available either as client software or over the Web. Interfacing with customers and external service providers is intuitive, and processes can be easily automated. MultiTrans Prism supports industry-standard file formats for problem-free import and export of translation memory and terminology databases. Best of all, the entire solution is available from one provider; product development, maintenance, training, and technical support are unified under one roof so that you have one proven partner to ensure your successful deployment.

MultiTrans Prism TextBases, powered by Advanced Leveraging TM technology, can reduce your translation expenses up to 50% by reusing content from your past translations. It helps ensure that your organization’s proprietary and critical terminology remains consistent throughout all translations. By providing translators with the full context of recurrences from past translations, it helps translators minimize misinterpretation and therefore reduces your organization’s legal and corporate identity risk.
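The excerpt mentions that TMX files can be added to TextBases. Assuming English/French segment pairs have already been aligned (for example from the exported Drupal nodes), a minimal Python sketch of writing them as a TMX 1.4 file with the standard library looks like this. The header values such as `creationtool` and `o-tmf` are placeholders, and whether MultiTrans imports this exact file without adjustment is untested.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, srclang="en", tgtlang="fr"):
    """Return a TMX 1.4 document (as a string) for (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(
        tmx,
        "header",
        {
            "creationtool": "drupal-export-sketch",  # hypothetical tool name
            "creationtoolversion": "0.1",
            "segtype": "sentence",
            "o-tmf": "none",  # placeholder for the original TM format
            "adminlang": "en",
            "srclang": srclang,
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([("Road safety", "Sécurité routière")]))
```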

Virtual Project Management: Documents containing content that requires translation are automatically submitted by the CMS into the translation process through the background equivalent of an online Web portal that customers might use to interact with a translation agency. Inclusion of various project activities such as analysis, pretranslation, translation package creation, as well as other optional workflow integration may be automatically triggered, depending on system configuration. After translation and approval, content may be automatically pushed back into the CMS, and meanwhile the TextBases will be updated and prepared for future recycling of the newly translated content. These processes run 24/7. There is no human interaction, except when and where desired!
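The automated flow described above (pretranslate, route the remainder to a translator, approve, push back, update the TextBase) can be sketched in a few lines of Python. This is a hypothetical toy, not MultiTrans Prism's workflow engine: the TextBase is a plain dict of exact source-to-target matches, fuzzy matching and machine translation are left out, and `run_workflow()` and its callbacks are invented names.

```python
from typing import Callable, Dict, List


def run_workflow(
    doc_id: str,
    segments: List[str],
    text_base: Dict[str, str],
    translate_by_hand: Callable[[str], str],
    approve: Callable[[List[str]], bool],
    cms_store: Dict[str, List[str]],
) -> List[str]:
    """Run one document through a toy translation workflow; return the draft."""
    # 1. Pretranslation: exact TextBase matches are filled in automatically.
    # 2. Hands-on translation: only unmatched segments reach the translator.
    draft = [text_base[s] if s in text_base else translate_by_hand(s) for s in segments]
    # 3. Approval gate: nothing is published or recycled unless approved.
    if approve(draft):
        cms_store[doc_id] = draft  # push the translation back into the CMS
        for src, tgt in zip(segments, draft):
            text_base.setdefault(src, tgt)  # recycle the new pairs for future jobs
    return draft
```

Because approved pairs go back into `text_base`, a later document containing the same segments needs no hands-on translation for them, which is the "recycling" the excerpt describes.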

joejoseph00 commented 11 years ago

Please forgive the marketing language; I posted this for FYI purposes and to give ideas. This comes from a MultiCorpora document called CMS-Integration_LowRes_EN.pdf.

StephenOTT commented 11 years ago

I like how they use a phone jack to plug into the globe :+1: lol

joejoseph00 commented 11 years ago

lol, someone in marketing must have just come out of a cryogenic deep freeze from 1994, when dial-up reached its pinnacle. I've got a lot of RJ11 cables; they might be useful one day if the telephone companies dumped analog.

sylus commented 11 years ago

Closing this, as I don't see any actionable items.