younginnovations / iatipublisher

IATI Publishing Tool
GNU Affero General Public License v3.0

Mark up text for automated string extraction #1477

Open emmajclegg opened 3 months ago

emmajclegg commented 3 months ago

This is a first step to translating IATI Publisher's interface into French and Spanish, following the approach discussed here: #1420

YI will prepare text for automated extraction from IATI Publisher, ODS will review and get it translated, then YI will reintegrate text back into IATI Publisher.

Tasks

robredpath commented 2 months ago

Following our conversation this week, I wanted to share an outline of the process that you'll need to follow.

It's important that this is an automated process that is part of your standard workflow, so that any changes to text are detected and translated quickly in the future.

In my experience, this is usually achieved by marking up the text in some way. See, for example, this Django template from one of our projects, which wraps each sentence in {% blocktrans %} tags that signal to Django's i18n module that the string should be included in the translation process. Our workflow uses .pot/.po files.
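For illustration (the linked project template is not reproduced here, and the sentence below is an invented example), the Django pattern being described looks roughly like this:

```django
{% load i18n %}
{# Each user-facing sentence is wrapped so Django's extraction tooling can find it #}
<p>{% blocktrans %}Publish your IATI data.{% endblocktrans %}</p>
```

Running Django's string extraction over templates like this produces a .pot file of source strings; translators return .po files with a msgstr filled in against each msgid.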

I can see that there are already some files with lists of strings that look like a translation mechanism, which may be how this is already starting to be implemented? My instinct is that this is potentially quite a fragile and high-effort way to work, but it's up to you!

Either way, once the system is in place, then each time we make an update, the process is:

Apart from a few manual steps to authorise the translation and review the software output to make sure that nothing went wrong, this is an entirely manual process.

I don't think that IATI Publisher necessarily has to have a .pot/.po file-based process, but if you were to build one then it would be very close to what we're going to need once we work out the file details with the translation company.

Does that make sense? I'm very happy to provide any more detail if it's useful.

emmajclegg commented 2 months ago

Thanks @robredpath - I assume you meant "Apart from a few manual steps to authorise the translation..., this is an entirely automated process" ? Otherwise, no questions from me

Sanilblank commented 2 months ago

Hello @robredpath Based on the template you have provided, it seems you are referring to the localization mechanism of the Django framework. We took a similar approach when we initially started the translation process in IATI Publisher. Laravel, the framework used for the backend, provides a comparable localization mechanism, and if Laravel's Blade templating engine had been used for the frontend as well, the templates would look very much like your example. However, since we are using Vue.js, displaying the text on the frontend adds a little more complexity, even though Vue uses a similar templating structure.

Analogous to the .pot/.po files you mention, Laravel stores translations in files containing arrays of strings, split across files as required. Since the two mechanisms are quite similar, I don't think we will need to incorporate .pot/.po files directly into the system, as that would require a lot more research into how it can be achieved.

The translations will be stored in files within the system. However, providing those files directly to you for translation could cause confusion about how to process them, as they are quite technical, so we were thinking of writing a script that takes all the strings and places them in an Excel file. We would send that file to you, you would add the translated strings and return it, and a second script would take those translations and put them into the format required by the system. If you do require .pot/.po files for the strings we send for translation, we will need a bit of time to research how those files can be created; the rest of the process would be the same, i.e. a script generates the file, which is sent to you, you add the translations and send the file back, and a script puts the translations into the system.

We are a bit confused by the use of the word 'automated' in your title and description. By 'automated', I am assuming you mean that a file will contain all the English strings present in the system, and that if any change occurs in that file, a script will detect the change and generate the required Excel or .pot/.po file to be sent to you for translation. Is my understanding correct? If not, could you explain that part in a bit more detail? I hope I have made things clear here; if anything seems confusing, I would be glad to go deeper into the explanation.

cc. @praweshsth @PG-Momik
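The export script described above could be sketched roughly as follows. This is a minimal, framework-agnostic illustration in Python (the real implementation would presumably be PHP/Laravel, and would target .xlsx rather than CSV); the nested dict and its strings are invented stand-ins for Laravel lang-file contents:

```python
import csv
import io

def flatten(translations, prefix=""):
    """Flatten nested translation arrays (modelled here as dicts)
    into dotted keys, e.g. 'activity.title'."""
    rows = {}
    for key, value in translations.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            rows.update(flatten(value, full_key))
        else:
            rows[full_key] = value
    return rows

# Hypothetical strings standing in for the system's English lang files.
en = {"activity": {"title": "Title", "add": "Add activity"}}

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["key", "english", "translation"])
for key, text in sorted(flatten(en).items()):
    writer.writerow([key, text, ""])  # translators fill the last column

print(buffer.getvalue())
```

The reverse script would read the returned file and write each translated string back under its dotted key.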

emmajclegg commented 2 months ago

Thanks for the information here @Sanilblank . @robredpath is away until Aug 29th unfortunately, but I will see if anyone else in our team can help with the file format question in the meantime.

We are a bit confused by the use of the word 'automated' in your title and description. By 'automated', I am assuming you mean that a file will contain all the English strings present in the system, and that if any change occurs in that file, a script will detect the change and generate the required Excel or .pot/.po file to be sent to you for translation. Is my understanding correct?

Yes, that's correct to my understanding. We remain in control of how often and when we run the re-translation, but your system should be capable of detecting what English text has and hasn't changed since the last translation.

By the extraction and re-integration of text into IATI Publisher being done in an automated way, we mean via a script as opposed to any manual copy and pasting.
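One common way to detect what has and hasn't changed since the last export is to store a fingerprint of each English string at export time and compare on the next run. A minimal sketch (the keys and strings below are invented examples, not IATI Publisher's actual keys):

```python
import hashlib

def fingerprint(strings):
    """Map each translation key to a hash of its current English text."""
    return {k: hashlib.sha256(v.encode()).hexdigest() for k, v in strings.items()}

def needs_translation(current, last_run):
    """Keys that are new, or whose English text changed since the last export."""
    return {k for k, h in fingerprint(current).items() if last_run.get(k) != h}

# Fingerprints saved at the time of the previous export.
last = fingerprint({"home.title": "Welcome", "home.cta": "Publish data"})

# Current state of the English strings: one unchanged, one edited, one new.
now = {"home.title": "Welcome", "home.cta": "Publish your data", "home.new": "Help"}

print(sorted(needs_translation(now, last)))  # → ['home.cta', 'home.new']
```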

Sanilblank commented 2 months ago

@emmajclegg Thanks for the clarification. I have another question regarding the extraction part. The system will generate the Excel (or other format) file containing the extracted English strings, which will be sent to you. We will need a way to inform you which strings have been added or changed since the last translation was done. We could do this in several ways: the first is to include in the file only the strings that have been added or updated (i.e. those requiring translation); another is to include all the strings, along with their previous translations, which would allow you to update even translations that were already done. The second approach would give you more flexibility, but it may be more difficult for you to see which texts actually require translating. If you have any other ideas on this subject, we are open to hearing them. Please have a look and confirm which process we should move forward with.

cc. @praweshsth @PG-Momik
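The second option described above (a full export carrying previous translations) can be combined with a status column so that translators can still filter for outstanding work. A minimal sketch with invented keys and strings:

```python
def build_export(english, existing_translations):
    """Full export: every key, its English text, any previous translation,
    and a status flag so translators can filter for rows needing work."""
    rows = []
    for key, text in sorted(english.items()):
        previous = existing_translations.get(key, "")
        status = "translated" if previous else "needs translation"
        rows.append({"key": key, "english": text,
                     "translation": previous, "status": status})
    return rows

english = {"home.title": "Welcome", "home.cta": "Publish your data"}
french = {"home.title": "Bienvenue"}  # partial previous translation

for row in build_export(english, french):
    print(row)
```

This gives the flexibility of the full-file approach while keeping the "what needs doing" signal of the delta approach.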

emmajclegg commented 2 months ago

Hi @Sanilblank - I don't want to give wrong information on this so will check with @robredpath once he's back (Aug 29th) and update here.

robredpath commented 1 month ago

Hi @Sanilblank! Thanks for this - it's really useful to understand what you're thinking.

The exact format of the files doesn't really matter too much for us - I suggested .pot/.po files as they're fairly standard in our other applications and are straightforward to work with, but an .xlsx file would also be fine. The main thing is that the process is automated and repeatable.

By "automated" what I mean is that we expect the list of strings to be generated directly from the source code by software, without any manual steps - and for the translated strings to similarly be re-integrated automatically. This means that the process is easily repeatable, so that a small update can be made easily and large updates aren't too much of a problem.

We don't expect the automation to require zero human contact, but we want to make sure that everything gets translated as part of the regular updating process for the software: every time a form or button changes, or we add some new explanatory text, it should be translated promptly.

By way of example, for our documentation platform we run one command to generate the .pot files that we send to the translators, and then we check the translated files into git and re-run the build process to generate the multi-lingual website. This gives us a very high level of repeatability and consistency, and it's easy for us to do, which encourages us to do it often - even for very small changes.

In our documentation work we send the whole documentation site each time, and the translation platform figures out what's changed, and gets that translated. We then re-import the whole translated file back in. Our experience is that it's easier that way, rather than trying to manage lists of things that have changed. Ultimately, it is up to you, but that's our experience and recommendation.

Hope that helps - do let me know if you have any further questions

Sanilblank commented 1 month ago

Hi @robredpath I think I understand what you are saying, and I feel we are on the same page regarding the automation process. As mentioned previously, we will write a script that checks the translations maintained in the system and generates an Excel file containing all texts, both translated and requiring translation, which will be sent to you. You will perform the translations as required and send the file back to us, and we will use a script to simply take the translations and insert them into the system.

cc. @praweshsth @PG-Momik

Sanilblank commented 1 month ago

Hi @robredpath The parts discussed above are now very clear. For sending the data held in the backend to the frontend, we researched online and found two methods.

  1. The entire data set is loaded in the app.blade.php file and saved as global data, which the FE then uses for showing text throughout the system. This increases the load on each page, so we could cache the data and use it everywhere; still, the amount of data stored in cache would be very large, so this process may not be feasible. Also, when the user changes the language, the cached data is deleted.
  2. The BE will expose APIs for sending the translated texts to the FE. When a page loads, the APIs for the required translated texts will be called. When an API for a certain set of translations is called, the result will be stored in the backend cache in Redis, so that no processing is required the next time. When a user changes the language, the cached data will be deleted. This process does not send all the translated text to the FE at once, which helps to reduce the load. We are leaning towards using this method.
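The caching logic of the second method can be sketched in a framework-agnostic way as follows. This is only an illustration of the pattern: in IATI Publisher the endpoint would be a Laravel route, the cache would be Redis, and the data would come from the lang files - the names, locale, group, and strings below are all invented:

```python
cache = {}  # stands in for the Redis backend cache

def load_group_from_files(locale, group):
    """Stand-in for reading a group of strings from the lang files."""
    data = {("fr", "activity"): {"title": "Titre"}}  # hard-coded for illustration
    return data.get((locale, group), {})

def get_translations(locale, group):
    """API handler sketch: serve one group of translations, caching the result."""
    key = f"{locale}:{group}"
    if key not in cache:                 # only hit the files once per group
        cache[key] = load_group_from_files(locale, group)
    return cache[key]

def on_language_change():
    """Mirrors deleting the Redis cache when the user switches language."""
    cache.clear()

print(get_translations("fr", "activity"))  # first call populates the cache
print(get_translations("fr", "activity"))  # second call is served from cache
```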

This message is just to update you on our findings and on how we are proceeding with this feature.

cc. @praweshsth @PG-Momik