mquinson opened this issue 6 years ago
I agree that adding a specific solution with a narrow scope may complicate the code without much benefit. That's not a good thing. But I think we can do better by adding a generic feature cleanly, without including such translation-exclusion code within po4a itself. (There is such a need; see https://bugs.debian.org/607726 . As written there, the -o option may offer some answer for XML, but it wasn't easy for me to implement.)
What po4a should offer is an independent way to specify, in po4a.cfg, 2 variants of the original English document: one to make the POT file, and another to make the translated text with the help of the PO file.
Both of these should be generated by an external program.
This approach allows us to include untranslatable content in many parts of a document. This is how I manage to include a lot of auto-generated statistical data in Debian Reference, with a convoluted manual Makefile. If po4a supported this kind of feature, I could clean up my Makefile :-)
For XML sources, we can write an XSLT filter to exclude the contents of specific tags.
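A minimal sketch of what such an XSLT filter could look like (the translate="no" attribute is only an illustrative convention here, not something po4a defines):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop any element explicitly marked as untranslatable. -->
  <xsl:template match="*[@translate='no']"/>
</xsl:stylesheet>
```

Running this (e.g. with xsltproc) over the master document would produce the filtered variant used for POT generation.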
For non-XML sources, we can use CPP preprocessor directives to achieve similar things by pre-processing.
This approach should be non-invasive and clean, I think...
This is a follow-up to my post yesterday.
As for implementing 2 English base input files, specifying this in the current po4a/po4a.cfg
syntax isn't trivial and is very confusing.
I think the most reasonable approach is to create optional entries in po4a/po4a.cfg
to set up custom prefilter programs:

[pot_prefilter]
: optional entry to set up a prefilter for the input source text -> source text fed into the "po4a-gettextize -m" option input file (POT generation base file)

[translation_prefilter]
: optional entry to set up a prefilter for the input source text -> source text fed into the "po4a-translate -m" option input file (translation generation base file)

This approach should be compatible with the existing syntax while adding very generic flexibility to the po4a infrastructure.
Hmmm... maybe adding options to the po4a
command for these prefilters may be even better.
I like this idea of pre-filtering the input document before extracting the POT file. I think that this is a very appealing approach to solve this problem. Any help (or even better, patch) going in that direction would be really appreciated.
Thanks for the insight.
Hello there. Actually, there is a preliminary implementation already in po4a :)
If you specify the pot_in for a given document, this is the file used to build the POT and PO files. We have an example in t-02-addendums/book-potin.conf
(that I plan to rewrite as I do for all tests currently):
[po4a_langs] ja
[po4a_paths] tmp/book.pot ja:t-02-addendums/book.po.ja
[type:docbook] t-02-addendums/book-auto.xml \
pot_in:t-02-addendums/book.xml \
ja:tmp/book-auto.ja.xml \
add_ja:t-02-addendums/book.addendum1 \
opt:"-k 0 -o nodefault=\"<bookinfo> <author>\" \
-o break=\"<bookinfo> <author>\" \
-o untranslated=\"<bookinfo>\" \
-o translated=\"<author>\""
We have:
--- t-02-addendums/book-auto.xml 2020-04-09 00:23:24.801047067 +0200
+++ t-02-addendums/book.xml 2020-04-09 00:23:24.801047067 +0200
@@ -59,11 +59,6 @@
</totalfake>
</bogustag>
</chapter>
-<chapter><title>Title: Auto add text</title>
-<para>
-This is to emulate auto added non-translated content.
-</para>
-</chapter>
<appendix><title>Title: Optional Appendix</title>
<para>
Appendixes are optional.
As a result, these strings are not added to the pot, so their translation is not found in the po, so they remain unchanged. So it ... works.
But this is very cumbersome, because one has to implement the filtering externally, which kinda goes against the whole spirit of the po4a binary as opposed to the po4a-* tools.
I'd prefer to have a filter, as @osamuaoki proposed. I still need to think of how to express such a filter in the config file.
Hi
On Sat, Apr 18, 2020 at 02:10:12PM -0700, Martin Quinson wrote:
Hello there. Actually, there is a preliminary implementation already in po4a :)
If you specify the pot_in for a given document, this is the file used to build the POT and PO files. We have an example in t-02-addendums/book-potin.conf (that I plan to rewrite as I do for all tests currently): ... But this is very cumbersome, because one has to implement the filtering externally, which kinda goes against the whole spirit of the po4a binary as opposed to the po4a-* tools. I see.
I'd prefer to have a filter, as @osamuaoki proposed. I still need to think of how to express such a filter in the config file.
It took me a while to understand what exactly you are talking about. I hope I understood it correctly.
I proposed pot_in as a feature enhancement to po4a, allowing an equivalent process with po4a alone, as documented in the POD excerpt below.
Special case with specifying pot_in:

 <- source files ->       <--------- build results --------->

 master document --+----------------------------+
        :          |                            |
 external          V                            |
 filtering ====> filtered                       |
 program         master                         +--> translations
                 document                       ^
                    |                           |
                    V                           |
 old PO files ------+--> updated PO files ------+
        ^                      :
        :                      V
        +<.....................+
      (the updated PO files are manually
       copied to the source of the next
       release while manually updating
       the translation contents)
This was meant to be the simplest demonstration of the "pot_in" use case.
FYI: Currently, I use the po4a-* tools embedded in a Makefile.
Let's consider cases.
Case 1. debian-reference
I haven't migrated to the new po4a yet ;-) But my Makefile for
debian-reference does the following, using the po4a-* tools:
 <---------------- source files ---------------->|<--------- build results --------->

 non-XML master document template --+
                                    |   external     +--> master document ------+--> HTML
 non-XML supplemental data ---------+-> merging -----+    (master) English XML  |
            ^                           program      |                          +--> PDF
            |                           (generates   |
    generation script                    XML files)  +--> filtered master document
    (wget/sed/...)                                        (pot_in) XML
                                                                |
                                                              (pot)
                                                                V
        old PO files ---------+--> updated PO files ----+--> translations XML --+--> HTML
            (po)                        (po)            |                       |
              ^                           :             |                       +--> PDF
              :                           V             |
              +<..........................+
        (the updated PO files are manually
         copied to the source of the next
         release while manually updating
         the translation contents)
For this case, both (master) and (pot_in) should be generated at the
same time by the external merging program, even after migrating to po4a.
With the (pot_in) feature, I can migrate to po4a.
Case 2. XML attribute support with XSLT
As discussed in: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607726#21
The basic idea is to use tags with an attribute in the original XML file ... like:
Hello @osamuaoki, thanks for the detailed answer.
I must however confess that I'm a bit lost here. You speak of the pot_in
feature as something that would be desirable, but it's already implemented, right? I just pushed some tests to ensure that it will continue to work in the future.
So, maybe you mean that this bug can be closed because the filtering
thing that I was suggesting is less useful? If so, I agree. I changed my mind in the meanwhile, and I think that it is much easier to keep the filtering out of the po4a program, that is already rather complex. I don't think that we can find a solution that fits all needs to specify the filtering command line in the po4a.conf, so I take it back: pot_in
is sufficient from my point of view, and we could close this issue.
What would be needed from your point of view to close this?
Thanks for your help, Mt.
Hello @osamuaoki, could you please help me understand what remains to be done before closing this issue?
Thanks in advance,
I have some paragraphs in a text file (Markdown) that I don't want to have translated, since they mostly contain code. Ideally I would have a POT file with some paragraphs marked as "not to translate" that would be ignored during conversions, so as to keep them in English in the translated file.
I've been looking for ways to achieve that, but it's hard to find a solution. I now found this issue, but it's still not clear to me whether it's possible to mark some paragraphs as not-for-translation. Is there currently a native way to achieve this? Is there a workaround I'm missing?
Hello @erciccione, sorry for the delay.
Did you see https://po4a.org/man/man1/po4a.1.php#lbAN in the documentation?
If you've read the doc and it's not sufficient, could you please elaborate on your question? The idea is to produce a filtered file where the content you want to hide is removed. This filtered file should be used as pot_in.
Maybe your question is about how to produce that filtered file, removing the content you want to hide? Well, this is not in the field of po4a: you have to filter it on your side, to produce the file that will be used as pot_in in po4a.
I'm not quite sure how I'd do this for text files. In Markdown, I'd use specific markers in comments to indicate the beginning and end of each area to hide, and then I'd come up with a small, crude Perl script to do the actual filtering.
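For illustration, such a filtering script could look like the following sketch (in Python rather than Perl; the marker strings are made up for this example and are not a po4a convention):

```python
# Pre-filter a Markdown document: drop everything between a pair of
# HTML-comment markers, so the output can be used as the pot_in file.
BEGIN = "<!-- po4a: no-translate-begin -->"
END = "<!-- po4a: no-translate-end -->"

def filter_markdown(text):
    """Return text with the regions between BEGIN/END markers removed
    (the marker lines themselves are dropped as well)."""
    out, skipping = [], False
    for line in text.splitlines(keepends=True):
        if BEGIN in line:
            skipping = True
        elif END in line:
            skipping = False
        elif not skipping:
            out.append(line)
    return "".join(out)
```

The filtered output would then be referenced as pot_in in po4a.conf, while the unfiltered master is used to produce the translated documents.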
I don't think a prefilter is the correct solution. If I understand correctly, po4a would not see the content tagged as non-translatable when generating the POT files, because it would simply have been removed from the original content beforehand. But when po4a blends in the translations, the eliminated parts would need to be present, and they would be counted as untranslated, thus skewing the translation statistics of the file and the threshold logic.
Well, that's the currently implemented solution :) What would you propose as a replacement?
Just to be sure we are on the same page here, @jnavila: filtering has already been implemented and integrated into po4a for several years. If you want to update it to make it easier for users, be my guest, but it's already working. There are even some tests.
One thing we could do is to improve Po.pm so that it does not count missing entries as untranslated. That should be rather easy to implement, but it could have bad side effects for people using the po4a-* subscripts in the wrong order. That's a drawback I could probably live with.
OK. Thank you for clearing up what's done and what could be enhanced. I cannot commit to changes right now.
As far as functional features are concerned, I think this is a done deal. Now line-matching rules for addenda can be created more intuitively, too.
As for easy usage of the filtering by end users, we may need documentation on XML filtering by attribute, with an example XSLT + Makefile, since these are nontrivial for most people.
So let's rename this issue 77.
Hello,
reading again the logs of this issue, I come to the conclusion that even if the feature is implemented and documented, it is still very cumbersome to use. I like very much the idea of @erciccione of suppression POT files: a POT file whose msgids get automatically marked as "not to translate". I think it would be much easier to manage for the users, as you just have to check your (usual) POT to search for the entries that shouldn't be there, and copy/paste them unchanged into your suppression file to have them automatically removed. We could probably even warn about unused entries in the suppression file to ease its maintenance (probably, because I'm not sure about split settings, which could get in the way).
Internally, that shouldn't be too complex to implement, a bit like the po4a-gettextize internal behavior: after building the POT file from the master documents, just before writing it to disk, you load the suppression file into a new PO object, and then iterate over the entries of that PO object to remove those msgids from the master POT files.
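The step described above could be sketched as follows (in Python for illustration only, po4a itself being Perl; this treats the POT file purely textually, as blank-line-separated entries with single-line msgids):

```python
# Remove from a POT file every entry whose msgid also appears in a
# "suppression" file, keeping the header entry (empty msgid) intact.
def suppress_entries(pot_text, suppression_text):
    def entries(text):
        # Split the file into its blank-line-separated entries.
        return [b for b in text.split("\n\n") if b.strip()]

    def msgid_of(block):
        # First 'msgid "..."' line of an entry (single-line msgids only).
        ids = [l for l in block.splitlines() if l.startswith("msgid ")]
        return ids[0] if ids else None

    suppressed = {msgid_of(b) for b in entries(suppression_text)}
    suppressed.discard('msgid ""')  # never drop the POT header entry
    kept = [b for b in entries(pot_text)
            if msgid_of(b) not in suppressed]
    return "\n\n".join(kept) + "\n"
```

A real implementation would live in Po.pm and use its entry iteration rather than textual matching, but the filtering logic would be the same.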
Unfortunately, I'm not sure I'll have time to implement this before releasing the long overdue v0.70, so I'm writing this to (1) confirm with you guys that this new feature would be the right answer to your needs and (2) remember it the next time I find some time for po4a.
Since my target is XML, filtering by XML tag is easy. I basically use po4a in 2 stages: once on the filtered XML to create the template for the PO file, and a second time with the original XML to produce the final result. But for Markdown, this strategy doesn't work.
I agree that creating a blocking-POT file is a reasonable idea to address this need in a data-source-neutral way.
msguniq-like filtering is all you need to implement.
Initially reported on Alioth by Michael Terry (03/05/2009):
Comment by Denis Barbier (29/07/2010):
Comment by Michael Terry (29/07/2010):
Comment by me (30/03/2017):
Comment by me (30/03/2017):