google / mdbook-i18n-helpers

Translation support for mdbook. The plugins here give you a structured way to maintain a translated book.
Apache License 2.0
127 stars 25 forks source link

Split POT file depending on specified depth #77

Closed antoniolinhart closed 6 months ago

antoniolinhart commented 1 year ago

Hey all! This is a PR in relation to #67 to split the pot file depending on the specified depth. Another PR will be submitted to change gettext to merge a directory of pot files.

antoniolinhart commented 1 year ago

@mgeisler What do you think about the implementation of splitting the template file? In this version I chose to add messages that did not have a matching source to a catalog that was saved in the file name specified in the input params.

I'm open to changing how we use that pot-file input param, I thought that this made the most sense for now. Would you want to add another optional field for output directory if we have a depth > 0?

mgeisler commented 1 year ago
  • Mimics the directory structure for the book

I think this is a key decision. I was thinking we would use the Book (from RenderContext) as the driver of this — not the file names.

We currently iterate over the BookItems in depth-first order (via Book::iter), but we could instead iterate over the sections field. I think that would let you avoid having to think of where SUMMARY.md fits into the picture.

As for what to name the POT files, I guess this can be derived from Chapter::path? Perhaps we need a top-level output directory (po/) and then we create po/foo.pot for a Chapter { path:foo.md} chapter. We would put all sub_items into the same po/foo.pot file if we're splitting at depth = 1. If we use depth = 2, then we would create po/foo.pot with only the data from this chapter and create po/foo/bar.pot with everything from the sub_items.

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

antoniolinhart commented 12 months ago
  • Mimics the directory structure for the book

I think this is a key decision. I was thinking we would use the Book (from RenderContext) as the driver of this — not the file names.

We currently iterate over the BookItems in depth-first order (via Book::iter), but we could instead iterate over the sections field. I think that would let you avoid having to think of where SUMMARY.md fits into the picture.

Oh! That's a good point for using the sections directly instead of that built in iterator. Until looking into the Book items struct it was hard to know exactly why that worked the way it did.

If I follow through all of the Chapters into their sub items and take the content from each Chapter and so on, that should include all the messages, right? I'm running into an interesting situation where there's some messages that aren't there that should be. Comparing between a current messages.pot and a messages.pot from my program with depth 0, it's missing about 10 messages that all appear to be the Rust scripts that get included as sample code. There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

As for what to name the POT files, I guess this can be derived from Chapter::path? Perhaps we need a top-level output directory (po/) and then we create po/foo.pot for a Chapter { path:foo.md} chapter. We would put all sub_items into the same po/foo.pot file if we're splitting at depth = 1. If we use depth = 2, then we would create po/foo.pot with only the data from this chapter and create po/foo/bar.pot with everything from the sub_items.

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

This makes more sense to follow the book instead. I think it will lead to the split depth input being a lot more logical too. Thank you!

Still working through the revised implementation!

mgeisler commented 12 months ago

There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

Much of the Rust code is included via the {{#include foo.rs}} system. That syntax is a mdbook extension. This is done by the built-in links preprocessor.

Could the scripts be related to this?

Oh... another idea: are you perhaps seeing the effects of #75 by @dyoo? It was merged just a few days ago :slightly_smiling_face:

mgeisler commented 12 months ago

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

This makes more sense to follow the book instead. I think it will lead to the split depth input being a lot more logical too. Thank you!

Still working through the revised implementation!

Sounds great, thank you so much for your help here!

mgeisler commented 12 months ago

Hi @antoniolinhart, just a small thing: the format check fails. You should configure your editor to call rustfmt automatically (or run dprint fmt on the code).

antoniolinhart commented 12 months ago

There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

Much of the Rust code is included via the {{#include foo.rs}} system. That syntax is a [mdbook extension]https://rust-lang.github.io/mdBook/format/mdbook.html#including-files). This is done by the built-in links preprocessor.

Could the scripts be related to this?

Oh... another idea: are you perhaps seeing the effects of #75 by @dyoo? It was merged just a few days ago 🙂

Looks like I need to pull main more often :laughing: This seems to be what my issue was. Thank you!

mgeisler commented 11 months ago

Hey @antoniolinhart, I skimmed through the updated code — from the test cases it looks like it works? Super cool!

You should rebase onto the latest main and resolve the merge conflict.

mgeisler commented 11 months ago

Hi @antoniolinhart, thanks for the cleanup. I tried checking out the branch and ran

% MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "depth": 1}}' mdbook build -d po1 
2023-09-29 10:38:28 [INFO] (mdbook::book): Book building has started
2023-09-29 10:38:28 [INFO] (mdbook::book): Running the xgettext backend
2023-09-29 10:38:28 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

This gives me

% tree po1                                                                  (~/src/comprehensive-rust)
po1
└── src
    ├── android
    │   ├── aidl.pot
    │   ├── build-rules.pot
    │   ├── interoperability.pot
    │   ├── logging.pot
    │   └── setup.pot
    ├── android.pot

I had hoped there would be a single POT file for each chapter in the top-level. Like this:

% tree expected1                                                            (~/src/comprehensive-rust)
expected1
├── android.pot
├── async.pot
├── bare-metal.pot
├── basic-syntax.pot
├── cargo.pot

I suppose the problem in an off-by-one error somewhere :slightly_smiling_face:

antoniolinhart commented 11 months ago

Hi @antoniolinhart, thanks for the cleanup. I tried checking out the branch and ran

% MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "depth": 1}}' mdbook build -d po1 
2023-09-29 10:38:28 [INFO] (mdbook::book): Book building has started
2023-09-29 10:38:28 [INFO] (mdbook::book): Running the xgettext backend
2023-09-29 10:38:28 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

This gives me

% tree po1                                                                  (~/src/comprehensive-rust)
po1
└── src
    ├── android
    │   ├── aidl.pot
    │   ├── build-rules.pot
    │   ├── interoperability.pot
    │   ├── logging.pot
    │   └── setup.pot
    ├── android.pot

I had hoped there would be a single POT file for each chapter in the top-level. Like this:

% tree expected1                                                            (~/src/comprehensive-rust)
expected1
├── android.pot
├── async.pot
├── bare-metal.pot
├── basic-syntax.pot
├── cargo.pot

I suppose the problem in an off-by-one error somewhere 🙂

The challenge with this is that Android isn't actually a top-level chapter, mdbook treats it as a part title and doesn't contain the sub-items as expected like "interoperability" "aidl", "build-rules". I think if we wanted something like this we'd have to leverage the listed directory in the code, or change the SUMMARY file to make it easier

mgeisler commented 11 months ago

The challenge with this is that Android isn't actually a top-level chapter, mdbook treats it as a part title and doesn't contain the sub-items as expected like "interoperability" "aidl", "build-rules". I think if we wanted something like this we'd have to leverage the listed directory in the code, or change the SUMMARY file to make it easier

Oh! I had not realized this until now...

We can change our own SUMMARY.md file, but we won't be able to the change the upstream format easily.

I think the code right now splits the POT file into ~63 files, whereas we only have ~13 parts. Do you think you can add code for tracking the parts? I would suggest transforming the part title into a slug using something like this:

fn slug(title: &str) -> String {
    title
        .split_whitespace()
        .map(|word| {
            word.to_lowercase()
                .chars()
                .filter(|&ch| ch == '-' || !ch.is_ascii_punctuation())
                .collect::<String>()
        })
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join("-")
}

I tested it a bit on our own titles plus some headings from the Rust Book and they look reasonable.

antoniolinhart commented 11 months ago

Hi @antoniolinhart, thanks for the cleanup. I tried checking out the branch and ran

% MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "depth": 1}}' mdbook build -d po1 
2023-09-29 10:38:28 [INFO] (mdbook::book): Book building has started
2023-09-29 10:38:28 [INFO] (mdbook::book): Running the xgettext backend
2023-09-29 10:38:28 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

This gives me

% tree po1                                                                  (~/src/comprehensive-rust)
po1
└── src
    ├── android
    │   ├── aidl.pot
    │   ├── build-rules.pot
    │   ├── interoperability.pot
    │   ├── logging.pot
    │   └── setup.pot
    ├── android.pot

I had hoped there would be a single POT file for each chapter in the top-level. Like this:

% tree expected1                                                            (~/src/comprehensive-rust)
expected1
├── android.pot
├── async.pot
├── bare-metal.pot
├── basic-syntax.pot
├── cargo.pot

I suppose the problem in an off-by-one error somewhere 🙂

After the most recent improvements I think it's closer to expectations - depth 1 splits each 'part' into a separate catalog.

Here's the output I get for the following depths (using the Android chapter as an example after depth = 1):

Depth 1 (note that summary.pot includes messages from any chapters prior to encountering a PartTitle):

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 1}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX
po/XX
├── android.pot
├── bare-metal-afternoon.pot
├── bare-metal-morning.pot
├── concurrency-afternoon.pot
├── concurrency-morning.pot
├── day-1-afternoon.pot
├── day-1-morning.pot
├── day-2-afternoon.pot
├── day-2-morning.pot
├── day-3-afternoon.pot
├── day-3-morning.pot
├── final-words.pot
├── solutions.pot
└── summary.pot

Depth 2

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 2}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl.pot                                                                
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

Depth 3

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 3}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl                                                                    
│   │   ├── changing-api.pot                                                    
│   │   ├── client.pot                                                          
│   │   ├── deploy.pot                                                          
│   │   ├── implementation.pot                                                  
│   │   ├── interface.pot                                                       
│   │   └── server.pot                                                          
│   ├── aidl.pot                                                                
│   ├── build-rules                                                             
│   │   ├── binary.pot                                                          
│   │   └── library.pot                                                         
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability                                                        
│   │   ├── with-c.pot                                                          
│   │   └── with-java.pot                                                       
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

Depth 4

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 4}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl                                                                    
│   │   ├── changing-api.pot                                                    
│   │   ├── client.pot                                                          
│   │   ├── deploy.pot                                                          
│   │   ├── implementation.pot                                                  
│   │   ├── interface.pot                                                       
│   │   └── server.pot                                                          
│   ├── aidl.pot                                                                
│   ├── build-rules                                                             
│   │   ├── binary.pot                                                          
│   │   └── library.pot                                                         
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability                                                        
│   │   ├── with-c                                                              
│   │   │   ├── calling-c-with-bindgen.pot                                      
│   │   │   └── calling-rust-from-c.pot                                         
│   │   ├── with-c.pot                                                          
│   │   └── with-java.pot                                                       
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot
mgeisler commented 8 months ago

Hi @antoniolinhart! Happy new year :partying_face:

Would you be able to rebase this PR on top of the latest main? We should focus on getting it merged to prevent it from bit-rotting!

Even if the functionality to merge the split PO files are not yet there, the splitting functionality could be useful to someone (they would then have to write a tiny script to msgcat everything back together before using the translation).

antoniolinhart commented 8 months ago

Hi @antoniolinhart! Happy new year 🥳

Would you be able to rebase this PR on top of the latest main? We should focus on getting it merged to prevent it from bit-rotting!

Even if the functionality to merge the split PO files are not yet there, the splitting functionality could be useful to someone (they would then have to write a tiny script to msgcat everything back together before using the translation).

Happy new year! I got taken away with some high priority team projects so it's been a while since I've been able to look at this. That's a good point! My goal will be to (this week/next week) get the comments responded to / get it rebased on main since I know the longer I wait the harder it will be :sweat_smile:

Thanks for your patience on this!

mgeisler commented 8 months ago

Happy new year! I got taken away with some high priority team projects so it's been a while since I've been able to look at this.

That's completely okay! The functionality here is luckily not blocking anything right now.

My goal will be to (this week/next week) get the comments responded to / get it rebased on main since I know the longer I wait the harder it will be 😅

Thanks for sticking with it! I think the conflicts should be small at this point, so it should be possible to revive this PR without too much trouble.

antoniolinhart commented 7 months ago

Should be ready for another round of review @mgeisler ! Thank you for your extended patience and support!

mgeisler commented 7 months ago

Should be ready for another round of review @mgeisler ! Thank you for your extended patience and support!

I'm the one thanking! Let me look it through now.

mgeisler commented 6 months ago

Hey @antoniolinhart, I hope the changes in #107 doesn't cause too big conflicts here! I think we can merge this straight after it gets rebased.

antoniolinhart commented 6 months ago

@mgeisler

Should be good to go! It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

mgeisler commented 6 months ago

@mgeisler

Should be good to go!

Great!

It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

Yeah, something is off here — I'll try to untangle it and then merge the PR. It'll take me a day of two to get around to it :smile:

antoniolinhart commented 6 months ago

@mgeisler Should be good to go!

Great!

It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

Yeah, something is off here — I'll try to untangle it and then merge the PR. It'll take me a day of two to get around to it 😄

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now :+1:

codecov-commenter commented 6 months ago

Codecov Report

Attention: Patch coverage is 91.91176% with 66 lines in your changes are missing coverage. Please review.

Project coverage is 90.58%. Comparing base (a4ca49b) to head (440e352). Report is 28 commits behind head on main.

Files Patch % Lines
i18n-helpers/src/lib.rs 83.66% 27 Missing and 6 partials :warning:
i18n-helpers/src/xgettext.rs 95.78% 2 Missing and 23 partials :warning:
i18n-helpers/src/bin/mdbook-xgettext.rs 0.00% 7 Missing :warning:
i18n-helpers/src/normalize.rs 88.88% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #77 +/- ## ========================================== - Coverage 90.62% 90.58% -0.05% ========================================== Files 11 12 +1 Lines 2368 3132 +764 Branches 2368 3132 +764 ========================================== + Hits 2146 2837 +691 - Misses 159 211 +52 - Partials 63 84 +21 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mgeisler commented 6 months ago

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

Hey Antonio, I think I got your changes back into a simple branch. I don't actually know if you did anything wrong here. What I did on my side was to

  1. Create a temporary branch at the point where your split-pot-file branch was.
  2. Hard reset split-pot-file to main to throw away all commits and merges and what not.
  3. Soft reset split-pot-file back to the temporary branch. This added all the changes back, but as a delta to main.
  4. Commit this under your name and force push back to your repository.

If the tests pass, then I'll merge this next.

antoniolinhart commented 6 months ago

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

Hey Antonio, I think I got your changes back into a simple branch. I don't actually know if you did anything wrong here. What I did on my side was to

  1. Create a temporary branch at the point where your split-pot-file branch was.
  2. Hard reset split-pot-file to main to throw away all commits and merges and what not.
  3. Soft reset split-pot-file back to the temporary branch. This added all the changes back, but as a delta to main.
  4. Commit this under your name and force push back to your repository.

If the tests pass, then I'll merge this next.

Wonderful! Thank you I will try this out next time I run into the same issue.