polylang / wpml-to-polylang

Tool to migrate a WPML install to Polylang
https://wordpress.org/plugins/wpml-to-polylang/
GNU General Public License v3.0
6 stars 0 forks source link

refactor!: Code optimizations, paginating queries, and better memory management #13

Closed jpSimkins closed 1 year ago

jpSimkins commented 2 years ago

This is a refactor of this plugin.

This was tested with a system that has over 1 million translations and imports in less than 5 mins. (memory was set to 4GB for WP_MAX_MEMORY_LIMIT).

Could be improved:

jpSimkins commented 2 years ago

The import process is still not 100% complete.

I am seeing:

There are posts, pages, categories or tags without language. You can set them all to the default language.

This is most likely due to version differences between WPML and PolyLang as this was the same case for it in it's current state. I will look into what is being missed and why next. I should note, all my data seems fine.

It's also asking me to run the setup wizard too. Not sure if that is expected still.

manooweb commented 2 years ago

Hello @jpSimkins,

First, sorry for the delay but we were in Porto at the latest WordCamp Europe for a few days. Second, thank you very much for this work and sorry because this repository wasn't really ready to contribute by respecting our guidelines like for the Polylang repository for example. As you could see we didn't code here since a while.

So before reviewing in details and merging your work, may I ask you for some changes?

We also use PHPStan tool for code static analysis but our old code can be improved and PHPStan already finds many errors at the beginning . So don't take car about that.

On my side, I will make some changes on our current master branch to make it ready to contribute.

Thanks again. Best regards.

manooweb commented 2 years ago

The import process is still not 100% complete.

I am seeing:

There are posts, pages, categories or tags without language. You can set them all to the default language.

This is most likely due to version differences between WPML and PolyLang as this was the same case for it in it's current state. I will look into what is being missed and why next. I should note, all my data seems fine.

In fact, it always possible that some contents are translatable but have no languages assigned. It can be post, page, custom post type, category, tag, custom taxonomies and so on. This is why this notice is displayed. It can depend on how WPML contents was managed. It's difficult to answer when there is a lot of contents like in your case or without a detailed example.

It's also asking me to run the setup wizard too. Not sure if that is expected still.

The wizard is managed by the Polylang core and, if I correctly remember, it is launched when no Polyang options is defined. It's therefore normal the first time you activate Polylang. Polylang options are normally created after WPML To Polylang import was run. So the wizard shouldn't be launched automatically. Is it this case you talked about?

jpSimkins commented 2 years ago

Hello @manooweb,

No worries about the delay. I made the suggested changes per your request.

As for the follow-ups:

That is what I am seeing after I run the WPML-To-PolyLang importer (even before the refactor).

This could be in part to WPML element_type (in icl_translations table) having package_* which doesn't seem to be handled by the importer. This is where I am focusing on for now. Although I do not understand PolyLang's structure yet for translations so I am not sure if I will be able to resolve this part.

For my setup, I am seeing package_elementor and package_gutenberg so these are not being translated.

As for the wizard showing still, that was after the import was completed. I confirmed I do have polylang in the options table.

One more thing I am trying to identify, obviously I will need to purchase the pro version for my needs (plus I use WooCommerce). I assume this importer works just fine with PolyLang Pro, is that correct?

jpSimkins commented 2 years ago

I just tested again with the importer to verify no issues were missed in the updates I did earlier.

After the importer is complete, when I go to the dashboard, I am seeing:

Welcome to Polylang โ€“ Youโ€˜re almost ready to translate your contents!

It wants me to run the setup wizard so I ran it. It does appear to be due to content missing translations. Given the amount of data I have and that I am behind a proxy, I get the 504 error due to the proxy timeout. I waited a few mins then refreshed the page and it seems all is well.

I did compare the options to each other and they match completely. So I don't think it's the options that shows the wizard. It must be checking for untranslated content.

Ideally, I would like the importer to solve this issue. I'll try to look deeper into this to understand it more so I can find a solution.

jpSimkins commented 2 years ago

After looking though the code in PolyLang, the wizard is shown unless you dismiss it.

I added \PLL_Admin_Notices::dismiss( 'wizard' ); to make sure the notice is dismissed properly. Now, after the import process is complete, the wizard no long nags the user.

I'm unsure of how to handle the package_* element_type from WPML's icl_translations. I am not sure if I even need to handle those. Not sure I will be able to figure this out so hopefully someone from your team can assess this part.

Other than that, everything looks good on my end. Translations are valid and everything seems pulled in from WPML without issue.

Also, just a side note, I am running WordPress 5.9.1 and using WPML 4.5.5.

manooweb commented 2 years ago

Hello @jpSimkins ๐Ÿ‘‹

No worries about the delay. I made the suggested changes per your request.

Thanks ๐Ÿ‘

One more thing I am trying to identify, obviously I will need to purchase the pro version for my needs (plus I use WooCommerce). I assume this importer works just fine with PolyLang Pro, is that correct?

Sure! Polylang is in fact the Polylang Pro core and the language is managed by Polylang.

manooweb commented 2 years ago

After looking though the code in PolyLang, the wizard is shown unless you dismiss it.

I added \PLL_Admin_Notices::dismiss( 'wizard' ); to make sure the notice is dismissed properly. Now, after the import process is complete, the wizard no long nags the user.

๐Ÿ‘ I believe I misunderstood something in your previous comment. I thought that the wizard was launched automatically as when we activate Polylang the first time. The notice you talked about is also dismissed by the wizard itself when we go until its last step. It make sense here to use the same way when we migrate content from WPML.

I'm unsure of how to handle the package_* element_type from WPML's icl_translations. I am not sure if I even need to handle those. Not sure I will be able to figure this out so hopefully someone from your team can assess this part.

It seems to me that they are custom post types. They are not necessarily translatable and we need to tell to Polylang if they are or not. It can explains this behaviour.

Other than that, everything looks good on my end. Translations are valid and everything seems pulled in from WPML without issue.

๐Ÿ‘ Your work is interesting and we will take time to study, test and finally merge after all the current work.

Also, just a side note, I am running WordPress 5.9.1 and using WPML 4.5.5.

jpSimkins commented 2 years ago

It seems I missed paginating string translations. I pushed an update to solve this issue.

I also notice that string translations are not working properly (only showing 27 out of over 500,000). The serialized data in the DB is not valid. When I copy it out and try to unserialize it, it is not valid and doesn't unsearialize. I am trying to figure out why this is.

I should note that it appears to break on HTML as it doesn't seem to be escaped so this is most likely a PolyLang issue itself. I am not sure exactly why this is happening but until I can figure this out, I cannot switch to PolyLang. I am still trying to understand this part of the system so hopefully I can figure out why this is the case. Doesn't appear to be an issue with the importer but PolyLang itself (assuming the MO system).

Of course, having HTML in a string translation itself is bad... this is due to other plugins and is still something I need the multilingual plugin to deal with. (GDPR Cookie Consent is an example)

As for the package_* issue I posted above. I think these need to be imported too. I have 2:

  1. package_gutenberg
  2. package_elementor

Elementor is a plugin but Gutenberg is core WP so not really sure what this is about here. Seems like this should be imported but since I am new to PolyLang, I really don't know where this should be stored.

jpSimkins commented 2 years ago

Further investigation shows this was due to WPML scanning the plugins/themes to import the string translations. Not sure why these did not import but by using Theme and plugin translation for Polylang (TTfP) I was able to get what I wanted. Just had to redo the translations. Again, I don't see why this failed as the data looked good until it went to the MO system of Polylang.

jpSimkins commented 2 years ago

Pushed up a fix for taxonomy term translations not being associated properly.

Chouby commented 2 years ago

@jpSimkins

Thank you for your very interesting work.

You should consider the current status of WPML to Polylang as a MVP to migrate the data. However, it helped most of the people wanting to migrate to Polylang during the past years while being difficult or even impossible to use when pushing the limits to the extreme as you experienced it.

For me, known issues of the current version are mainly:

  1. The size limit of the MYSQL queries.
  2. The memory size limit.
  3. The timeout limit.

Your PR fixes the point 1, not the points 2 and 3. However, your work motivated me to go one step beyond and fix all 3 issues at the same time.

So I took some time last week to work on this, reusing part of your work and ideas that you included in this PR.

My goal is to split the migration in several processes instead of only one, all working within the limits identified above.

Thus I have splitted the migration in several actions:

  1. Create languages
  2. Assign post languages and translations (*)
  3. Assign term languages and translations (*)
  4. Assign menu locations
  5. Process post and terms with language (*)
  6. Create strings translations (*)
  7. Migrate options

The actions marked by an asterisk may still go beyond reasonable limits. So these actions must be splitted themselves in several processes (called steps) if necessary.

My goal is to process actions (and their steps if any) with an ajax batch system: The user launches the first action, which launches the second step in ajax, which to its turn lanches the third action and so on until the end.

I prefer not to use the CRON as you are proposing. Indeed, after I tested your PR, I thought to a user testing the migration locally or on a staging site. In such case, no outside process will launch the CRON task just after having pushed the Import button. We both know what to do. But I fear that people feel perturbated seeing nothing happen. Of course the next action on the site would launch the process but the first impression may leave a strange feeling.

As I aim for a complete rewrite, I also adopted a more modern structure with OOP as you did. The main idea is to have one class per action. This is where the logic takes place and where I reused the way you limited the size of the SQL queries. However I had to rework some of them, especially for post and term translations as I needed to make sure to process each complete translation group in the same step. I also took your idea to offer a status. However, giving that I was writing a class per action, it was clear to me that each action class should be responsible to provide its own status. As the processes are limited in size, we don't have to use set_timeout() anymore. The memory is freed by the end of the process. I however still have to find out what would be the "best" value for WPML_TO_POLYLANG_QUERY_BATCH_SIZE. I do not plan to allow legacy submission as you proposed. If the new way is better than the old way, there should be no need to keep it.

It will be also time to rewrite how checks before the import are reported to account for feedbacks of users I had in the past years not always understanding the meaning of these checks.

Working on this plugin was not in our goals for this year, so I'm working on this on my free time (I had some last week). The current state of my work is visible in #19 I should be quite close to the initial goal on the PHP side. Nothing's tested so it's probably very buggy in its current state, but the code should be quite readable and should be ok to illustrate the goals I exposed above. I still have to write the JS part.

Nothing's ready to test yet but after all the work you did, I wanted to expose you my plans for this plugin as soon as I had clear views.

jpSimkins commented 2 years ago

No worries @Chouby. I'm glad it motivated you.

I was considering adding a link to trigger the cron but decided against it.

My 2 cents on this would be that a one-off import is expected to need higher resources. It's easier to increase system resources for PHP than to have the users edit config files for MySQL, so that's the route I took. Memory and timeout limit aren't that important to worry about for this import system (I imported over a million datasets within 3-5 mins using 3GB of memory but I really only needed 2GB) but if you can solve the issue without sacrificing the speed then that's still a good idea. Tuning the import to take 20 mins to prevent system resource needs isn't worth it imho. Also, don't forget proxy timeouts since the main issue I had was that after 30 seconds I would get a gateway timeout.

Given that this script it aimed for anyone, your changes make sense and is a good direction.

What I really wanted to do was try to get the DB to handle the changes without needing PHP. I just didn't know enough about Polylang to figure that part out. Plus, some parts that just doesn't seem possible.

I look forward to your work and if I still have a need for it, I will check it out. I wouldn't mind learning some things from this.

This was my first time playing with Polylang so your script was a life saver and I appreciate all the work that was done on it.

Chouby commented 1 year ago

@jpSimkins I just merged my complete rewrite of the plugin after it has been tested internally.

If you still have your big site and would like to test it on a test install, your feedback would be very appreciated. The size of the query is controlled byt the constant WPML_TO_POLYLANG_QUERY_BATCH_SIZE. I selected a default size of 5000 which according to my tests require low memory usage and did not lead to a significant performant loss. But the biggest site I have at my disposal has only about 10,000 posts.

jpSimkins commented 1 year ago

I actually should be able to test this next week. I will let you know how it goes once I can get to it

jpSimkins commented 1 year ago

@Chouby I wound up sticking to my code. The issue with your update is that in an autoscaling environment, this actually shut down the node processing the tasks (cron runs outside the auto scaling group). This is typical for auto scaling as you don't get to control which nodes get shut down. Due to the risks this causes, I had to keep my code since it runs as cron. Perhaps add an option to allow this to run as cron?