littlebizzy / slickstack

Lightning-fast WordPress on Nginx
https://slickstack.io
GNU General Public License v3.0

SlickStack broken due to missing and/or empty (NULL) core scripts #90

Closed: skilver-io closed this issue 2 years ago

skilver-io commented 3 years ago

Hi @jessuppi,

I realized an hour ago that my SlickStack install and web shop are no longer running properly. I am still trying to figure out what the issue is, but it might be related to SlickStack. I have not made any changes to my WP instance for a while, and the issues started roughly 2 hours ago. I believe it has something to do with a cron job, since the folders on my SFTP were modified at roughly the same time.

I am running SS_BUILD="OCT2020G".

Issue:

  1. Website is loading blank
  2. PHP Fatal error in error.log:

    Stack trace:
    #0 /var/www/html/index.php(17): require()
    #1 {main}
      thrown in /var/www/html/wp-blog-header.php on line 16
    [14-Feb-2021 00:01:47 UTC] PHP Fatal error: Uncaught Error: Call to undefined function send_origin_headers() in /var/www/html/wp-admin/admin-ajax.php:25
    Stack trace:
    #0 {main}
      thrown in /var/www/html/wp-admin/admin-ajax.php on line 25
    [14-Feb-2021 00:01:52 UTC] PHP Fatal error: Uncaught Error: Call to undefined function force_ssl_admin() in /var/www/html/wp-login.php:15
    Stack trace:
    #0 {main}
      thrown in /var/www/html/wp-login.php on line 15
    [14-Feb-2021 00:02:52 UTC] PHP Fatal error: Uncaught Error: Call to undefined function wp() in /var/www/html/wp-blog-header.php:16

  3. SFTP: I realized that the mu-plugin folder no longer contains any plugins

I tried to run ss-update, but it asks me to run ss-check first because the ss-update file is outdated, and that command will not run.

  1. ss-check file is empty

I am literally clueless how to get my WordPress up and running. Do you have any idea how to resolve this issue?

Your help is greatly appreciated!

Best, Dennis

jessuppi commented 3 years ago

Hello @skilver-io, thanks for reporting. That build is severely outdated (around 6 months old), and we've had tons of changes since then. We are still in a sort of Beta stage, so lots of core features have been changing.

To keep things up to date, simply run ss update regularly, or at least ss update config occasionally. You can now also schedule this to run automatically using the included cron jobs.

Anyway, the immediate fix in your case would be to manually download the contents of ss-check and ss-worker and then run your ss-update script again:

Ref: https://github.com/littlebizzy/slickstack/blob/master/bash/ss-check.txt

Ref: https://github.com/littlebizzy/slickstack/blob/master/bash/ss-worker.txt

The larger problem here, which we recently patched, is that sometimes GitHub servers are overloaded and respond to wget queries with blank (null) file content, which eventually can break SlickStack configuration. We've addressed this by forcing the root crontab to retrieve ss core cron job files multiple times per day (this can't be disabled, currently).
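The failure mode described here (an empty download silently overwriting a working script) can be guarded against by downloading to a temporary file first and only replacing the target when the result is non-empty. A minimal sketch follows; `fetch_to` and `safe_replace` are hypothetical names for illustration, not functions from SlickStack, and the wget flags are deliberately minimal:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SlickStack): fetch URL ($1) into file ($2).
fetch_to() {
    wget --quiet -O "$2" "$1"
}

# Replace DEST ($2) only if the download of URL ($1) succeeded
# and produced a non-empty file; otherwise keep the existing DEST.
safe_replace() {
    local url="$1" dest="$2" tmp
    tmp="$(mktemp)" || return 1
    if fetch_to "$url" "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        echo "download of $url failed or was empty; keeping $dest" >&2
        return 1
    fi
}
```

Note that `mktemp` creates the file with restrictive permissions, so a real script would likely need to reset ownership and mode on the destination after the move.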

skilver-io commented 3 years ago

Hi @jessuppi

Thank you for getting back to me so quickly and for your help in this matter.

The steps you mentioned resolved my issue:

  1. Manually updating content of ss-check
  2. Manually updating content of ss-worker
  3. Run ss-update

However, after running ss-update I noticed that ufw.service is not starting. I dug through the GitHub issues and am aware that there have been some issues with it in the past. Will ss-update reinstall/reconfigure the UFW config files, or is an ss-install necessary?

damiafaw commented 3 years ago

Mine broke this morning as well. I get the error below when trying to access the URL, so it seems to be a different issue from yours. My ss-config was only set up in December, but basically it was redirecting my site to site_domain:

This site can’t be reached. Check if there is a typo in site_domain. If spelling is correct, try running Windows Network Diagnostics. DNS_PROBE_FINISHED_NXDOMAIN

When I ran ss-update, it reported @.. in the domain fields (non-www and www). I entered the domain manually and re-ran ss-update, but because wp-config.php had been updated recently, I assume it skipped that file, did not update the config in it, and kept the @SITEDOMAIN entry.

I also had issues with ufw stating it was unable to start (issues with 'input'). I couldn't even apt-get remove it, so I did a purge and re-added it.

I assume something was updated and it didn't like it. I ended up doing an ss-install.

Got it working now though, with no data loss, which was good :)

skilver-io commented 3 years ago

Hi @damiafaw

I closed this ticket earlier thinking the issue was resolved for everyone involved. @jessuppi opened it up again, so he might have something to add?

Further, I noticed that my SlickStack is not working again after I tried to log in to wp-admin. There were some issues with wp-config.php and a missing or incorrect admin user. I checked ss-config, but it was an empty file, so I ran sudo bash /var/www/ss-install again with my original settings. My shop is in maintenance mode now. However, there seem to be some issues regarding the file structure and user permissions during the installation process, e.g.:

    Running ss-perms-ubuntu-bash: Resets file and user permissions for Bash run commands and related files...
    chown: cannot dereference '/var/www/meta/wp-cli.yml': No such file or directory
    chown: cannot access '/home/sudo......./.wp-cli/config.yml': No such file or directory
    chown: cannot access '/root/.wp-cli/config.yml': No such file or directory
    chmod: cannot operate on dangling symlink '/var/www/meta/wp-cli.yml'
    grep: /tmp/ss-check: No such file or directory
    grep: /tmp/ss-worker: No such file or directory
    grep: : No such file or directory

Is there anything I can do at this point? @jessuppi Please let me know if there is anything you need in case you are able to resolve this issue on your end.

jessuppi commented 3 years ago

@skilver-io Sorry yes, just opening again as more people will probably have similar issues.

Please check whether your ss-config and wp-config.php files are empty. If so, you first need to rebuild ss-config using one of the backups in /var/www/meta/, or run the ss-install wizard again. Alternatively, if wp-config.php is the only problem, you might be able to rebuild just that file by running ss-install-wordpress-config again.
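For anyone unsure how to do that check from the shell, here is a small sketch. The function name `report_empty_files` is made up for this example, and the paths in the commented call are assumptions about a default install rather than confirmed locations:

```shell
# Report which of the given files are missing or empty.
# Returns non-zero if any file fails the check.
report_empty_files() {
    local status=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "EMPTY OR MISSING: $f"
            status=1
        else
            echo "ok: $f"
        fi
    done
    return "$status"
}

# Example call; both paths are assumptions, adjust them to your server:
# report_empty_files /var/www/ss-config /var/www/html/wp-config.php
```

The `-s` test is true only when the file exists and has a size greater than zero, so it covers both the missing and the blank (NULL) case.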

The grep errors (etc) you mention are temporary and unrelated.

damiafaw commented 3 years ago

@skilver-io Mine died again last night. It lost its DB connection approximately 10-12 hours after I brought it back online. This second time my ss-config was empty.

So I assume that when a cron job ran, the wp-config.php file was updated and ended up with empty values for quite a few entries (DB, Host, Domain, etc.). So I grabbed the sample, redid the information from scratch, and ran ss-install.

The good thing is that I didn't lose any data again, which was nice. Just a fair amount of downtime for an ecommerce site.

They have put a notice up on the sites now in relation to the wget issues and empty files. Might suggest looking for a string or the word slickstack in the files, making sure not empty, etc before running a cron job to remove running when there is an empty file/timeout issue.

Personally I would prefer not to re-update important wordpress installation files like wp-config, etc files automatically and make it a manual run event as required from the ss-install-XXX files in the www directory.

I do have this issue since the last rebuild, but I'm assuming it's a plugin issue and just haven't looked into it yet:

    chmod(): Operation not permitted in /var/www/html/wp-admin/includes/class-wp-filesystem-direct.php on line 173

skilver-io commented 3 years ago

@jessuppi Thank you for your quick help here.

Unfortunately, none of the mentioned points helped. The wp-config.php and ss-config files look good. I ran ss-install again, but my WP still responds with a maintenance 503 error page for my site.

Does an ss-install resolve all issues? At this point I don't know where else to look. Edit: I noticed that the blacklist.txt file is empty now even though SS_PLUGIN_BLACKLIST="true". You mentioned in the readme that it is considered a core file for SlickStack. However, I would assume that it does not have any impact.

@damiafaw Thanks for sharing your approach. Happy to hear that it could be resolved quickly for you. What build was your stack running on before the issues came up?

damiafaw commented 3 years ago

@skilver-io My original build date was December for the configuration file

You may have to delete the maintenance file in the HTML directory, like I did.

I believe the page advises to remove it from memory.

skilver-io commented 3 years ago

@damiafaw

Yeah was about to say that. It solved the issue once I deleted the /var/www/html/maintenance.html file. Thanks^^

jessuppi commented 3 years ago

> Might suggest looking for a string or the word slickstack in the files, making sure not empty, etc before running a cron job to remove running when there is an empty file/timeout issue.

That is precisely what we've been adding, actually. After wget retrieves an SS file from GitHub, it will grep for the string SS_EOF, which is located at the very bottom of every SS configuration file. If that string is not found, our scripts (should) now assume that the source file is corrupted and therefore exit from whatever install task they were performing.

Currently this process requires the following:

  1. Ensuring every relevant SS file/boilerplate on GitHub contains the SS_EOF string
  2. Ensuring every relevant SS script validates that string before installing
  3. Coming up with best practices/syntax/etc for how to best implement this using ss-functions and otherwise
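
As a rough illustration of the validation step in the list above, here is a minimal sketch. The function names `has_ss_eof` and `install_if_valid` are hypothetical and are not taken from ss-functions; only the SS_EOF marker itself comes from this thread:

```shell
# Hypothetical check: succeed only if FILE ($1) exists and contains SS_EOF.
has_ss_eof() {
    [ -f "$1" ] && grep -q "SS_EOF" "$1"
}

# Copy SRC ($1) over DEST ($2) only when the marker is present in SRC.
install_if_valid() {
    if has_ss_eof "$1"; then
        cp "$1" "$2"
    else
        echo "skipping $2: $1 is missing the SS_EOF marker" >&2
        return 1
    fi
}
```

This version accepts the marker anywhere in the file. A stricter variant could inspect only the last few lines (for example with `tail`) so that a download truncated after an early mention of the string would still be rejected.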

The most difficult part of these changes, as always, is thinking through them logically to avoid conflicts, data loss, or other problems later on. Feedback from Linux/etc experts always welcome.

> Personally I would prefer not to re-update important wordpress installation files like wp-config, etc files automatically and make it a manual run event as required from the ss-install-XXX files in the www directory.

You bring up a good point, and it's one of the main concerns I also have with what is too much or too little when it comes to configuration management and automation/maintenance. Ultimately we want SlickStack to be maintenance-free for those users who choose to have it be completely automated, but more hands-on for advanced users who want that.

damiafaw commented 3 years ago

I can't help you in relation to Linux, certainly not my forte haha. Would it be easier to check whether a file is larger than a certain size (2k) before running a cron task? You could put the logic into only the 13 cron files you have at the top and check all files beginning with SS-, for example, looping until true (which might cause issues if there are connectivity problems?), rather than looking for a string in every single file.

As I said, I don't have much of an idea about Linux, just throwing ideas out.
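
The size-based idea above could be sketched roughly as follows. The function name `is_big_enough` is invented for this example, and the 2048-byte default simply mirrors the "2k" figure mentioned in the comment:

```shell
# Succeed if FILE ($1) exists and is at least MIN_BYTES ($2, default 2048).
is_big_enough() {
    local file="$1" min_bytes="${2:-2048}" size
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding some wc implementations add.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min_bytes" ]
}
```

A size check is cheap, but it cannot tell a complete file from a truncated one that happens to exceed the threshold, which is one reason an end-of-file marker may be the more reliable signal.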

jessuppi commented 2 years ago

Over the last several months, the issue of wget timeouts seems to have greatly improved. This seems to be traced to a change we made to the wget alias function inside the ss-functions file:

## ss_wget ##
function ss_wget_slowest {
    command wget --no-check-certificate --no-cache --no-cookies --quiet --inet4-only --tries=30 --timeout=300 --waitretry=15 -O "$@"
}

Ref: https://github.com/littlebizzy/slickstack/blob/master/bash/ss-functions.txt

Previously, we tried to run wget faster, and ONLY in case file corruption was detected, SlickStack would try to run wget again more slowly and/or try a different mirror source, such as GitLab.

Well, it seems what helped was simplifying all that and just making wget run slower ALWAYS.

You can see the timeout is now 300, which is much longer than before. This is strange to me, because previously I often noticed timeouts or failures within approximately 30 seconds, before the timeout maximum had even been reached. I don't know why this happened previously, and I don't know why changing it to 300 fixed it so well, but it did. We also greatly increased the default number of tries to 30, which also might have helped.

This baffles me. I think something higher up the stack, such as OS or networking caches, sometimes affects wget, because when a single timeout occurs it sometimes seems that the entire slew of wget calls will auto-fail.

TLDR let's keep the very liberal timeout settings going forward.

Although the SS_EOF validation string work still needs better review project-wide, I think the NULL files problem that this Issue addressed can be considered resolved, but I will keep this open for a while longer...