Timing issue with 12.1.0.2 database box

PaulNeumann commented 5 years ago

(Please bear with me; this one takes some explanation.)

After the release of the most recent version of the ol7-latest base box on April 19, I began seeing an error on some Windows hosts when building the Oracle Database 12cR1 box. After the database installer files are unzipped, the runInstaller script fails, so the database software isn’t installed:

<snip everything before last installer file being unzipped>
    oracle-12102-vagrant:   inflating: /vagrant/database/install/.oui  
    oracle-12102-vagrant: 
    oracle-12102-vagrant: 2 archives were successfully processed.
    oracle-12102-vagrant: /vagrant/database/runInstaller: line 249: /vagrant/database/install/.oui: Permission denied
    oracle-12102-vagrant: /tmp/vagrant-shell: line 73: /opt/oracle/oraInventory/orainstRoot.sh: No such file or directory
    oracle-12102-vagrant: /tmp/vagrant-shell: line 74: /opt/oracle/product/12.1.0.2/dbhome_1/root.sh: No such file or directory
<snip rest of log>

The hosts this error occurs on are Windows 10 Professional Build 1809 and Windows 10 Enterprise Build 1803, with VirtualBox 6.0.6 and Vagrant 2.2.4. The affected hosts are fairly quick machines (Core i7, 32 GB RAM, SSD). The build worked successfully on these hosts with the previous version of the ol7-latest base box. The build still works successfully on an older, slower Windows 10 Professional 1809 host (Core i5, 8 GB RAM, HDD) running the same versions of VirtualBox, Vagrant and the ol7-latest box. This behavior is consistent and reproducible: the build fails on the faster hosts, and works on the slower host.

After some troubleshooting, I believe that the problem is a timing issue, not permissions. It may be specific to Windows hosts. For the 12cR1 box, the database installer files are unzipped to the host's filesystem (/vagrant/database). I think that the installer files aren't being fully flushed to disk before runInstaller starts. (The file that runInstaller can't access, .oui, is the last file that's unzipped.)

One of the items in the VirtualBox 6.0.6 changelog is "Linux guests: shared folder performance and reliability improvements and missing features". I suspect that the upgraded Guest Additions in the base box changed the timing just enough to introduce a problem with the 12cR1 build. If I add a sleep 0.2s statement to install.sh right after the unzip statement, the build succeeds on the same hosts. The slight delay seems to allow enough time for all of the installer files to be flushed to disk. (sleep 0.1s wasn't enough; the build still failed on the faster hosts.)
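As a rough sketch of the workaround described above (not the actual contents of install.sh), the delay can be wrapped in a small helper that pauses and then runs whatever command follows. The function name `run_after_delay` is hypothetical, and the installer invocation in the comment is elided rather than copied from the real script.

```shell
#!/bin/bash
# Hypothetical helper: pause for a short, fixed time, then run a command.
# This mirrors the "sleep after unzip" workaround; it does not verify
# that the extracted files are actually usable.
run_after_delay() {
  local delay="$1"
  shift
  sleep "$delay"   # 0.2 worked on the affected hosts; 0.1 did not
  "$@"             # run the command that was failing without the pause
}

# Intended placement in the provisioning script (arguments elided):
#   unzip ...                                   # existing extraction step
#   run_after_delay 0.2 /vagrant/database/runInstaller ...
```

The delay value is the only tunable here, which is also the weakness of this approach: it encodes a guess about how long the host needs.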

Adding a sleep command seems to fix the immediate issue, but it's not very elegant or robust. With an even faster host, or an anti-virus application locking one of the installer files, the build could still fail.

I think a better solution would be to unzip the installer files to /tmp, instead of /vagrant. This would take the host filesystem out of the picture. There's enough space available in the VM, and the installer files are deleted after use. I've tested this, and it seems to work well. Unzipping to and installing from /tmp also seems to be a little faster.
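A minimal sketch of this idea follows, assuming the installer archives are `.zip` files sitting in a source directory and that a staging directory under /tmp is acceptable. The function name `extract_to_stage` and the `/tmp/db-install` path are illustrative assumptions, not names taken from the existing scripts.

```shell
#!/bin/bash
# Hypothetical sketch: extract installer archives inside the VM
# instead of onto the shared host folder.
extract_to_stage() {
  local src_dir="$1" stage_dir="$2" archive found=0
  mkdir -p "$stage_dir" || return 1
  for archive in "$src_dir"/*.zip; do
    [ -e "$archive" ] || continue            # glob matched nothing
    unzip -q -o "$archive" -d "$stage_dir" || return 1
    found=1
  done
  [ "$found" -eq 1 ]                         # fail if no archive was found
}

# Intended use in the provisioning script (paths are assumptions):
#   extract_to_stage /vagrant /tmp/db-install
#   ... run the installer from the staging directory ...
#   rm -rf /tmp/db-install                   # installer files are deleted after use
```

Because the staging directory lives on the guest's own filesystem, the extraction and the installer no longer depend on how quickly the host reflects new files through the shared folder.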

Even though I've seen this issue only with the 12cR1 box, I suggest modifying the install.sh scripts for the 11gXE, 12cR1 and 12cR2 boxes to unzip the database installer files to /tmp, instead of /vagrant. (These are the only database boxes that extract the installer files to the host's filesystem.) This would keep the boxes consistent, and would reduce the chance of similar problems in the future.

I hope this makes sense. Please let me know if you need any more information. If you agree with this approach, and you'd like me to work on it, I'd be happy to.

Thanks, Paul

gvenzl commented 5 years ago

@scoter-oracle, is this something that sounds familiar to you?

@PaulNeumann, I think originally we didn't want to extract to /tmp, in order to keep the VM file as small as possible, i.e. extract the binaries outside the VM and just install them into it. With 18c and onwards this has luckily changed already anyway. I don't mind changing the behavior of the older versions to extract into /tmp. But first, let's see whether Simon already knows about this; a fix might be just around the corner or already delivered.

scoter-oracle commented 5 years ago

Do we know if it also happens on other host OSes with quick machines? Just to understand whether the issue is related only to Windows 10.

PaulNeumann commented 5 years ago

@gvenzl I understand. Unzipping to /vagrant to minimize the size of the VM files makes sense. Unfortunately, the Windows filesystem is challenging sometimes. This kind of behavior is all too common, and I've written a ridiculous number of timing loops over the years to work around it.
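For illustration only, a timing loop of the kind mentioned here might poll for a file with a timeout instead of sleeping for a fixed interval. This is a generic sketch: the function name `wait_until_executable` is hypothetical, and whether an executable-bit check would actually detect the condition behind the "Permission denied" error in this build is an untested assumption.

```shell
#!/bin/bash
# Hypothetical timing loop: wait until a file exists and is executable,
# giving up after a timeout (in whole seconds, default 10).
wait_until_executable() {
  local path="$1" timeout_s="${2:-10}" waited=0
  local limit=$(( timeout_s * 10 ))      # number of 0.1-second polls allowed
  while [ ! -x "$path" ]; do
    if [ "$waited" -ge "$limit" ]; then
      echo "Timed out waiting for $path" >&2
      return 1
    fi
    sleep 0.1
    waited=$(( waited + 1 ))
  done
  return 0
}

# Possible use before launching the installer (path taken from the log above):
#   wait_until_executable /vagrant/database/install/.oui 10 || exit 1
```

Unlike a fixed sleep, the loop returns as soon as the check passes and fails loudly if it never does, but it still treats the symptom rather than removing the shared folder from the picture.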

@scoter-oracle I don't have access to other host OSes to check whether this issue occurs elsewhere. My guess would be that it's limited to Windows. I suspect that a lot of people run these boxes on Windows, though.

I really appreciate your attention to this. If you decide that you want to use /tmp going forward, I'm happy to submit a PR. I have the scripts done and tested.

Thanks, Paul

billygoat747 commented 5 years ago

I too am running into this. I'm on Windows 10.

Sandalorian commented 5 years ago

Just hit this issue too, my system is as follows:

    Processor: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 2208 Mhz, 6 Core(s), 12 Logical Processor(s)
    Installed Physical Memory (RAM): 32.0 GB
    OS Name: Microsoft Windows 10 Pro
    Version: 10.0.17763 Build 17763

I added sleep 10 between the unzip and cp /vagrant/ora-response/db_install.rsp.tmpl /vagrant/ora-response/db_install.rsp as a workaround.

gvenzl commented 5 years ago

Merged!

oracle / vagrant-projects

Timing issue with 12.1.0.2 database box #128