turnkeylinux / tracker

TurnKey Linux Tracker
https://www.turnkeylinux.org
LXC appliance: Can't change network address using confconsole #148

Open Dude4Linux opened 10 years ago

Dude4Linux commented 10 years ago

After installing the ISO version of LXC, try to change the appliance IP address using confconsole. All attempts result in the error "refusing to write to /etc/network/interfaces header not found: # UNCONFIGURED INTERFACES". The appliance has picked up an address via DHCP, but I would like to change it to a static IP.

# cat interfaces 
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_fd 0
    bridge_maxwait 0

auto natbr0
iface natbr0 inet static
    bridge_fd 0
    bridge_maxwait 0
    address 192.168.121.1
    netmask 255.255.255.0
    pre-up brctl addbr natbr0
    post-up /etc/init.d/dnsmasq restart
    post-down brctl delbr natbr0

Comparing to other appliances, the two lines below appear to be missing.

# UNCONFIGURED INTERFACES
# remove the above line if you edit this file

Adding the two lines restores normal operation.
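
The behaviour described above can be illustrated with a minimal Python sketch. It is not confconsole's actual code: the names `has_header`, `check_writable`, `add_header`, and `InterfacesError` are hypothetical, and the sketch assumes the marker may appear on any line of the file.

```python
HEADER = "# UNCONFIGURED INTERFACES"
NOTE = "# remove the above line if you edit this file"


class InterfacesError(Exception):
    """Raised when the interfaces file must not be overwritten."""


def has_header(text):
    """Return True if any line of the file is exactly the marker comment."""
    return any(line.strip() == HEADER for line in text.splitlines())


def check_writable(text):
    """Refuse to continue when the marker is missing (hypothetical guard)."""
    if not has_header(text):
        raise InterfacesError(
            "refusing to write to /etc/network/interfaces\n"
            "header not found: " + HEADER
        )


def add_header(text):
    """Prepend the two marker lines unless the marker is already present."""
    if has_header(text):
        return text
    return HEADER + "\n" + NOTE + "\n\n" + text
```

With a guard like this, a file shipped without the marker is never rewritten, and prepending the two lines is enough to let the write go ahead.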

alonswartz commented 10 years ago

That was actually intentional for the first release. The networking setup is complex and we decided that without sufficient testing it was better to remove "UNCONFIGURED INTERFACES" than introduce possible bugs.

If you could help in testing, making sure there are no issues, I'd be happy to update the code.

Dude4Linux commented 10 years ago

I figured that out after forking lxc and getting a closer look at the code. I can see the issue is that confconsole doesn't know how to handle the br0 interface, where the static IP needs to be assigned. I thought about working on it, but confconsole is a core tool and I'm just starting to learn Python. Don't want to muck up things too badly. I have set a static IP in my DHCP server as a workaround for now. When you get a chance to work on a revised confconsole, I'd be more than happy to help test it. Thanks for all you guys do and Happy New Year.

Dude4Linux commented 10 years ago

Note: I tried to post the following in a comment on the main website, but was blocked by the Spam filter.

LXC Roadmap?

I'm very excited about the LXC app. Last year I wasted three months trying to learn OpenStack to build the foundation for a small business development environment. After some frustration I came to the conclusion that OpenStack wasn't (then) quite ready, i.e. poor security etc. I then turned to ProxMox, and over a weekend, was able to get my server setup with virtualization. Lately, however, I've been concerned about changes to the ProxMox support license. I also have a client (my church) with an older server that won't support VT and ProxMox. I already have the TurnKey file app running there and have been adding custom packages for Ubiquiti's Unifi WiFi controller. Now I'm thinking that I should switch to the LXC app and run the file app and WiFi controller in containers. I'm already seeing the need to load additional apps but didn't want to have to buy another server just to run ProxMox.

I'm curious about your future plans for the LXC appliance. I admit that I've gotten spoiled by the ProxMox gui and frustrated by its java-console. I've already tried unsuccessfully to install LXC Web Panel (http://lxc-webpanel.github.io) on the TurnKey LXC app. When they say they don't support Debian, they mean it. Another possibility is the community edition of OpenQRM (http://www.openqrm-enterprise.com). Have you looked at these or other candidates for a web front end for LXC?

ProxMox's use of a java based virtual console has been exceedingly frustrating. They are switching to SPICE, but that requires a client on the user device, something else I'm not happy about. The one project I've found, so far, that looks like what I want is Guacamole - HTML5 Clientless Remote Desktop (http://guac-dev.org). If you have work underway, I'll wait to see what happens, otherwise I'm willing to try to add one or both projects to the LXC app.

Did I mention I was excited about the LXC app? Over the holidays, I pushed the beta1 version of a TurnKey Ansible app (https://github.com/Dude4Linux/ansible). I couldn't figure out how to initiate a pull-request for a new appliance, so consider this the announcement. If anyone wants to try this, but doesn't have a TKLdev setup, let me know and I'll post an iso.

I got interested in CI when I attended a session, DIY Continuous Integration, by Allan Chappell @general_redneck, generalredneck.com at last August's DrupalCorn Camp. Then Mike Minecki of Four Kitchens tipped me off to Ansible, a radically simple competitor to Puppet and Chef. I'm still feeling my way along learning CI and how the pieces fit together. Ansible (and Puppet and Chef) typically work with Vagrant and VirtualBox to create and provision virtual hosts for testing or deployment. Vagrant now has the ability to work with LXC containers in addition to VMware and VirtualBox. It may also be possible to bypass Vagrant by creating an Ansible module for the LXC app to allow automated creation and destruction of test containers (CTs). The GitLab and Jenkins apps also have a role in CI. I debated whether it was better to put all the CI applications together in a single appliance, or keep them separate. Now with the release of the LXC app, I'm convinced that separate is the way to go, keeping each application in a container and the overhead of doing so is minimal. I would like to see TurnKey apps become aware of one-another i.e. launch an Ansible container, and it would find the LXC, GitLab, and Jenkins containers and configure itself accordingly. Anyone else think this would be a good idea?

Tim has a good question about Webmin. Fortunately, there is a GPL version of Virtualmin which could be integrated into the LXC appliance and configured to act as a front-end for all the Webmin modules running in the containers. I guess it's up to Alon and Liraz to decide when/if they want to tackle this.

JedMeister commented 7 years ago

What about just disabling the networking component of confconsole for the LXC appliance for now? I know it's not ideal, but at least then it's not there, broken?!

maphew commented 5 years ago

I've hit the error in opening post when using LXC Container ISO on bare metal, both v14.2 and v15. For me the fix is to

a) switch to a different console and login (Alt-F1)

b) edit /etc/network/interfaces and change manual to dhcp in the eth0 stanza:

auto eth0
iface eth0 inet manual

...followed by running:

ifup -a

After this ping will reach other hosts in the network, but only by ip. DNS resolution fails. Still, it's enough to be able to connect to webmin and use ssh. One major downside is that ifup -a must be run after every reboot.

maphew commented 5 years ago

Correction: my recipe doesn't work in TurnKey LXC v15. I don't think it was sufficient on its own in v14.2 either. I tried a lot of different things in troubleshooting and I think I didn't write all the essentials down.

I misread the first post. Inserting the text # UNCONFIGURED INTERFACES as first line in /etc/network/interfaces fixed the "refusing to write to /etc/network/interfaces" error. confconsole now runs without complaint. Changing /etc/network/interfaces to dhcp and ifup aren't needed.

Dude4Linux commented 5 years ago

@maphew - It looks like this issue was overlooked during the last update of confconsole. My work-around has always been to force a static ip address assignment in the upstream dhcp server (dnsmasq). If you don't control the upstream dhcp service, this presents a problem.

@JedMeister - It's been a long time since I last looked at this, but I seem to recall that confconsole was confused by the presence of the br0 and natbr0 interfaces. If it could just ignore any additional interfaces it should be able to manage the eth0 interface configuration.

EDIT: Just confirmed that confconsole can't just ignore br0 and natbr0.

Dude4Linux commented 5 years ago

I may have spoken too soon. @maphew - which version of confconsole are you using?
I don't currently have a way to test on bare metal or in a vm, but I have LXC v15.0 running in an LXD container in Ubuntu.

# dpkg -l confconsole
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                     Version           Architecture      Description
+++-========================-=================-=================-======================================================
ii  confconsole              1.1.0             all               TurnKey GNU/Linux Configuration Console

My /etc/network/interfaces looks like this

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_fd 0
    bridge_maxwait 0
hostname lxc-15

auto natbr0
iface natbr0 inet static
    bridge_ports none
    bridge_fd 0
    bridge_maxwait 0
    address 192.168.121.1
    netmask 255.255.255.0

The 1.1.0 version of confconsole seems to be working and recognizes that br0 is configured by dhcp.

Dude4Linux commented 5 years ago

Further testing showed that if I select StaticIP and try to modify the configuration, I get the error message described in the initial report: "refusing to write to /etc/network/interfaces header not found: # UNCONFIGURED INTERFACES". If I add the line # UNCONFIGURED INTERFACES to /etc/network/interfaces and then use confconsole to create a StaticIP entry, confconsole will run but it generates some errors. After killing confconsole and restarting the container, the contents of /etc/network/interfaces look like this:

# UNCONFIGURED INTERFACES
# remove the above line if you edit this file

auto lo
iface lo inet loopback

auto br0
iface br0 inet static
    address 10.76.85.67
    netmask 255.255.255.0
    gateway 10.76.85.1
    dns-nameservers 10.76.85.1 8.8.8.8 8.8.4.4

auto natbr0
iface natbr0 inet static
    bridge_ports none
    bridge_fd 0
    bridge_maxwait 0
    address 192.168.121.1
    netmask 255.255.255.0

auto eth0
iface eth0 inet manual

Note that three lines have been removed from the br0 interface, namely:

    bridge_ports eth0
    bridge_fd 0
    bridge_maxwait 0

I think that these may be the source of the errors displayed by confconsole. Also their removal might have a negative impact on the LXC appliance.
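
One way to confirm which options a rewrite dropped is to compare the stanzas before and after. The following is a minimal sketch, not part of confconsole: `parse_stanzas` and `dropped_options` are hypothetical helpers, the two samples are abbreviated to the br0 stanza, and the parser only understands the simple layout shown in this thread.

```python
BEFORE = """\
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_fd 0
    bridge_maxwait 0
"""

AFTER = """\
auto br0
iface br0 inet static
    address 10.76.85.67
    netmask 255.255.255.0
    gateway 10.76.85.1
"""


def parse_stanzas(text):
    """Map each interface name to the option lines under its iface line."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface" and len(words) > 1:
            current = words[1]
            stanzas[current] = []
        elif words[0] == "auto" or words[0].startswith("allow-"):
            current = None  # a new top-level line ends the previous stanza
        elif current is not None:
            stanzas[current].append(line)
    return stanzas


def dropped_options(before, after, ifname):
    """List option lines present for ifname before but missing afterwards."""
    old = parse_stanzas(before).get(ifname, [])
    new = parse_stanzas(after).get(ifname, [])
    return [opt for opt in old if opt not in new]


print(dropped_options(BEFORE, AFTER, "br0"))
```

For these samples the result is the three bridge_ lines listed above.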

JedMeister commented 5 years ago

@Dude4Linux - as per always, thanks tons for your input! :+1:

Yes this was missed for v15.0. The v15.0 release took so long and we got so bogged down, that I had to draw a line in the sand. I didn't update the tracker though. I've just updated this issue, but I'll need to go through the rest of the remaining v15.0 issues as part of the final v15.0 mop up.

From here on in, whilst I'll still use the milestones to track progress of individual appliances, it won't always make complete sense as each appliance may be on a different version...

Regarding this specific issue, we'll certainly need to do something... I think that we may need to make the networking component of confconsole a bit smarter to deal with the complexity of the LXC networking config. Essentially, the problem is that the networking component of confconsole predates the LXC appliance by a significant margin and makes many assumptions which are not relevant to, nor compatible with the LXC appliance network config.

In the short term, we could work around the issue simply by disabling the network component of confconsole for the LXC appliance, i.e. set networking false in the confconsole config.

Do you think that may be a good idea for the next release (for now)?

In the longer term, confconsole probably needs a rewrite. It was originally written for python2.4 (now running on 2.7) and leverages a really old version of python-dialog/dialog-wrapper (which we package ourselves to keep it running). We did look at updating it to work with the current Debian packages of python-dialog/dialog-wrapper but things have changed so much. It'd possibly be easier to rewrite it from scratch (but that's a pretty big undertaking)...

Dude4Linux commented 5 years ago

@JedMeister - In Nov. 2015, an LXC specific change was made to confconsole. I'm not sure exactly what the change was trying to achieve. I'm a long way from a decent python programmer, but I'm trying to learn the basics. I forked a copy of confconsole and have been looking into ifutils.py. It seems to me that what is needed is an additional function to handle the bridge_ options similar to how the pre-up, up, post-up options are handled. If I can get a modified version of confconsole to handle the bridge options, I'll issue a PR. I don't see confconsole needing to display or modify the values, only needs to preserve them when the /etc/network/interfaces file is written. I'm on the road ATM and WiFi connectivity is marginal. Serious work will have to wait a couple of weeks.
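
The idea of preserving the bridge_ options without displaying or editing them can be sketched as follows. This is an illustration only and not the code from the fork or any PR: `is_preserved` and `static_stanza` are hypothetical names, and the set of hook keywords kept alongside bridge_ options is an assumption.

```python
# ifupdown hook keywords, kept verbatim (assumed set for this sketch).
HOOKS = {"pre-up", "up", "post-up", "pre-down", "down", "post-down"}


def is_preserved(line):
    """True for option lines that should be carried over unchanged."""
    words = line.split()
    return bool(words) and (words[0] in HOOKS or words[0].startswith("bridge_"))


def static_stanza(ifname, old_lines, address, netmask, gateway=None):
    """Build a static stanza, re-appending preserved options from old_lines."""
    lines = ["auto " + ifname, "iface %s inet static" % ifname]
    lines.append("    address " + address)
    lines.append("    netmask " + netmask)
    if gateway:
        lines.append("    gateway " + gateway)
    # Carry over bridge_ and hook options so the bridge keeps working.
    lines.extend("    " + old.strip() for old in old_lines if is_preserved(old))
    return "\n".join(lines) + "\n"


old_br0 = [
    "iface br0 inet dhcp",
    "    bridge_ports eth0",
    "    bridge_fd 0",
    "    bridge_maxwait 0",
]
print(static_stanza("br0", old_br0, "10.76.85.67", "255.255.255.0", "10.76.85.1"))
```

The new stanza gets the static address settings and still ends with the three bridge_ lines from the old one.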

JedMeister commented 5 years ago

@Dude4Linux - ok great thanks. TBH, I'm not much of a python programmer either, but am learning a bit too. If you can get something working, that'd be awesome! :smile:

Dude4Linux commented 5 years ago

@maphew - Would you be willing to help me test a possible solution? If so, then make a backup copy of /usr/lib/confconsole/ifutil.py and then download a new copy of ifutil.py from here: https://github.com/Dude4Linux/confconsole/raw/update-confconsole-for-v15_1/ifutil.py Let us know if it works in your bare metal environment, and that you can switch between static and dhcp successfully. TIA John

Dude4Linux commented 5 years ago

@JedMeister - I've issued a new PR that should close this issue. Update confconsole for v15.1 #25

JedMeister commented 5 years ago

Great work @Dude4Linux - looks good on face value. I'll get @OnGle to do some testing and will aim to merge ASAP.

maphew commented 5 years ago

Yes, with the changed ifutil.py I can now use confconsole to switch between static and dhcp addressing. There was an initial hiccup: the first couple attempts of DHCP-->Static generated a 'file exists' error from ifup on br0. After this gateway was set to None.

I then set to DHCP, exited and restarted confconsole, and am now able to go back and forth between the two without error.

One curiosity remains: when using Static address, dns lookup for Windows machines in the internal network fails. So I'm using dhcp in LXC and have the router set to always assign the same ip to its MAC address.

maphew commented 5 years ago

After reboot networking is broken. Confconsole is caught in a loop trying to remove bridged interface from list, which errors because it's not present. Quitting confconsole and running ifup br0 reports no such device.

I'm no longer a good test subject. I decided to try installing Proxmox VE in place which likely changes too many variables to be reliable. (The install went fine. At one point I was asked to choose between vendor supplied /etc/lxc.conf or the existing one. I merged the two, keeping the "use natbridge by default" line. Later I removed the single line added by vendor, but it didn't change the networking error.)

I think the error may not be Proxmox related though because of the similar "not found" messages when I first tried the patched ifutil.py

Dude4Linux commented 5 years ago

> After reboot networking is broken. Confconsole is caught in a loop trying to remove bridged interface from list, which errors because it's not present. Quitting confconsole and running ifup br0 reports no such device.

This looping error sounds like the behavior when the modified ifutil.py is not installed. Did I understand correctly that you tested this on bare metal before attempting to install Proxmox VE, and that it was working before you rebooted?

> I'm no longer a good test subject. I decided to try installing Proxmox VE in place which likely changes too many variables to be reliable. (The install went fine. At one point I was asked to choose between vendor supplied /etc/lxc.conf or the existing one. I merged the two, keeping the "use natbridge by default line". Later I removed the single line added by vendor, but it didn't change the networking error.)
>
> I think the error may not be Proxmox related though because of the similar "not found" messages when I first tried the patched ifutil.py

I'm also using Proxmox VE, albeit an older version, on my home server. Keep in mind that Proxmox uses LXC for containers so trying to install the LXC appliance in a Proxmox container (CT) would be a case of nested containers which won't work as noted in the documentation. I run LXC installed from the .iso in a Proxmox virtual machine (VM). As I recall, the Proxmox installation of LXC into a container will complete and seem to work, but that containers created in LXC will not actually function correctly.

Dude4Linux commented 5 years ago

@maphew - One other thing I just noticed while retesting in Proxmox's webconsole is that you have to be careful to restart confconsole after updating the three files

/etc/network/interfaces
/usr/lib/confconsole/ifutil.py
/usr/lib/confconsole/plugins.d/System_Settings/hostname.py

If you leave confconsole running when the files are updated, it won't load (import) the modified files and so will continue to fail in the loop. I think this might be why some of your tests failed initially but then worked. I still can't explain why your test failed after reboot; my test on Proxmox continued to work after the reboot.
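
The general Python behaviour behind this can be shown with a small standalone sketch. It does not use confconsole at all: the module name `fake_ifutil` is made up, the sketch targets Python 3, and how confconsole itself loads its plugins is not shown here. A running process keeps using the module it already imported, even after the file on disk changes, until the module is explicitly reloaded or the process is restarted.

```python
import importlib
import os
import sys
import tempfile


def demo():
    """Return the VERSION seen on first import, re-import, and reload."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # avoid a stale .pyc confusing the demo
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake_ifutil.py")
        with open(path, "w") as fh:
            fh.write("VERSION = 'original'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("fake_ifutil")
            first = mod.VERSION

            # "Patch" the file on disk while the process keeps running.
            with open(path, "w") as fh:
                fh.write("VERSION = 'patched'\n")

            # A second import returns the cached module from sys.modules.
            second = importlib.import_module("fake_ifutil").VERSION

            # Only an explicit reload (or a restart) picks up the new file.
            third = importlib.reload(mod).VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_ifutil", None)
            sys.dont_write_bytecode = old_flag
    return first, second, third


print(demo())
```

The second value still comes from the old file, which matches the symptom of a running confconsole continuing to fail after the files were replaced.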

dnoach commented 4 years ago

@Dude4Linux - I keep running into this issue where trying to set my capture interface to sniffer mode results in a broken pipe and locks me out of the box. When running healthcheck from confconsole I see it tries the bond0 interface, which shows both RX and TX buffers zeroed. Any idea what the issue is?

JedMeister commented 4 years ago

@dnoach - could you please clarify what you mean?! I assume that you mean the host LXC appliance interface? What do you mean by "healthcheck from confconsole"?

Also, FWIW @Dude4Linux's work was never merged into Confconsole. I had intended to do some extensive testing and include it, but other stuff always seemed to get in the way... :cry:

We've now ported Confconsole to python3 for the (upcoming and well overdue) v16.0 release (see blog post with v16.0rc download links for Core and TKLDev). So this will need to be revisited against the current codebase...

JedMeister commented 4 years ago

As per #1520 LXC has been deferred for now...