spring-cloud / spring-cloud-netflix

Integration with Netflix OSS components
http://cloud.spring.io/spring-cloud-netflix/
Apache License 2.0

EIP publicip association not correctly updated on fresh instance #1321

Open nick-pww opened 8 years ago

nick-pww commented 8 years ago

I've been directed over here from the eureka folks, as they believe this should just 'work'. I have the following issue running on spring-cloud-netflix:1.1.4.RELEASE. The issue I opened over there is: https://github.com/Netflix/eureka/issues/840

There seems to be a problem with public EIP address association not being correctly updated when a new AWS server starts and has a new Eureka server starting with it. When the server starts up, it correctly registers itself:

2016-09-06 15:55:29.040  WARN 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        : The selected EIP 54.67.102.122 is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 15:55:29.628  INFO 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        : Associated i-25f11391 running in zone: us-west-1c to elastic IP: X.X.X.X

But, every minute after that we get the following log entry:

2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Got 1 instances from neighboring DS node
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : No peers needed to prime.
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Changing status to UP
2016-09-06 16:24:55.713  WARN 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : The selected EIP X.X.X.X is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 16:24:55.804  INFO 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : My instance i-25f11391 seems to be already associated with the EIP X.X.X.X

Debugging this, the call to isEIPBound() is always failing, and this is because the following is always null:

String myPublicIP = ((AmazonInfo) myInfo.getDataCenterInfo()).get(MetaDataKey.publicIpv4);

It looks like the datacenter info is stale and never gets refreshed (from what I can tell), and there are no settings available to have it refreshed automatically.
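To illustrate why a stale snapshot would make the bound check fail forever, here is a minimal plain-Java sketch. It is a hypothetical model, not Eureka's actual code: the class name, the `isEipBound` helper, the `public-ipv4` map key and the `203.0.113.10` address are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the stale-metadata problem; these are not Eureka classes.
final class StaleMetadataDemo {

    // Assumed shape of the check: a missing public IP can never match the EIP.
    static boolean isEipBound(Map<String, String> metadata, String eip) {
        String myPublicIp = metadata.get("public-ipv4");
        return myPublicIp != null && myPublicIp.equals(eip);
    }

    public static void main(String[] args) {
        String eip = "203.0.113.10"; // documentation-range address, for illustration only

        // Live instance metadata: at startup the instance has no public IP yet.
        Map<String, String> live = new HashMap<>();
        live.put("instance-id", "i-25f11391");

        // Snapshot taken once at startup and never refreshed afterwards.
        Map<String, String> snapshot = new HashMap<>(live);

        // Later, the EIP gets associated with the instance.
        live.put("public-ipv4", eip);

        System.out.println("snapshot: " + isEipBound(snapshot, eip));          // snapshot: false
        System.out.println("fresh: " + isEipBound(new HashMap<>(live), eip));  // fresh: true
    }
}
```

If the real check reads from a snapshot like this, it would keep reporting "not bound" on every pass even though AWS already shows the association, which would match the repeating log entries above.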

The odd side effect of this, which we noticed, is that the registry continually gets wiped and reset, causing obvious potential issues downstream for our clients.

I have been trying to find where this datacenter info might be refreshed, but am unable to find anything that might actually do that.

The deployed app only has a single main class in it:

@SpringBootApplication
@EnableEurekaServer
@EnableAutoConfiguration
public class EurekaServer {

    @Value("${server.port}")
    private Integer nonSecurePort;
    @Autowired
    private InetUtils utils;

    public static void main(String[] args) {
        new SpringApplicationBuilder(EurekaServer.class).web(true).run(args);
    }

    @Bean
    @Profile("aws")
    public EurekaInstanceConfigBean awsEurekaConfig() {
        EurekaInstanceConfigBean b = new EurekaInstanceConfigBean(utils);
        b.setNonSecurePort(nonSecurePort);
        b.setSecurePortEnabled(false);
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        b.setDataCenterInfo(info);
        return b;
    }

}

spencergibb commented 8 years ago

Interesting. I assume this is running on AWS? What is the configuration?

nick-pww commented 8 years ago

Yes, running on AWS. Here are the relevant configs (coming from spring-cloud config server).

Global config for all apps:

eureka.instance.leaseRenewalIntervalInSeconds=30
eureka.client.healthcheck.enabled=true
eureka.datacenter=cloud

Config for just the server apps:

eureka:
    client:
        registerWithEureka: false
        fetchRegistry: false

And servers have:

eureka.client.serviceUrl.defaultZone=....

set up as well, with the relevant EIPs assigned.

qiangdavidliu commented 8 years ago

@nick-pww I just noticed your config. The thread that DiscoveryClient uses to refresh local instanceInfo (and hence datacenterInfo) is only started if registerWithEureka is true (it tries to save the extra CPU resources if registration is not configured). Is there a reason you are configured with registerWithEureka = false?
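The behaviour described above (the refresh task is only scheduled when registration is enabled) can be sketched roughly as follows. This is a hypothetical plain-Java illustration, not DiscoveryClient's internals: `ConditionalRefresher`, its constructor parameters and the thread name are assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names do not correspond to DiscoveryClient internals.
final class ConditionalRefresher {
    private final ScheduledExecutorService scheduler; // null when refreshing is disabled
    private final CountDownLatch firstRun = new CountDownLatch(1);

    ConditionalRefresher(boolean registerWithEureka, Runnable refresh, long periodMillis) {
        if (registerWithEureka) {
            ScheduledExecutorService s = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "instance-info-refresher");
                t.setDaemon(true); // do not keep the JVM alive just for refreshing
                return t;
            });
            s.scheduleAtFixedRate(() -> {
                refresh.run();
                firstRun.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            scheduler = s;
        } else {
            // No registration configured: no thread is created, so nothing is ever refreshed.
            scheduler = null;
        }
    }

    // Waits until the refresh task has completed at least once, or the timeout expires.
    boolean awaitFirstRun(long timeoutMillis) {
        try {
            return firstRun.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isRunning() {
        return scheduler != null && !scheduler.isShutdown();
    }

    void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        ConditionalRefresher on = new ConditionalRefresher(true, () -> {}, 50);
        System.out.println("enabled, first refresh ran: " + on.awaitFirstRun(2000));
        on.stop();

        ConditionalRefresher off = new ConditionalRefresher(false, () -> {}, 50);
        System.out.println("disabled, first refresh ran: " + off.awaitFirstRun(200));
    }
}
```

With the flag off, the instance info captured at startup is all the server ever sees, which is consistent with the stale datacenter info reported earlier in the thread.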

nick-pww commented 8 years ago

@qiangdavidliu Going off several examples and docs. One of which is here: https://spring.io/guides/gs/service-registration-and-discovery/

I can change that, but one problem I had before, with both that and 'fetchRegistry' turned on, was that the servers were essentially always 'registering' applications even after they had gone down, because they were getting info from the other Eureka servers. Basically, applications would never unregister, and if they did, they had a good chance of coming back when the servers synced again.

Also, I've read in other places that having the server register with itself can make the 'renew' threshold act oddly in some cases.

Will try to re-enable just that option and see what happens.

spencergibb commented 8 years ago

Also from https://github.com/Netflix/eureka/issues/840#issuecomment-245052062 (typo fixed)

Note that the Amazon based datacenter info refreshes in ApplicationInfoManager only occurs if the config is of CloudInstanceConfig.

Our config isn't a CloudInstanceConfig

spencergibb commented 8 years ago

@nick-pww those guides are for single-instance Eureka servers. Production should be a peered cluster; see #1251.

nick-pww commented 8 years ago

@spencergibb It's not really clear that those are 'development' only options that should be set. Would recommend that a large note or something goes in there stating such.

@qiangdavidliu + @spencergibb I've changed the config but still have the same issue with new instances. I'm still getting the:

2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..

messages, and it's still resetting every minute. Both servers are registering with each other and show up in the list of applications, but the one where I cleared the EIP and restarted is exhibiting this still, while the one that I didn't seems to be working as expected.

(new config edit)

eureka:
    client:
        registerWithEureka: true
        fetchRegistry: false

florind commented 8 years ago

I am actually struggling with the exact same issue. Explicitly setting hostname and IP address in the EurekaInstanceConfigBean @Bean is also not working:

        eurekaInstanceConfig.setIpAddress(info.get(AmazonInfo.MetaDataKey.publicIpv4));
        eurekaInstanceConfig.setHostname(info.get(AmazonInfo.MetaDataKey.publicHostname));

as this bean seems to be initialized before EIPManager binds an EIP address, so both values are null. The lame hack so far is that I listen for EurekaRegistryAvailableEvent and restart the application if EurekaInstanceConfigBean.getHostname() is null; the second time around, the EIP is already bound to the AWS instance and it all works...

qiangdavidliu commented 8 years ago

@spencergibb at Netflix we use the CloudInstanceConfig, which has the ability to refresh the underlying AmazonInfo. Do the Spring Cloud configs do something similar?

spencergibb commented 8 years ago

@qiangdavidliu no it doesn't :-(

spencergibb commented 8 years ago

It extends PropertiesInstanceConfig, and we use Boot @ConfigurationProperties to load properties, so we needed a different class; since it implemented an interface, EurekaInstanceConfig, that was OK when we started. I wonder if we could break the business logic out into a separate class that gets injected so we could reuse it? We can always copy/paste.

qiangdavidliu commented 8 years ago

Let me see what I can do on that.

spencergibb commented 8 years ago

thanks!

herder commented 8 years ago

This works for us:

@Configuration
@Slf4j
@ConditionalOnAwsCloudEnvironment
@EnableContextInstanceData
@Import(UtilAutoConfiguration.class)
@AutoConfigureAfter(UtilAutoConfiguration.class)
public class AwsInstanceConfig {

    @Value("${server.port:${SERVER_PORT:${PORT:8080}}}")
    int nonSecurePort;

    @Value("${management.port:${MANAGEMENT_PORT:${server.port:${SERVER_PORT:${PORT:8080}}}}}")
    int managementPort;

    @Value("${eureka.instance.hostname:${EUREKA_INSTANCE_HOSTNAME:}}")
    String hostname;

    @Autowired
    ConfigurableEnvironment env;

    @Bean
    public EurekaInstanceConfigBean eurekaInstanceConfigBean(InetUtils utils) {
        log.info("Setting AmazonInfo on EurekaInstanceConfigBean");
        final EurekaInstanceConfigBean instance = new EurekaInstanceConfigBean(utils) {

            @Scheduled(initialDelay = 30000L, fixedRate = 30000L)
            public void refreshInfo() {
                log.debug("Checking datacenter info changes");
                AmazonInfo newInfo = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
                if (!this.getDataCenterInfo().equals(newInfo)) {
                    log.info("Updating datacenterInfo to {}", newInfo);
                    ((AmazonInfo) this.getDataCenterInfo()).setMetadata(newInfo.getMetadata());
                }
            }

            private AmazonInfo getAmazonInfo() {
                return (AmazonInfo) getDataCenterInfo();
            }

            @Override
            public String getHostname() {
                AmazonInfo info = getAmazonInfo();
                final String publicHostname = info.get(AmazonInfo.MetaDataKey.publicHostname);
                return this.isPreferIpAddress() ?
                    info.get(AmazonInfo.MetaDataKey.localIpv4) :
                    publicHostname == null ?
                        info.get(AmazonInfo.MetaDataKey.localHostname) : publicHostname;
            }

            @Override
            public String getHostName(final boolean refresh) {
                return getHostname();
            }

            @Override
            public String getHomePageUrl() {
                return super.getHomePageUrl();
            }

            @Override
            public String getStatusPageUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getStatusPageUrlPath();
            }

            @Override
            public String getHealthCheckUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getHealthCheckUrlPath();
            }
        };
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        log.info("Info: {}", info);
        instance.setDataCenterInfo(info);
        instance.setNonSecurePort(this.nonSecurePort);
        instance.setInstanceId(getDefaultInstanceId(this.env));
        if (this.managementPort != this.nonSecurePort && this.managementPort != 0) {
            if (StringUtils.hasText(this.hostname)) {
                instance.setHostname(this.hostname);
            }
        }

        return instance;
    }

}

I.e. we do a scheduled check on whether the datacenterinfo has been updated, and reset it in that case. I'm sure there's room for cleanup here, but maybe it's a start?
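Stripped of the Spring and Eureka types, the refresh-if-changed step above boils down to something like the sketch below. It is a hypothetical stand-in: plain maps replace AmazonInfo, and `DataCenterInfoRefresher`, `refreshOnce` and the metadata keys are names assumed for the example. Unlike the workaround, which calls setMetadata on the existing object, this sketch swaps in a new immutable copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for the refresh logic; plain maps replace AmazonInfo.
final class DataCenterInfoRefresher {
    private final Supplier<Map<String, String>> metadataSource;
    private final AtomicReference<Map<String, String>> current;

    DataCenterInfoRefresher(Supplier<Map<String, String>> metadataSource) {
        this.metadataSource = metadataSource;
        this.current = new AtomicReference<>(copy(metadataSource.get()));
    }

    private static Map<String, String> copy(Map<String, String> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    Map<String, String> current() {
        return current.get();
    }

    // One refresh pass: re-read the metadata and replace the cached copy only if it differs.
    // Returns true when the cached copy was replaced. Intended to be called from a periodic task.
    boolean refreshOnce() {
        Map<String, String> fresh = copy(metadataSource.get());
        if (current.get().equals(fresh)) {
            return false;
        }
        current.set(fresh);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> live = new HashMap<>();
        live.put("instance-id", "i-25f11391");
        DataCenterInfoRefresher refresher = new DataCenterInfoRefresher(() -> live);

        System.out.println(refresher.refreshOnce());                  // false
        live.put("public-ipv4", "203.0.113.10");                      // EIP gets associated later
        System.out.println(refresher.refreshOnce());                  // true
        System.out.println(refresher.current().get("public-ipv4"));   // 203.0.113.10
    }
}
```

Keeping the comparison separate from the scheduling makes the step easy to exercise without waiting on a timer; any scheduler can then call `refreshOnce()` at a fixed rate.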

spencergibb commented 8 years ago

@herder Netflix devs have moved the functionality to a shared class that we will be able to leverage. https://github.com/Netflix/eureka/pull/843

spencergibb commented 8 years ago

This depends on #1345

elnur commented 8 years ago

Can't wait to get this released.

DickChesterwood commented 7 years ago

Many thanks to @herder for the suggested auto-refresh hack; working great for me.

I can't quite work out when the Eureka 1.6 upgrade will appear. Will it be in the Dalston release train?

It's far too long to read but I've documented my experiments here - let me know if I've made any blunders

Edit to add that the OP noticed that not doing this refresh causes the registry to be wiped; I had the opposite experience that instances never get expired (it's not self preservation!). I can't think how that could be the case, so I'd be interested if anyone has any insight.

spencergibb commented 7 years ago

thanks @DickChesterwood. 1.6 is part of Dalston. See spring-cloud-release/milestones

DickChesterwood commented 7 years ago

Lovely thanks Spencer!

gadamsciv commented 6 years ago

@spencergibb Is this still an issue? I'm experiencing the same issue using Edgware.RELEASE. Is the scheduled task workaround still necessary?

spencergibb commented 6 years ago

@gadamsciv it is still open, so yes.

harmoney-ryanli commented 2 years ago

FYI, I came across this problem as well, and I tried adding the scheduled task to refresh the instance info, but the task didn't start. In the end, I found that if the scheduled task is in a configuration class, you need to add the @EnableScheduling annotation for the task to run.