streamnative / kop

Kafka-on-Pulsar - A protocol handler that brings native Kafka protocol to Apache Pulsar
https://streamnative.io/docs/kop
Apache License 2.0

[BUG] 307 error on public/__kafka_producerid/__transaction_producerid_generator creation when pulsar standalone is restarted as a systemd service #1416

Open MMirelli opened 2 years ago

MMirelli commented 2 years ago

Describe the bug The broker keeps responding with HTTP 307 (Temporary Redirect) to

PUT /admin/v2/persistent/public/__kafka_producerid/__transaction_producerid_generator?authoritative=true HTTP/1.1

when pulsar standalone is restarted as a systemd service.

This looks like a race condition: the PUT above is probably issued before the bundle containing that topic has been assigned to the single standalone broker.
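If the race-condition hypothesis above is right, the same request might succeed once bundle ownership has settled. A minimal sketch of a retry helper that could be used to check this by hand follows; the `retry` function is hypothetical (it is not part of Pulsar or KoP), and the attempt count and delay in the usage comment are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  # Re-run the command until it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (assumes the admin CLI from the steps below is on PATH):
# retry 10 3 pulsar_admin --admin-url "http://localhost:8080" topics create \
#   persistent://public/__kafka_producerid/__transaction_producerid_generator
```

This only helps when issuing the request manually after startup; it does not change how KoP itself creates the topic during broker start.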

To Reproduce Steps to reproduce the behavior:

  1. vagrant init generic/rhel7 && vagrant up && vagrant ssh
  2. copy / paste the service to /etc/systemd/system/pulsar-standalone1.service:
[Unit]
Description=Pulsar standalone debug race condition

[Service]
WorkingDirectory=/home/vagrant/apache-pulsar-2.10.1
Type=simple
Environment=JAVA_HOME=/usr/local/jdk-11.0.1 PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/jdk-11.0.1/bin 
ExecStart=/bin/bash -c "./start_standalone.sh"
KillMode=mixed

[Install]
WantedBy=multi-user.target
  3. Add to $HOME/apache-pulsar-2.10.1/conf/standalone.conf
    messagingProtocols=kafka
    kafkaTransactionCoordinatorEnabled=true
    kafkaEnableMultiTenantMetadata=true
    kafkaNamespace=kafka
    kafkaListeners=SASL_PLAINTEXT://127.0.0.1:9092
    kafkaAdvertisedListeners=SASL_PLAINTEXT://127.0.0.1:9092
    kafkaManageSystemNamespaces=true
  4. sudo systemctl start pulsar-standalone1.service
  5. pulsar_admin --admin-url "http://localhost:8080" tenants create tenant1
  6. pulsar_admin --admin-url "http://localhost:8080" namespaces create tenant1/kafka
  7. ./kafka_2.12-3.2.0/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic1 --from-beginning --consumer.config kafka_client_kafkacluster1.properties 9092 # in console-0
  8. ./kafka_2.12-3.2.0/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic1 --producer.config kafka_client_kafkacluster1.properties 9092 # in console-1
  9. The messages produced in console-1 will appear in console-0.
  10. sudo systemctl restart pulsar-standalone1.service
  11. The service doesn't start up and the below appears:
Jul 20 20:14:22 rhel7.localdomain bash[11095]: 2022-07-20T20:14:22,380+0000 [pulsar-web-46-5] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [20/Jul/2022:20:14:22 +0000] "PUT /admin/v2/persistent/public/__kafka_producerid/__transaction_producerid_generator HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.10.1" 18
...
Jul 20 20:14:22 rhel7.localdomain bash[11095]: 2022-07-20T20:14:22,523+0000 [AsyncHttpClient-70-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/__kafka_producerid/__transaction_producerid_generator] Failed to perform http put request: java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: Could not complete the operation. Number of retries has been exhausted. Failed reason: Maximum redirect reached: 5
Jul 20 20:14:22 rhel7.localdomain bash[11095]: 2022-07-20T20:14:22,523+0000 [main] ERROR io.streamnative.pulsar.handlers.kop.utils.MetadataUtils - Failed to successfully initialize Kafka Metadata public/__kafka_producerid
Jul 20 20:14:22 rhel7.localdomain bash[11095]: org.apache.pulsar.client.admin.PulsarAdminException: java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: Could not complete the operation. Number of retries has been exhausted. Failed reason: Maximum redirect reached: 5

Expected behavior The standalone cluster should restart cleanly, without the error above.

Additional context This was reproduced on a RHEL VM started with Vagrant. Dependencies:

BewareMyPower commented 2 years ago

I think a workaround for now is to set kafkaManageSystemNamespaces=false to skip the metadata creation. We have seen some issues when admin operations run during the broker's startup phase.
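As a sketch of applying that workaround on the reproduction VM, the helper below rewrites a single `key=value` line in a properties-style file. `set_conf` is a hypothetical function written for this illustration, and it assumes simple keys and values (no `|`, `&`, or regex metacharacters).

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a properties-style file, replacing an
# existing assignment or appending one if the key is absent.
set_conf() {
  local conf_file=$1
  local key=$2
  local value=$3
  local tmp
  if grep -q "^${key}=" "$conf_file"; then
    tmp=$(mktemp)
    # Rewrite only the line that assigns this key; keep everything else.
    sed "s|^${key}=.*|${key}=${value}|" "$conf_file" > "$tmp"
    # Copy the contents back so the original file keeps its permissions.
    cat "$tmp" > "$conf_file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$conf_file"
  fi
}

# Example, using the path and service name from the reproduction steps:
# set_conf "$HOME/apache-pulsar-2.10.1/conf/standalone.conf" kafkaManageSystemNamespaces false
# sudo systemctl restart pulsar-standalone1.service
```

With the flag set to false, the system namespaces and topics are no longer created by the broker and have to exist already.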

MMirelli commented 2 years ago

Yes, I agree. That is what I did in the end. But then I had to create all the system topics myself, as shown here: https://github.com/datastax/pulsar-helm-chart/blob/master/examples/kafka/create-tenant-full.sh.

It would be nice to keep kafkaManageSystemNamespaces enabled, either for inexperienced users or for those occasions when there is no time to create all the system namespaces.

BewareMyPower commented 2 years ago

I think that in the future, deploying KoP (or other protocol handlers and connectors) should work like the "Initialize cluster metadata" step for Pulsar: that step creates the public/default namespace, just as KoP creates the public/__kafka namespace and other system topics.

kafkaManageSystemNamespaces should only be true for simple setups such as standalone. If you're deploying KoP in a production environment, you must run a CLI tool like your create-tenant-full.sh once, then set kafkaManageSystemNamespaces to false on all brokers.
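A rough sketch of what such a one-time initialization step could look like is shown here. It covers only the namespaces and the topic named in this thread; the complete list lives in the linked create-tenant-full.sh and is not reproduced. The `init_kop_metadata` function and the `ADMIN` variable are hypothetical names, and the default `ADMIN` value simply mirrors the invocation used in the reproduction steps.

```shell
#!/usr/bin/env bash
# Assumption: ADMIN holds the admin CLI invocation, as used earlier in this thread.
ADMIN=${ADMIN:-pulsar_admin --admin-url http://localhost:8080}

# Hypothetical one-time setup: create the KoP system namespaces and the
# producer-id generator topic mentioned in this issue. Not idempotent:
# creating a namespace or topic that already exists is expected to fail.
init_kop_metadata() {
  # $ADMIN is left unquoted on purpose so its arguments are split into words.
  $ADMIN namespaces create public/__kafka || return 1
  $ADMIN namespaces create public/__kafka_producerid || return 1
  $ADMIN topics create \
    persistent://public/__kafka_producerid/__transaction_producerid_generator || return 1
}

# Run once against a fully started cluster, then set
# kafkaManageSystemNamespaces=false on every broker:
# init_kop_metadata
```

Keeping this outside the broker's startup path avoids issuing admin requests while the broker is still starting, which is where the problems described above were seen.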