tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0

`tf.io.MongoDBIODataset` prints password in plaintext when connecting to server #1653

Open joshhansen opened 2 years ago

joshhansen commented 2 years ago

When connecting to a MongoDB database using MongoDBIODataset, the password is printed twice in plaintext. For example, if the username is admin and the password is abc123, connecting to the server example.com produces output like this:

2022-03-28 23:27:36.991099: I tensorflow_io/core/kernels/mongodb_kernels.cc:43] Connecting to: mongodb://admin:abc123@example.com
Connection successful: mongodb://admin:abc123@example.com

As you can see, abc123 appears in both lines. I see my own password printed in the clear when running the following code (dummy values substituted for privacy):

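import tensorflow_io as tfio

# Dummy credentials; the real URI carries the actual password.
URI = "mongodb://admin:abc123@example.com"
DATABASE = "db"
COLLECTION = "col"

# Constructing the dataset triggers the connection and the log lines above.
data = tfio.experimental.mongodb.MongoDBIODataset(
    uri=URI, database=DATABASE, collection=COLLECTION
)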

Other Mongo tools redact the password in logging output, and it seems appropriate for this tool to do so as well.
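
To illustrate the kind of redaction I mean, here is a minimal Python sketch. The actual logging happens in the C++ kernel (mongodb_kernels.cc), so this is not the library's code, and `redact_uri_password` is a hypothetical helper name I made up for this example. It assumes the URI is well-formed, with any special characters in the credentials percent-encoded.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri_password(uri: str, placeholder: str = "<redacted>") -> str:
    """Return `uri` with the password component replaced by `placeholder`.

    Hypothetical helper for illustration only; not part of tensorflow-io.
    """
    parts = urlsplit(uri)
    if parts.password is None:
        # No password in the URI, so there is nothing to hide.
        return uri
    # netloc looks like "user:password@host[:port][,host2...]".
    userinfo, _, hostinfo = parts.netloc.rpartition("@")
    username = userinfo.split(":", 1)[0]
    netloc = f"{username}:{placeholder}@{hostinfo}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print("Connecting to:", redact_uri_password("mongodb://admin:abc123@example.com"))
```

With something along these lines applied before logging, the username and host would still be visible for debugging, but the password would not end up in the log.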

chuck-confluent commented 2 years ago

Same with Kafka. From my logs:

2022-07-13 12:39:04.388267: I tensorflow_io/core/kernels/kafka_kernels.cc:879] Kafka configuration: sasl.password=<redacted>

Perhaps I'll open a new issue.