Krasner opened this issue 1 year ago

As expected, the problem is `tensorflow_io` not being imported. I propose a few changes:
**Imports**

In `backend/event_processing/io_wrapper.py`:

```python
import tensorflow as tf
import tensorflow_io as tfio
import s3fs
```

Note the import of `s3fs` — this is because `tf.io.gfile.glob` is VERY slow for recursing through an AWS S3 path.
**Walk through the S3 path:**

```python
def S3ListRecursivelyViaWalking(top):
    s3 = s3fs.S3FileSystem()
    for dir_path, _, filenames in s3.walk(top, topdown=True, refresh=True):
        yield (
            "s3://" + dir_path,
            (os.path.join("s3://" + dir_path, filename) for filename in filenames),
        )
```
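For comparison, a minimal local analogue of the generator contract above, using `os.walk` on a throwaway directory (this is an illustrative sketch, not TensorBoard's actual `ListRecursivelyViaWalking`), looks like:

```python
import os
import tempfile

def list_recursively_via_walking(top):
    # Yields (dir_path, generator of full file paths) — the same shape
    # the proposed S3ListRecursivelyViaWalking yields for S3 paths.
    for dir_path, _, filenames in os.walk(top, topdown=True):
        yield (
            dir_path,
            (os.path.join(dir_path, filename) for filename in filenames),
        )

# Demonstrate on a throwaway directory tree with one fake event file.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "run1"))
open(os.path.join(root, "run1", "events.out.tfevents.0"), "w").close()

listing = {d: list(files) for d, files in list_recursively_via_walking(root)}
print(listing[os.path.join(root, "run1")])
```

The S3 version differs only in that `s3fs.walk` returns bucket-relative paths, hence the `"s3://" +` prefixing.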
**Use the above method to index an S3 path:**

```python
if io_util.IsCloudPath(path):
    # Glob-ing for files can be significantly faster than recursively
    # walking through directories for some file systems.
    logger.info(
        "GetLogdirSubdirectories: Starting to list directories via glob-ing."
    )
    if io_util.IsS3Path(path):
        traversal_method = S3ListRecursivelyViaWalking
    else:
        traversal_method = ListRecursivelyViaGlobbing
```
**Add an `IsS3Path` function in `util/io_util.py`:**

```python
def IsS3Path(path):
    return path.startswith("s3://")
Thoughts?
Hi @Krasner,
We added S3 support in https://github.com/tensorflow/tensorboard/pull/5491 (since TensorBoard v2.6). If the S3 directory parsing failed because `tensorflow-io` was not found, the error message would be something like `Error: Unsupported filename scheme S3...` (e.g. https://github.com/tensorflow/tensorboard/issues/5480), and it would prompt you to install TF I/O. I can see from the diagnostics output that the TF I/O dependency exists in your environment, so I'm not sure this is an issue with identifying and parsing S3 files.
The error messages `Error message: No response body` and `If the signature check failed. This could be because of a time skew. Attempting to adjust the signer` look like a permission or configuration issue related to S3. I'm not familiar with AWS; is it possible to adjust `AWS_LOG_LEVEL` (or maybe there is another argument) to get more information about the failure?
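If it helps, a hypothetical invocation of the kind being suggested — `AWS_LOG_LEVEL` is read by the AWS SDK underlying TF's S3 filesystem, and the bucket/folder here are placeholders; the exact level-to-verbosity mapping may vary:

```shell
# Raise AWS SDK log verbosity before launching TensorBoard
# (levels run roughly from 0 = off up to 6 = trace).
AWS_LOG_LEVEL=3 tensorboard --logdir "s3://<bucket>/<folder>"
```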
@yatbear I don't think it's a permission issue. As I noted above, I can access AWS S3 from my EC2 instance, and if I import `tensorflow_io` in my script then I am also able to access AWS files with `tf.io.gfile`. However, without the explicit import of this library, `tf.io.gfile` will fail.
Interestingly, after the fixes above the error messages are still visible with `AWS_LOG_LEVEL=1`, but TensorBoard is able to access event files on S3.
Additionally, as I mentioned, `tf.io.gfile` is very slow compared to `s3fs` for accessing S3 files.
@Krasner, thanks for the clarification and the proposed solutions above! I just saw this open issue under the tensorflow-io repo: https://github.com/tensorflow/io/issues/1731, which suggests the problem lies there. A temporary workaround mentioned in https://github.com/tensorflow/io/issues/1731#issuecomment-1332779337 is to pin the `tensorflow-io` dependency to `0.27.0` — could you try this? In the meantime, I will do a bit more investigation before adding the new dependency `s3fs`.
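Concretely, the pin suggested above would be:

```shell
pip install "tensorflow-io==0.27.0"
```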
I saw this recent fix related to S3: https://github.com/tensorflow/io/pull/1790, but it is not included in the latest tensorflow-io pip release: https://pypi.org/project/tensorflow-io/#history, and their nightly is also stale; I left a comment under the aforementioned PR.
I get the same error when using `tensorboard --logdir s3://zenml-minio-store/logs/...`. I used the versions below: tensorflow==2.8.0, tensorboard==2.8.0, tensorflow-io==0.24.0. I tried updating to tensorflow==2.12.0, tensorboard==2.12.3, tensorflow-io==0.33.0, but it doesn't work.
I am using TensorBoard 2.9.1. When setting `--logdir` as `s3://<bucket>/<folder>`, TensorBoard is not able to read event files. On my machine (an EC2 instance) I am able to reach that logdir via the AWS CLI (`aws s3 ls s3://<bucket>/<folder>`). In Python I can also reach the files in that folder using tensorflow_io.

This is the TensorBoard command:

This is the error code:
I would expect TensorBoard to use tensorflow_io's `tensorflow_io/core/filesystems/s3/`, but from the message above that does not seem to be happening. Notice in the diagnostics report that I am using `tensorflow-io==0.26.0` and `tensorflow-io-gcs-filesystem==0.26.0`.
Additionally, I tried running TensorBoard from a Python script but got the same problem:
Environment information (required)
Diagnostics
Diagnostics output
```
--- check: autoidentify
INFO: diagnose_tensorboard.py version df7af2c6fc0e4c4a5b47aeae078bc7ad95777ffa

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=9, micro=5, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='ip-xxx-xx-xx-xxx', release='5.15.0-1026-aws', version='#30~20.04.2-Ubuntu SMP Fri Nov 25 14:53:22 UTC 2022', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.9.1
INFO: installed: tensorflow==2.9.2
INFO: installed: tensorflow-estimator==2.9.0
INFO: installed: tensorboard-data-server==0.6.1

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.9.1'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.9.2'
INFO: tensorflow.__git_version__: 'v2.9.1-132-g18960c44ad3'

--- check: tensorboard_data_server_version
INFO: data server binary: '/home/ubuntu/.local/lib/python3.9/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.1'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/ubuntu/.local/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
```