[Open] Joe-Heffer-Shef opened this issue 2 years ago
Can you poke around with a debugger?
ValueError: invalid literal for int() with base 10
I don't see int() in the stack trace anywhere... I wonder what's actually raising that exception.
It looks like it's something to do with how len works.
For example:
>>> import os
>>> data = os.urandom(32)
>>> data
b"\xaf\xc6\x89\xc4xt2s'_\xc5\xd3\xb1\xe9\x86\xa5&\x80\xf2!\x96q\xff\xbc\x81?\xc4\x8e\x14q\xe9E"
>>> len(data)
32
I don't see any int() in your example either – how do you mean?
I think the Python built-in function len is implemented in C in CPython, so its Python-level source isn't available – IDEs only show a stub:
https://docs.python.org/3/library/functions.html#len
def len(*args, **kwargs): # real signature unknown
    """ Return the number of items in a container. """
    pass
This means we won't see int in the stack trace.
I guess that when calling len(s), it tries to cast the size of the argument s to an integer. For some reason, this part of the code ends up with the binary contents of the data variable where its size should be?
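That guess can be narrowed down: the message is exactly what int() itself raises when handed bytes that aren't ASCII digits. A standalone check (my own example, not from the project):

```python
# int() accepts bytes, but only if they spell out a base-10 number:
try:
    int(b"\x00\x01")
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: b'\x00\x01'
```

So something in the call chain is passing raw binary data to an int() conversion.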
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\smart_open\azure.py", line 322, in readinto
b[:len(data)] = data
For what possible values of b and data will b[:len(data)] = data (or parts of it) raise that exception?
If you're able to dig in with a debugger, it would be good to know what those values are.
I believe this is an issue under the hood with the readinto implementation. I ran into this same error when using S3 on Linux. The problem seems to be assigning a binary string into a numpy array. Perhaps the exception that the next line catches should be ValueError instead of AttributeError?
For what possible values of b and data will b[:len(data)] = data (or parts of it) raise that exception? If you're able to dig in with a debugger, it would be good to know what those values are.
I ran the script using the PyCharm debugger.
Here are the values of the variables when the exception occurs:
# type: numpy.ndarray
b = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# type: bytes
data = b'\x00\x00\x00\x00\x00\xf05\xbf\x00\x00\x00\x00.... (lots of binary data)
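Given those two types, the failure can be reproduced in isolation: assigning a bytes object into a slice of a numeric numpy array makes numpy try to parse the whole bytes string as one number. A minimal sketch of my own, not the project's code:

```python
import numpy as np

b = np.zeros(8, dtype=np.uint8)  # stand-in for the all-zeros numpy buffer above
data = b"\x00\xf0\x35\xbf"       # stand-in for the raw bytes read from the blob

try:
    b[:len(data)] = data         # numpy tries to cast the bytes via int()
except ValueError as e:
    print(e)                     # the same "invalid literal for int()" error
```

A bytearray or memoryview destination accepts the same assignment without complaint, which is why the bug only bites when the caller hands readinto a numpy buffer.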
This is the traceback:
Traceback (most recent call last):
File "C:/Users/my_username/my_project/scripts/blob-tdms/smart.py", line 45, in main
for chunk in channel.data_chunks():
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\tdms.py", line 586, in data_chunks
for raw_data_chunk in self._read_channel_data_chunks():
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\tdms.py", line 780, in _read_channel_data_chunks
for chunk in self._reader.read_raw_data_for_channel(self.path):
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\reader.py", line 218, in read_raw_data_for_channel
for i, chunk in enumerate(
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 269, in read_raw_data_for_channel
for chunk in self._read_channel_data_chunks(f, data_objects, channel_path, chunk_offset, stop_chunk):
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 367, in _read_channel_data_chunks
for chunk in reader.read_channel_data_chunks(file, data_objects, channel_path, chunk_offset, stop_chunk):
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 64, in read_channel_data_chunks
yield self._read_channel_data_chunk(file, data_objects, chunk_index, channel_path)
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 492, in _read_channel_data_chunk
channel_data = RawChannelDataChunk.channel_data(obj.read_values(file, number_values, self.endianness))
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 557, in read_values
return fromfile(file, dtype=dtype, count=number_values)
File "C:\Anaconda\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 147, in fromfile
bytes_read = file.readinto(buffer[offset:])
File "C:\Anaconda\envs\my_project\lib\site-packages\smart_open\azure.py", line 322, in readinto
b[:len(data)] = data
ValueError: invalid literal for int() with base 10: b"\x00\x00\x00\x00\x00\xf05\xbf\x00\x00\x00\x00\x00\xa0=\xbf\x00\x00\x00\x00\x00P<\xbf\x00\x00\x00\x00\x00\xd0G\xbf\x00\x00\x00\x00\x00\xd0M\xbf\x00\x00\x00\x00\x00PL\xbf\x00\x00\x00\x00\x00\x98F\xbf\
This is the code in azure.py where the crash happens:
def readinto(self, b):
"""Read up to len(b) bytes into b, and return the number of bytes read."""
data = self.read(len(b))
if not data:
return 0
b[:len(data)] = data
return len(data)
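Since b can be a numpy array rather than a bytearray, one possible direction for a fix is to copy through a flat memoryview, which writes raw bytes into any writable buffer without numpy's per-element casting. This is my own sketch with a hypothetical readinto_fixed helper, not the actual smart_open patch:

```python
import io
import numpy as np

def readinto_fixed(reader, b):
    """Sketch of a dtype-safe readinto: `reader` is any object with a
    read(n) method; `b` is any writable buffer (bytearray, numpy array, ...)."""
    view = memoryview(b).cast("B")  # flat byte view; assignment copies raw bytes
    data = reader.read(len(view))   # len(view) is the buffer size in bytes
    if not data:
        return 0
    view[:len(data)] = data
    return len(data)

# Works for the kind of numpy buffer that crashed the original implementation:
buf = np.zeros(8, dtype=np.uint8)
n = readinto_fixed(io.BytesIO(b"abcd"), buf)
print(n, bytes(buf[:n]))  # 4 b'abcd'
```

Using len(view) instead of len(b) also keeps the byte count correct for buffers whose element size isn't one byte.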
Please note I've updated the package versions, as in this Conda environment.yaml file:
name: my_env
channels:
- conda-forge
- defaults
dependencies:
- ca-certificates=2022.9.14=h5b45459_0
- certifi=2022.9.14=pyhd8ed1ab_0
- nptdms=1.6.0=pyhd8ed1ab_0
- smart-open=6.2.0=pyh1a96a4e_0
- smart_open=6.2.0=pyha770c72_0
Problem description
I am trying to stream a binary file from Azure Blob Storage.
I expect to be able to iterate over chunks of the data set, but I see an error to do with the Azure readinto function. I'm using the npTDMS library to read a LabVIEW data file in TDMS format (binary quantitative data files).
Steps/code to reproduce the problem
The code is something like this:
and the error I get is:
It seems like it's expecting a text file? Or it's not calculating the data index correctly to page through the data set?
Versions
From pip list: