Closed hinas-source closed 3 months ago
Hi @hinas-source,
It looks like you're fetching from HTTP, but the server is returning a redirect to HTTPS, and httpx does not follow redirects by default. So you have 2 choices.
You could fetch from HTTPS in the first place, as in this:
import httpx
from stream_read_xbrl import stream_read_xbrl_zip

if __name__ == '__main__':
    url = 'https://download.companieshouse.gov.uk/Accounts_Bulk_Data-2024-03-26.zip'
    with httpx.stream('GET', url) as r:
        r.raise_for_status()
        with stream_read_xbrl_zip(r.iter_bytes(chunk_size=65536)) as (columns, rows):
            for row in rows:
                print(row)
Or, you can configure httpx to follow the redirect, as in this:
import httpx
from stream_read_xbrl import stream_read_xbrl_zip

if __name__ == '__main__':
    url = 'http://download.companieshouse.gov.uk/Accounts_Bulk_Data-2024-03-26.zip'
    with httpx.stream('GET', url, follow_redirects=True) as r:
        r.raise_for_status()
        with stream_read_xbrl_zip(r.iter_bytes(chunk_size=65536)) as (columns, rows):
            for row in rows:
                print(row)
My recommendation is the first of these: from a security point of view it's better to always use HTTPS if you can, and it also avoids the redirect, so there's a (very small) time saving.
(Neither of these uses the exact same URL as in your question, as I think that file no longer exists: it returns a 404.)
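If the URLs come from a list that still uses `http://`, the first option can be applied mechanically. The following is a minimal sketch using only the standard library; `force_https` is a hypothetical helper name, not part of httpx or stream-read-xbrl, and it assumes the server serves the same path over HTTPS (which the redirect location reported in this thread suggests for this host).

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return url with an 'http' scheme swapped for 'https'.

    Hypothetical helper: URLs with any other scheme are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == 'http':
        # SplitResult is a namedtuple, so _replace returns a modified copy
        parts = parts._replace(scheme='https')
    return urlunsplit(parts)


if __name__ == '__main__':
    print(force_https('http://download.companieshouse.gov.uk/Accounts_Bulk_Data-2024-03-26.zip'))
```

The result can then be passed to `httpx.stream('GET', ...)` as in the first example, without needing `follow_redirects=True`.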
Thank you for your help @michalc
No problem!
`---------------------------------------------------------------------------
UnexpectedSignatureError                  Traceback (most recent call last)
Cell In[5], line 9
      5 url = f"https://download.companieshouse.gov.uk/Accounts_Bulk_Data-2024-01-20.zip"
      6 with \
      7     httpx.stream('GET', url) as r, \
      8     stream_read_xbrl_zip(r.iter_bytes(chunk_size=65536)) as (columns, rows):
----> 9     df = pd.DataFrame(rows, columns=columns)
     10     if isinstance(df, pd.DataFrame):
     11         df1 = df
File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\frame.py:832, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    830     data = np.asarray(data)
    831 else:
--> 832     data = list(data)
    833 if len(data) > 0:
    834     if is_dataclass(data[0]):
File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_read_xbrl.py:556, in
File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_read_xbrl.py:546, in stream_read_xbrl_zip.
File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_read_xbrl.py:558, in
File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_unzip.py:460, in stream_unzip(zipfile_chunks, password, chunk_size, allow_zip64)
    457     else:
    458         raise UnexpectedSignatureError(signature)
--> 460 for file_name, file_size, unzipped_chunks in all():
    461     yield file_name, file_size, unzipped_chunks
    462     for _ in unzipped_chunks:
File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_unzip.py:458, in stream_unzip.
UnexpectedSignatureError: b'<htm'`
I am getting this error
@hinas-source This seems like a different issue - can you raise a new issue at https://github.com/uktrade/stream-read-xbrl/issues?
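One possible reading of the `b'<htm'` in that error, which this thread does not confirm, is that the response body was an HTML page rather than a ZIP file. The snippet in the traceback also never calls `r.raise_for_status()`, so an error page would be passed straight to the unzipper. As a rough sketch, the byte stream can be checked for a ZIP signature before it is handed on. The names `require_zip_signature` and `NotAZipError` are hypothetical and not part of stream-read-xbrl or stream-unzip.

```python
from typing import Iterable, Iterator

# Signatures a ZIP stream can begin with: a local file header,
# or the end-of-central-directory record of an empty archive.
ZIP_SIGNATURES = (b'PK\x03\x04', b'PK\x05\x06')


class NotAZipError(ValueError):
    """Hypothetical error raised when the stream does not start like a ZIP."""


def require_zip_signature(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the same bytes as chunks, but fail early on a non-ZIP stream."""
    iterator = iter(chunks)
    head = b''
    # Chunks may be shorter than 4 bytes, so accumulate until we can compare
    for chunk in iterator:
        head += chunk
        if len(head) >= 4:
            break
    if not head.startswith(ZIP_SIGNATURES):
        raise NotAZipError(f'Stream does not start with a ZIP signature: {head[:4]!r}')
    yield head
    # Pass the rest of the stream through untouched
    yield from iterator


if __name__ == '__main__':
    try:
        list(require_zip_signature([b'<html><body>Not Found</body></html>']))
    except NotAZipError as e:
        print(e)
```

In the code from the traceback this would wrap the iterable, as in `stream_read_xbrl_zip(require_zip_signature(r.iter_bytes(chunk_size=65536)))`. Calling `r.raise_for_status()` before reading the body, as in the examples earlier in this thread, would also surface a non-2xx status directly.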
When I run the below code:
import httpx
from stream_read_xbrl import stream_read_xbrl_zip

# A URL taken from http://download.companieshouse.gov.uk/en_accountsdata.html
if __name__ == '__main__':
    url = 'http://download.companieshouse.gov.uk/Accounts_Bulk_Data-2023-03-02.zip'
    with \
            httpx.stream('GET', url) as r, \
            stream_read_xbrl_zip(r.iter_bytes(chunk_size=65536)) as (columns, rows):
        r.raise_for_status()
        for row in rows:
            print(row)
I am getting the below error
HTTPStatusError: Redirect response '301 Moved Permanently' for url 'http://download.companieshouse.gov.uk/Accounts_Bulk_Data-2023-03-02.zip'
Redirect location: 'https://download.companieshouse.gov.uk/Accounts_Bulk_Data-2023-03-02.zip'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/301