lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Download AMI (SDM): HTTP Error 404: Not Found #1282

Closed AntoineBlanot closed 2 months ago

AntoineBlanot commented 5 months ago

When downloading the AMI corpus with the SDM (single distant microphone) setting, I get a 404 error.

To reproduce:

from lhotse.recipes import download_ami, prepare_ami
ami_path = download_ami(mic='sdm')

AntoineBlanot commented 5 months ago

May be related to this issue: https://github.com/lhotse-speech/lhotse/issues/1177

desh2608 commented 5 months ago

Is this on the latest Lhotse? Also, can you show the full error?

AntoineBlanot commented 5 months ago

@desh2608 I am running on latest Lhotse, version 1.20.0

Here is the full error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 ami_path = download_ami(
      2     target_dir=DATA_DIR,
      3     annotations=DATA_DIR / 'ami_public_manual_1.6.2.zip',
      4     mic='sdm',
      5 )

File ~/.cache/pypoetry/virtualenvs/diarization-eval-xgb8_ljZ-py3.10/lib/python3.10/site-packages/lhotse/recipes/ami.py:265, in download_ami(target_dir, annotations, force_download, url, mic)
    260 annotations = (
    261     target_dir / "ami_public_manual_1.6.2.zip" if not annotations else annotations
    262 )
    264 # Audio
--> 265 download_audio(target_dir, force_download, url, mic)
    267 # Annotations
    268 logging.info("Downloading AMI annotations")

File ~/.cache/pypoetry/virtualenvs/diarization-eval-xgb8_ljZ-py3.10/lib/python3.10/site-packages/lhotse/recipes/ami.py:200, in download_audio(target_dir, force_download, url, mic)
    198     wav_dir.mkdir(parents=True, exist_ok=True)
    199     wav_path = wav_dir / wav_name
--> 200     resumable_download(
    201         wav_url, filename=wav_path, force_download=force_download
    202     )
    203 elif mic == "mdm":
    204     for array in MDM_ARRAYS:

File ~/.cache/pypoetry/virtualenvs/diarization-eval-xgb8_ljZ-py3.10/lib/python3.10/site-packages/lhotse/utils.py:543, in resumable_download(url, filename, force_download, completed_file_size)
    541         _download(urllib.request.Request(url, headers=ua_headers), 0)
    542 else:
--> 543     raise e

File ~/.cache/pypoetry/virtualenvs/diarization-eval-xgb8_ljZ-py3.10/lib/python3.10/site-packages/lhotse/utils.py:517, in resumable_download(url, filename, force_download, completed_file_size)
    514                 pbar.update(len(chunk))
    516 try:
--> 517     _download(req, file_size)
    518 except urllib.error.HTTPError as e:
    519     # "Request Range Not Satisfiable" means the requested range
    520     # starts after the file ends OR that the server does not support range requests.
    521     if e.code == 416:

File ~/.cache/pypoetry/virtualenvs/diarization-eval-xgb8_ljZ-py3.10/lib/python3.10/site-packages/lhotse/utils.py:499, in resumable_download.<locals>._download(rq, size)
    496 f.truncate()
    498 # Open the URL and read the contents in chunks
--> 499 with urllib.request.urlopen(rq) as response:
    500     chunk_size = 1024
    501     total_size = int(response.headers.get("content-length", 0)) + size

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    214 else:
    215     opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:525, in OpenerDirector.open(self, fullurl, data, timeout)
    523 for processor in self.process_response.get(protocol, []):
    524     meth = getattr(processor, meth_name)
--> 525     response = meth(req, response)
    527 return response

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:634, in HTTPErrorProcessor.http_response(self, request, response)
    631 # According to RFC 2616, "2xx" code indicates that the client's
    632 # request was successfully received, understood, and accepted.
    633 if not (200 <= code < 300):
--> 634     response = self.parent.error(
    635         'http', request, response, code, msg, hdrs)
    637 return response

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:557, in OpenerDirector.error(self, proto, *args)
    555     http_err = 0
    556 args = (dict, proto, meth_name) + args
--> 557 result = self._call_chain(*args)
    558 if result:
    559     return result

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495     func = getattr(handler, meth_name)
--> 496     result = func(*args)
    497     if result is not None:
    498         return result

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:749, in HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
    746 fp.read()
    747 fp.close()
--> 749 return self.parent.open(new, timeout=req.timeout)

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:525, in OpenerDirector.open(self, fullurl, data, timeout)
    523 for processor in self.process_response.get(protocol, []):
    524     meth = getattr(processor, meth_name)
--> 525     response = meth(req, response)
    527 return response

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:634, in HTTPErrorProcessor.http_response(self, request, response)
    631 # According to RFC 2616, "2xx" code indicates that the client's
    632 # request was successfully received, understood, and accepted.
    633 if not (200 <= code < 300):
--> 634     response = self.parent.error(
    635         'http', request, response, code, msg, hdrs)
    637 return response

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:563, in OpenerDirector.error(self, proto, *args)
    561 if http_err:
    562     args = (dict, 'default', 'http_error_default') + orig_args
--> 563     return self._call_chain(*args)

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495     func = getattr(handler, meth_name)
--> 496     result = func(*args)
    497     if result is not None:
    498         return result

File ~/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py:643, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 404: Not Found

danpovey commented 5 months ago

Is lhotse up to date? I seem to remember at some point the URL was changed.

AntoineBlanot commented 5 months ago

Is lhotse up to date? I seem to remember at some point the URL was changed.

Yes, I am using the latest version.

desh2608 commented 5 months ago

Do you get the error immediately or on a particular file? Your code works fine for me (although I will have to wait a bit to get through all the files).

We download the corpus from this mirror: https://groups.inf.ed.ac.uk/ami//AMICorpusMirror/amicorpus/

Is it accessible on your network? You can try downloading a file from there with wget to confirm.
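If wget is not available, the same check can be sketched in Python with only the standard library. This is a minimal sketch, not part of Lhotse: `url_status` is a hypothetical helper, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import urllib.error
import urllib.request

# Mirror URL quoted in the comment above.
AMI_MIRROR = "https://groups.inf.ed.ac.uk/ami//AMICorpusMirror/amicorpus/"


def url_status(url, opener=urllib.request.urlopen, timeout=10.0):
    """Return the HTTP status code for a HEAD request to `url`.

    Hypothetical helper for illustration; `opener` is injectable so the
    function can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code
```

Calling `url_status(AMI_MIRROR + "IS1003a/audio/IS1003a.Array1-01.wav")` on your network should tell you whether the mirror is reachable and whether that particular file exists.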

AntoineBlanot commented 5 months ago

I have checked again and it seems like the problem occurs at iteration 105.

wav_db/IS1001c/audio/IS1001c.Array1-01.wav: 100%|██████████| 46.3M/46.3M [00:25<00:00, 1.80MB/s]
wav_db/IS1001d/audio/IS1001d.Array1-01.wav: 100%|██████████| 25.0M/25.0M [00:16<00:00, 1.52MB/s]
wav_db/IS1002b/audio/IS1002b.Array1-01.wav: 100%|██████████| 75.8M/75.8M [01:00<00:00, 1.26MB/s]
wav_db/IS1002c/audio/IS1002c.Array1-01.wav: 100%|██████████| 66.6M/66.6M [00:28<00:00, 2.31MB/s]
wav_db/IS1002d/audio/IS1002d.Array1-01.wav: 100%|██████████| 40.4M/40.4M [00:39<00:00, 1.02MB/s]
wav_db/IS1003a/audio/IS1003a.Array1-01.wav: 100%|██████████| 29.2M/29.2M [00:21<00:00, 1.35MB/s]
Downloading AMI meetings: 105it [1:08:28, 39.13s/it]
Traceback (most recent call last):
  File "/home/c.maeda/workspaces/amt-eval/download.py", line 3, in <module>
    ami_path = download_ami(
  File "/home/c.maeda/.cache/pypoetry/virtualenvs/diarization-eval-RBL--BOT-py3.10/lib/python3.10/site-packages/lhotse/recipes/ami.py", line 265, in download_ami
    download_audio(target_dir, force_download, url, mic)
  File "/home/c.maeda/.cache/pypoetry/virtualenvs/diarization-eval-RBL--BOT-py3.10/lib/python3.10/site-packages/lhotse/recipes/ami.py", line 200, in download_audio
    resumable_download(
  File "/home/c.maeda/.cache/pypoetry/virtualenvs/diarization-eval-RBL--BOT-py3.10/lib/python3.10/site-packages/lhotse/utils.py", line 543, in resumable_download
    raise e
  File "/home/c.maeda/.cache/pypoetry/virtualenvs/diarization-eval-RBL--BOT-py3.10/lib/python3.10/site-packages/lhotse/utils.py", line 517, in resumable_download
    _download(req, file_size)
  File "/home/c.maeda/.cache/pypoetry/virtualenvs/diarization-eval-RBL--BOT-py3.10/lib/python3.10/site-packages/lhotse/utils.py", line 499, in _download
    with urllib.request.urlopen(rq) as response:
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 557, in error
    result = self._call_chain(*args)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 749, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/home/c.maeda/miniconda3/envs/py3.10/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

AntoineBlanot commented 5 months ago

More specifically, this file is not found, even with wget (the first download below succeeds, the second returns 404):

HRI-JP1964:~ maedachikara$ wget https://groups.inf.ed.ac.uk/ami//AMICorpusMirror/amicorpus/IS1003a/audio/IS1003a.Array1-01.wav
--2024-02-08 11:51:32--  https://groups.inf.ed.ac.uk/ami//AMICorpusMirror/amicorpus/IS1003a/audio/IS1003a.Array1-01.wav
Resolving groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)... 129.215.202.26
Connecting to groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)|129.215.202.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29238717 (28M) [audio/x-wav]
Saving to: ‘IS1003a.Array1-01.wav’

IS1003a.Array1-01.wav                               100%[===================================================================================================================>]  27.88M  1.54MB/s    in 23s

2024-02-08 11:51:57 (1.19 MB/s) - ‘IS1003a.Array1-01.wav’ saved [29238717/29238717]

HRI-JP1964:~ maedachikara$ wget https://groups.inf.ed.ac.uk/ami//AMICorpusMirror/amicorpus/IS1003a/audio/IS1003b.Array1-01.wav
--2024-02-08 11:52:02--  https://groups.inf.ed.ac.uk/ami//AMICorpusMirror/amicorpus/IS1003a/audio/IS1003b.Array1-01.wav
Resolving groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)... 129.215.202.26
Connecting to groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)|129.215.202.26|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-02-08 11:52:03 ERROR 404: Not Found.

desh2608 commented 5 months ago

If you have the latest Lhotse (including https://github.com/lhotse-speech/lhotse/pull/1178) then that error should not be happening.

AntoineBlanot commented 5 months ago

If by latest you mean the latest release, then yes: as I said above, I am on version 1.20.0. I cannot even wget the file from the mirror you linked above, so the issue seems to be unrelated to Lhotse anyway.

wget https://groups.inf.ed.ac.uk/ami//AMICorpusMirror/amicorpus/IS1003a/audio/IS1003b.Array1-01.wav

This does not work for you either, right? It uses the mirror link above.

desh2608 commented 5 months ago

Can you check if your Lhotse has the changes from the PR linked above?

If any file is not present, we warn and continue so the program should not crash.

AntoineBlanot commented 5 months ago

Can you check if your Lhotse has the changes from the PR linked above?

If any file is not present, we warn and continue so the program should not crash.

I have the changes, but they only cover the mdm mic setting, while I am getting the error with the sdm mic setting. It seems a similar fix is needed for the sdm mic setting.
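For reference, the "warn and continue" behaviour could look roughly like the sketch below when applied to any list of URLs, including the sdm ones. This is only an illustration, not the code from the PR: `download_all` and its `fetch` argument are hypothetical names, and `fetch(url, path)` stands in for whatever download function is used.

```python
import logging
import urllib.error
from pathlib import Path


def download_all(urls, target_dir, fetch):
    """Download every URL with `fetch`, skipping files the server reports as 404.

    `fetch(url, path)` is a caller-supplied download function (hypothetical
    here). Returns the list of URLs that were skipped.
    """
    target_dir = Path(target_dir)
    missing = []
    for url in urls:
        # Name the local file after the last URL path component.
        path = target_dir / url.rsplit("/", 1)[-1]
        try:
            fetch(url, path)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # any other HTTP failure is still fatal
            logging.warning("Skipping %s: not found on the server", url)
            missing.append(url)
    return missing
```

Judging by the traceback earlier in this thread, something like `lambda url, path: resumable_download(url, filename=path)` could serve as `fetch`, though I have not tested that combination.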

dr-pato commented 3 months ago

Hi, I am facing the same problem. The program crashes at iteration 105. I am using the latest release (v1.22). Could you tell us if this problem is going to be fixed? Thanks!

pzelasko commented 3 months ago

Please try #1318 and LMK if that helps.

dr-pato commented 2 months ago

Hi @pzelasko, thank you very much for the support. Now the program does not crash when a file does not exist. I will manually download the missing files from the correct URLs.

pzelasko commented 2 months ago

Could you find out which files these are and share their URLs here? It would be great if we could fix the recipe properly.

dr-pato commented 2 months ago

I found 2 missing files for the sdm mic setting. The files are as follows:

http://groups.inf.ed.ac.uk/ami/AMICorpusMirror/amicorpus/IS1003b/audio/IS1003b.Array1-01.wav
https://groups.inf.ed.ac.uk/ami/AMICorpusMirror/amicorpus/IS1007d/audio/IS1007d.Array1-01.wav

I replaced them with:

https://groups.inf.ed.ac.uk/ami/AMICorpusMirror//amicorpus/IS1003b/audio/IS1003b.Array2-02.wav
https://groups.inf.ed.ac.uk/ami/AMICorpusMirror//amicorpus/IS1007d/audio/IS1007d.Array2-02.wav

which seem to be the only audio signals available from microphone arrays.

desh2608 commented 2 months ago

Hmm, I think for the SDM setting we should fall back to this option of downloading an alternate channel other than Array1-01. But it should come with a user warning, since most SDM results in the literature assume you have used the first mic of array 1.
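A rough sketch of such a fallback, assuming the URL layout seen in the links above and using Array2-02 as the only alternate channel (the one reported in this thread). The names `sdm_wav_url`, `fetch_sdm`, and `fetch` are hypothetical and not part of the current recipe.

```python
import logging
import urllib.error

BASE_URL = "https://groups.inf.ed.ac.uk/ami/AMICorpusMirror/amicorpus"
# Preferred SDM channel first, then the replacement reported in this thread.
SDM_CHANNELS = ("Array1-01", "Array2-02")


def sdm_wav_url(meeting, channel, base_url=BASE_URL):
    """Build the wav URL for one meeting and channel (layout assumed from the thread)."""
    return f"{base_url}/{meeting}/audio/{meeting}.{channel}.wav"


def fetch_sdm(meeting, fetch, channels=SDM_CHANNELS):
    """Try each channel in order; return the one that downloaded, or None.

    `fetch(url)` is a caller-supplied download function (hypothetical here)
    that raises urllib.error.HTTPError when the server rejects the request.
    """
    for channel in channels:
        try:
            fetch(sdm_wav_url(meeting, channel))
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # only a missing file triggers the fallback
            continue
        if channel != channels[0]:
            # Warn loudly: results may not be comparable with published SDM numbers.
            logging.warning(
                "%s: %s is missing, using %s instead",
                meeting, channels[0], channel,
            )
        return channel
    logging.warning("%s: no SDM audio found on any of %s", meeting, channels)
    return None
```

Returning the channel that was actually used lets the caller record it, so a later preparation step could flag those meetings rather than silently mixing channels.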