speechmatics / speechmatics-python

Python library and CLI for Speechmatics
https://speechmatics.github.io/speechmatics-python/
MIT License

Sandboxed (MacOS) Batch transcription fails due to permission error accessing '/private/etc/apache2/mime.types' #81

Open petiatil opened 9 months ago

petiatil commented 9 months ago

Batch transcription works when the same code is run in the Python console.

When the app is sandboxed, batch transcription fails because an underlying library tries to read "/private/etc/apache2/mime.types".

To prevent this permission error, the file would need to be accessible from within the app's sandbox (or, if possible, the library would need a way to avoid reading the file).

Real-time transcription works in the sandboxed context.

My first idea was to bundle a local copy of the mime.types file, track down where the Speechmatics library reads it, and point the library at the local copy. I could not find where the file is accessed, though, and I suspect there is a better solution.

If there isn't a straightforward solution with the Speechmatics Python library, I plan to test a lower-level approach in Python.

Batch transcription test:

import ssl

import certifi
import speechmatics
from speechmatics.batch_client import BatchClient

# LANGUAGE, englishLocale, operatingPoint, speechmaticsAPIkey and audio_file
# are defined elsewhere in the application.

# Verify TLS connections against the certifi CA bundle.
ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(certifi.where())

conf = speechmatics.models.BatchTranscriptionConfig(
    language=LANGUAGE,
    output_locale=englishLocale if LANGUAGE == "en" else None,
    operating_point=operatingPoint,
)

settings = speechmatics.models.ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=speechmaticsAPIkey,
    ssl_context=ssl_context,
)

# Submit the job and block until the transcript is ready.
with BatchClient(settings) as client:
    job_id = client.submit_job(audio=audio_file, transcription_config=conf)
    transcript = client.wait_for_completion(job_id, transcription_format="json-v2")

nickgerig commented 9 months ago

Hi @petiatil

We did some digging and it seems like the httpx lib imports mimetypes here:

https://github.com/encode/httpx/blob/2318fd822cdb16435ccb5cabcba16c0b7969c1e4/httpx/_utils.py#L4

So maybe this is the issue you're seeing:

https://github.com/python/cpython/blob/3.12/Lib/mimetypes.py#L48

We do have an open issue to replace httpx, but it is unlikely to be done soon.

Hopefully that helps a little.
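
To inspect the mimetypes behaviour linked above, here is a minimal sketch using only the standard library. It checks that "/etc/apache2/mime.types" (which resolves under /private/etc on macOS) is among the paths in `mimetypes.knownfiles`, then empties that list so only the built-in table is used. The helper name `disable_system_mime_tables` is hypothetical, and nobody in this thread confirmed that this avoids the sandbox error. It is an untested assumption that would have to run before httpx performs its first MIME lookup.

```python
import mimetypes

# Paths the standard library probes for extra MIME tables on first use.
system_tables = list(mimetypes.knownfiles)
has_apache_table = "/etc/apache2/mime.types" in system_tables


def disable_system_mime_tables() -> None:
    """Hypothetical helper: stop mimetypes from probing system files."""
    # init() iterates over this module-level list, so emptying it in place
    # leaves only the built-in extension table.
    mimetypes.knownfiles.clear()
    # Rebuild the default database now that no system paths are listed.
    mimetypes.init()


disable_system_mime_tables()

# Lookups still work, served from the built-in table.
guessed_type, _ = mimetypes.guess_type("example.json")
print(has_apache_table, guessed_type)
```

Any MIME types that were only defined in the system files would no longer be recognised after this call.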

petiatil commented 9 months ago

Fortunately, using requests directly resolved the sandbox issue.

Thank you
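
The requests-based code was not posted in this thread. As a rough sketch of what submitting a batch job with requests might look like: the `/jobs` path, the `data_file` and `config` form fields, and the `id` response key are assumptions based on the public Speechmatics batch REST API, not something shown above. `build_job_config` and `submit_job` are hypothetical helper names.

```python
import json

API_URL = "https://asr.api.speechmatics.com/v2"  # same base URL as in the report


def build_job_config(language: str, operating_point: str) -> str:
    """Return the JSON job config sent alongside the audio file."""
    return json.dumps(
        {
            "type": "transcription",
            "transcription_config": {
                "language": language,
                "operating_point": operating_point,
            },
        }
    )


def submit_job(audio_path: str, api_key: str, language: str = "en",
               operating_point: str = "enhanced") -> str:
    """Upload an audio file as a batch job and return the job id (assumed key)."""
    import requests  # third-party; imported here so the helper above has no dependencies

    with open(audio_path, "rb") as audio:
        response = requests.post(
            f"{API_URL}/jobs",  # assumed endpoint path
            headers={"Authorization": f"Bearer {api_key}"},
            data={"config": build_job_config(language, operating_point)},
            files={"data_file": audio},  # assumed form field name
            timeout=60,
        )
    response.raise_for_status()
    return response.json()["id"]
```

Polling for completion and downloading the transcript are left out. They would be further HTTP requests against the same base URL.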