python / cpython

The Python programming language
https://www.python.org/

mimetypes.guess_type returns None for "somefile.txt" only in Azure durable function #118440

Closed tkumpumak closed 2 weeks ago

tkumpumak commented 2 weeks ago

Bug report

Bug description:

mimetypes.guess_type correctly returns 'text/plain' when tested on Windows or in a Docker Linux container. Unfortunately, in Python code running in an Azure durable function it returns None for some reason.

The following test code, which also contains parts of mimetypes.guess_type, was run on Windows and in an Azure durable function. The output from both runs follows the code.

I'm out of ideas on how to debug this further. First of all, where is the right place to report this issue? And what code could the Azure durable function be running? It does not seem to be a copy of the mimetypes module, yet the data inside the module looks correct.

  doc_name = "file://test.txt"
  mime_types = mimetypes.MimeTypes()
  mime_type, encoding = mime_types.guess_type(doc_name)

  if mime_type is not None:
      logger.error("Test1: doc_name:" + doc_name + " mimetype: " + mime_type)
  else:
      logger.error("Test1: doc_name:" + doc_name + " mimetype: None")

  mime, _ = mimetypes.guess_type(doc_name, False)
  from mimetypes import _db #initialized on mimetypes.guess_type

  if mime is not None:
      logger.error("Test2: doc_name:" + doc_name + " mimetype: " + mime)
  else:
      logger.error("Test2: doc_name:" + doc_name + " mimetype: None")

  logger.error("Available types: " + str(_db.types_map[True]))

  import os
  import posixpath
  import urllib.parse  # plain `import urllib` does not guarantee that urllib.parse is loaded

  url = os.fspath(doc_name)
  scheme, url = urllib.parse._splittype(url)
  # logger.error("url:" + url + " scheme: " + scheme) #these are ok
  strict = False
  base, ext = posixpath.splitext(url)
  while (ext_lower := ext.lower()) in _db.suffix_map:
      base, ext = posixpath.splitext(base + _db.suffix_map[ext_lower])
  # encodings_map is case sensitive
  logger.error("ext:" + ext)
  if ext in _db.encodings_map:
      encoding = _db.encodings_map[ext]
      base, ext = posixpath.splitext(base)
  else:
      encoding = None
  ext = ext.lower()
  logger.error("ext:" + ext)
  types_map = _db.types_map[True]
  if ext in types_map:
      logger.error("Return 1, ext:" + ext + " mimetype: " + types_map[ext])
      return types_map[ext]
  elif strict:
      logger.error("Return 2, ext:" + ext + " mimetype: None")
      return None, encoding
  types_map = _db.types_map[False]
  if ext in types_map:
      logger.error("Return 3, ext:" + ext + " mimetype: " + types_map[ext])
      return types_map[ext]
  else:
      logger.error("Return 4, ext:" + ext + " mimetype: None")
      return None
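
The inlined trace above can also be written against the public attributes of a MimeTypes instance, which avoids importing the private `_db`. This is only a minimal sketch: `trace_lookup` is a hypothetical helper, it uses a fresh `MimeTypes()` (built-in tables only), and it deliberately skips the URL scheme handling, so it covers just the extension lookup.

```python
import mimetypes
import posixpath


def trace_lookup(name, strict=True):
    """Return the intermediate values of a simplified extension lookup."""
    db = mimetypes.MimeTypes()  # built-in defaults only, no system files
    base, ext = posixpath.splitext(name)
    # Expand combined suffixes, e.g. ".tgz" becomes ".tar.gz".
    while ext.lower() in db.suffix_map:
        base, ext = posixpath.splitext(base + db.suffix_map[ext.lower()])
    # encodings_map is case sensitive, so look up ext before lowering it.
    encoding = db.encodings_map.get(ext)
    if encoding is not None:
        base, ext = posixpath.splitext(base)
    ext = ext.lower()
    # types_map is a pair of dicts indexed by the strict flag.
    mime = db.types_map[True].get(ext)
    if mime is None and not strict:
        mime = db.types_map[False].get(ext)
    return {"ext": ext, "encoding": encoding, "mime": mime}
```

Calling `trace_lookup("test.txt")` in both environments and comparing the returned dicts would show at which step the results start to differ.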

Windows:

  Test1: doc_name:file://test.txt mimetype: text/plain
  Test2: doc_name:file://test.txt mimetype: text/plain
  Available types: SNIP long list, List contains: '.n3': 'text/n3', '.txt': 'text/plain', '.bat': 'text/plain',
  ext:.txt
  ext:.txt
  Return 1, ext:.txt mimetype: text/plain

Azure durable function:

  Test1: doc_name:file://test.txt mimetype: None
  Test2: doc_name:file://test.txt mimetype: None
  Available types: SNIP long list, List contains: '.n3: text/n3, .txt: text/plain, .bat: application/x-msdos-program',
  ext:.txt
  ext:.txt
  Return 1, ext:.txt mimetype: text/plain
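
The two runs report different mappings for '.bat', which suggests the module-level tables were filled from different sources on each platform. A minimal sketch for listing those sources is below. `describe_type_sources` is a hypothetical helper; it relies only on `mimetypes.init()`, `mimetypes.knownfiles` and `mimetypes.types_map`, and it does not by itself explain the None result.

```python
import mimetypes
import os


def describe_type_sources():
    """Summarise where the module-level type tables may have come from."""
    mimetypes.init()  # (re)build the module-level tables from the defaults
    return {
        # knownfiles lists candidate mime.types paths; init() reads
        # whichever of them exist on this machine.
        "known_files_present": [
            path for path in mimetypes.knownfiles if os.path.isfile(path)
        ],
        "bat": mimetypes.types_map.get(".bat"),
        "txt": mimetypes.types_map.get(".txt"),
    }


if __name__ == "__main__":
    print(describe_type_sources())
```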

CPython versions tested on:

3.11

Operating systems tested on:

Other

sobolevn commented 2 weeks ago

For others who also don't know what Azure Durable Functions is:

Durable Functions is a feature of Azure Functions that lets you write stateful functions in a serverless compute environment.

Sorry, we need a reproducer to fix this :(
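
A standalone reproducer could be as small as the sketch below. It is an assumption about what would be useful, not something taken from the report: `collect_guesses` is a hypothetical helper, and the two names are chosen because the issue title mentions a plain file name while the test code uses "file://test.txt". Running it in both environments and posting the output together with the exact Python version would make the results comparable.

```python
import mimetypes
import sys


def collect_guesses(names):
    """Guess each name with the module-level API and with a fresh MimeTypes()."""
    fresh = mimetypes.MimeTypes()  # built-in tables only, no system files
    results = {}
    for name in names:
        results[name] = {
            "module_level": mimetypes.guess_type(name)[0],
            "fresh_instance": fresh.guess_type(name)[0],
        }
    return results


if __name__ == "__main__":
    print(sys.version)
    for name, guesses in collect_guesses(["test.txt", "file://test.txt"]).items():
        print(name, guesses)
```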

tkumpumak commented 2 weeks ago

I tested this a bit more. After I copied the whole mimetypes.py module into the image, it works correctly, so there is no issue in the code; something is wrong with the image. Reported again here: https://github.com/Azure/azure-functions-docker/issues/1075
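
To check whether the module bundled in the image differs from the copied one, the two files could be compared by version and hash. This is a rough sketch under that assumption; `module_fingerprint` is a hypothetical helper and the hash is only computed when the module has a readable .py source file.

```python
import hashlib
import mimetypes
import sys


def module_fingerprint(module=mimetypes):
    """Return the interpreter version plus the path and SHA-256 of a module's source."""
    path = getattr(module, "__file__", None)
    digest = None
    if path and path.endswith(".py"):
        with open(path, "rb") as source:
            digest = hashlib.sha256(source.read()).hexdigest()
    return {"python": sys.version.split()[0], "path": path, "sha256": digest}


if __name__ == "__main__":
    print(module_fingerprint())
```

If the fingerprints differ between the working copy and the image, the image is not running the same mimetypes.py.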