ssl-hep / ServiceX_frontend

Client access library for ServiceX

Minio crashes, loses a data bucket, we need to restart #86

Closed: gordonwatts closed this issue 3 years ago

gordonwatts commented 4 years ago

This probably means we need to clear the cache:

(.venv) PS C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend> python .\scripts\run_test.py
mc16_13TeV:mc16_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.deriv.DAOD_STDM3.e3601_e5984_s3126_r10201_r10210_p397Traceback (most recent call last):                                                                        | 0/82 [00:00]
  File ".\scripts\run_test.py", line 29, in <module>
    run_query(servicex_adaptor)
  File ".\scripts\run_test.py", line 18, in run_query
    r = ds.get_data_rootfiles("(call ResultTTree (call Select (call SelectMany (call EventDataset (list 'localds:bogus')) (lambda (list e) (call (attr e 'Jets') 'AntiKt4EMTopoJets'))) (lambda (list j) (/ (call (attr j 'pt')) 1000.0))) (list 'JetPt') 'analysis' 'junk.root')")  # NOQA
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\make_it_sync\func_wrapper.py", line 57, in wrapped_call
    return _sync_version_of_function(v, *(args[1:]), **kwargs)
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\make_it_sync\func_wrapper.py", line 14, in _sync_version_of_function
    return loop.run_until_complete(r)
  File "C:\Users\gordo\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 587, in run_until_complete
    return future.result()
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex_utils.py", line 49, in cached_version_of_fn
    result = await fn(*args, **kwargs)
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 118, in get_data_rootfiles_async
    return await self._file_return(selection_query, 'root-file')
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 161, in _file_return
    return await self._data_return(selection_query, convert_to_file, data_format)
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\backoff\_async.py", line 133, in retry
    ret = await target(*args, **kwargs)
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 200, in _data_return
    all_data = {f[0]: await f[1] async for f in as_data}
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 200, in <dictcomp>
    all_data = {f[0]: await f[1] async for f in as_data}
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 195, in <genexpr>
    as_data = ((f[0], asyncio.ensure_future(converter(await f[1])))
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 191, in <genexpr>
    (f async for f in
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 252, in _get_files
    async for r in stream_local_files:
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 364, in _get_files_from_servicex
    async for info in stream_downloaded:
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\servicex.py", line 330, in _download_a_file
    async for f in stream:
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\minio_adaptor.py", line 191, in find_new_bucket_files
    files = adaptor.get_files(request_id)
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\backoff\_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\minio_adaptor.py", line 61, in get_files
    return [f.object_name for f in self._client.list_objects(request_id)]
  File "c:\users\gordo\documents\code\iris-hep\servicex_frontend\servicex\minio_adaptor.py", line 61, in <listcomp>
    return [f.object_name for f in self._client.list_objects(request_id)]
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\minio\api.py", line 1036, in list_objects
    headers=headers)
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\minio\api.py", line 1984, in _url_open
    region = self._get_bucket_region(bucket_name)
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\minio\api.py", line 1919, in _get_bucket_region
    region = self._get_bucket_location(bucket_name)
  File "C:\Users\gordo\Documents\Code\IRIS-HEP\ServiceX_frontend\.venv\lib\site-packages\minio\api.py", line 1957, in _get_bucket_location
    raise ResponseError(response, method, bucket_name).get_exception()
minio.error.NoSuchBucket: NoSuchBucket: message: The specified bucket does not exist.

gordonwatts commented 3 years ago

Also, see https://github.com/iris-hep/opendata-higgs-discovery/issues/9, which I think is the same thing.

gordonwatts commented 3 years ago

Turns out the reason for this is that on_exception does not work for async iterators, though it works fine for async functions!
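
A minimal sketch of why a retry wrapper can miss exceptions from an async iterator. It uses only the standard library; retry_call is a hypothetical stand-in written for illustration, not backoff's actual implementation. Calling an async generator function only builds the generator object, so the try/except in the wrapper has already exited by the time the body runs and raises:

```python
import asyncio
import functools


def retry_call(max_tries=3):
    """Hypothetical retry decorator: re-invokes the call if it raises RuntimeError."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    # For an async generator function this only creates the
                    # generator object; none of its body has run yet.
                    return fn(*args, **kwargs)
                except RuntimeError:
                    if attempt == max_tries - 1:
                        raise
        return wrapper
    return decorator


attempts = {"count": 0}


@retry_call()
async def flaky_stream():
    attempts["count"] += 1
    raise RuntimeError("The specified bucket does not exist.")
    yield  # unreachable, but makes this an async generator


async def consume():
    try:
        async for _ in flaky_stream():
            pass
    except RuntimeError:
        # The error surfaces here, during iteration, outside the wrapper.
        return "escaped retry"
    return "ok"


result = asyncio.run(consume())
```

Here the body of flaky_stream runs exactly once and the exception reaches the consumer without any retry.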

gordonwatts commented 3 years ago

In fact, the way I've implemented on_exception makes no sense. You can't really retry an async iterator, because items will already have been passed out: if you restart the stream, some items may already have been delivered to the consumer. So in general, this does not work.

However, for us, what we are talking about here is the first item only. These exceptions will happen "early".

Still, I think we need to move where the exceptions are handled, to a more reasonable place that does the retry.
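
One way the "first item only" idea could look, as a rough sketch rather than the fix that was actually made in ServiceX. The helper retry_until_first_item and the make_stream factory are hypothetical names; the sketch assumes max_tries is at least 1, and it omits any delay between attempts. It restarts the stream only while nothing has been yielded yet, so no item can be delivered twice:

```python
import asyncio


async def retry_until_first_item(make_stream, max_tries=3, retry_on=(RuntimeError,)):
    """Restart the stream only while no item has been delivered.

    make_stream is a hypothetical zero-argument callable returning a fresh
    async iterator. After the first item, errors propagate to the caller.
    """
    for attempt in range(max_tries):
        stream = make_stream().__aiter__()
        try:
            first = await stream.__anext__()
        except StopAsyncIteration:
            return  # the stream was empty
        except retry_on:
            if attempt == max_tries - 1:
                raise
            continue  # nothing was passed out yet, so a restart is safe
        break
    yield first
    async for item in stream:
        yield item


calls = {"n": 0}


async def flaky_files():
    # Fails "early" on the first two attempts, then streams normally.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("The specified bucket does not exist.")
    for name in ("a.root", "b.root"):
        yield name


async def collect():
    return [f async for f in retry_until_first_item(flaky_files)]


files = asyncio.run(collect())
```

An exception raised after the first file has been yielded is deliberately not retried here, since restarting at that point could replay items the consumer already has.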