nanoporetech / minknow_api

Protobuf and gRPC specifications for the MinKNOW API

broken file in /data/persistence #33

Closed: svennd closed this issue 2 years ago

svennd commented 2 years ago

Hey,

I had a corrupt file in /data/persistence/3A/acquisitions/*. It blocked my script that monitors runs, so I removed it; sadly, the script now fails because the file is missing. We rebooted the device, so I'm not sure how it still remembers this run or how I can work around this. I no longer have the original corruption error, but currently my script fails with the following (only on this position):

It fails on this line:

# for pos.name == 3A
# this works
position_connection = pos.connect()
# here it fails -->
position_connection.protocol.list_protocol_runs()

The error:

INFO: connecting to pos : 3A
Traceback (most recent call last):
  File "get_runs.py", line 264, in <module>
    main()
  File "get_runs.py", line 261, in main
    create_acquisition_info(position_connection, run, pos)
  File "get_runs.py", line 115, in create_acquisition_info
    aq_info = connection.acquisition.get_acquisition_info(run_id=run)
  File "/usr/local/lib/python3.8/dist-packages/minknow_api/acquisition_service.py", line 523, in get_acquisition_info
    return run_with_retry(self._stub.get_acquisition_info,
  File "/usr/local/lib/python3.8/dist-packages/minknow_api/acquisition_service.py", line 74, in run_with_retry
    result = MessageWrapper(method(message, timeout=timeout), unwraps=unwraps)
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.NOT_FOUND
        details = "Error opening Acquisition Record file: File does not exist.
filename: /data/persistence/3A/acquisitions/08e5fbb53d8349ca8397a2de6eb567b818919780"
        debug_error_string = "{"created":"@1643103325.367472983","description":"Error received from peer ipv4:127.0.0.1:8048","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Error opening Acquisition Record file: File does not exist.\nfilename: /data/persistence/3A/acquisitions/08e5fbb53d8349ca8397a2de6eb567b818919780","grpc_status":5}"
>

I tried copying another file to that location, but then it fails because it's not the right run_id. We restarted both devices, so the run must be stored somewhere. Can I fix this somehow? For now I just ignore this position, but then we can't track its runs any longer, which is a bit annoying.

Any help or ideas are welcome!

0x55555555 commented 2 years ago

Hi @svennd,

What run are you trying to retrieve? If you pass the run_id in for a run that has been deleted, I would expect an error to be generated.

If you have manually changed parts of /data/persistence and don't care about data persistence, it may be simpler to remove that whole folder.
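If you go this route but want to keep the old records just in case, here is a minimal sketch (Python, like the script in this thread) that renames the folder to a timestamped backup instead of deleting it. It assumes MinKNOW is stopped while the folder is moved and restarted afterwards; the function name and the `.bak-` suffix are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def move_persistence_aside(persistence_dir="/data/persistence"):
    """Rename the persistence folder to a timestamped backup instead of deleting it.

    ASSUMPTION: MinKNOW is stopped while this runs, so nothing is writing to the
    folder, and it is restarted afterwards so it starts with an empty history.
    """
    src = Path(persistence_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no persistence folder at {src}")
    # e.g. /data/persistence -> /data/persistence.bak-20220125-103000
    backup = src.with_name(f"{src.name}.bak-{time.strftime('%Y%m%d-%H%M%S')}")
    shutil.move(str(src), str(backup))
    return backup
```

Moving rather than deleting means the old acquisition records can still be inspected (or restored) later.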

svennd commented 2 years ago

Hey George,

Thanks for the help. The run had finished and was moved off the system. I track all runs: my script loops over every position to see which protocols have been run and keeps that documented. The error is correct; I'm just looking for a way to resolve the problem of a corrupt/missing file. I tried renaming the 3A directory and even /data/persistence, but the error remains. How can I make the API "forget" about this problematic run?

The code that is running (although I don't expect it is relevant):

    # `positions`: the sequencing positions currently available on the device
    for pos in positions:

        logging.info('connecting to pos : %s', pos.name)

        # init connection
        position_connection = pos.connect()

        if pos.running:
            # get all the protocols that have been run
            position_protocol = position_connection.protocol.list_protocol_runs()
            for run in position_protocol.run_ids:
                create_run_info(position_connection, run, pos)

            # get all the acquisitions
            acquisition_list = position_connection.acquisition.list_acquisition_runs()
            for run in acquisition_list.run_ids:
                create_acquisition_info(position_connection, run, pos)

Thanks for the help !

0x55555555 commented 2 years ago

How can I make the API "forget" about this problematic run?

The easiest way is to delete the /data/persistence db. There is also an API in the protocol service, clear_protocol_history_data, that removes protocols from the history.
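A minimal sketch of calling clear_protocol_history_data from the minknow_api Python client, using the same `position_connection` as in the script above. The helper name is hypothetical, and the request field name `protocol_ids` is an assumption: check ClearProtocolHistoryDataRequest in protocol.proto for your MinKNOW version before relying on it.

```python
def forget_protocol_runs(position_connection, protocol_run_ids):
    """Ask MinKNOW to drop the given protocol runs from its run history.

    `position_connection` is what `pos.connect()` returns.
    ASSUMPTION: the request field is named `protocol_ids`; verify it against
    ClearProtocolHistoryDataRequest in protocol.proto.
    """
    ids = [str(run_id) for run_id in protocol_run_ids]
    if not ids:
        return None  # nothing to clear, so don't send an empty request
    return position_connection.protocol.clear_protocol_history_data(protocol_ids=ids)
```

The ids passed here must be protocol run ids (the ones from `list_protocol_runs()`), not acquisition run ids.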

You could also work around the problematic acquisitions with a try/except block.
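A sketch of that workaround, assuming the errors minknow_api raises are gRPC errors exposing `.code()` (as the `_InactiveRpcError` with `StatusCode.NOT_FOUND` in the traceback above does). Only NOT_FOUND is treated as a broken record; anything else is re-raised so real failures still surface. The function names are hypothetical, and the check is duck-typed so grpc does not need to be imported here.

```python
import logging


def is_not_found(err):
    """True if `err` looks like a gRPC error whose status is NOT_FOUND."""
    code = getattr(err, "code", None)
    # grpc errors return a grpc.StatusCode enum member from .code()
    return callable(code) and getattr(code(), "name", None) == "NOT_FOUND"


def collect_acquisition_info(position_connection, run_ids):
    """Fetch acquisition info per run, skipping runs whose record file is missing.

    Returns (infos, skipped): a dict of run_id -> info and a list of skipped ids.
    """
    infos, skipped = {}, []
    for run_id in run_ids:
        try:
            infos[run_id] = position_connection.acquisition.get_acquisition_info(run_id=run_id)
        except Exception as err:
            if not is_not_found(err):
                raise  # not a missing record: let the real problem through
            logging.warning("skipping acquisition %s: %s", run_id, err)
            skipped.append(run_id)
    return infos, skipped
```

Returning the skipped ids, rather than silently dropping them, keeps a record of which runs have broken persistence files.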

svennd commented 2 years ago

Hey George,

Yes, catching the error is the cleanest solution... I will add that to the script to handle future issues.

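    # assumes `import grpc` and `import logging` at the top of get_runs.py
    try:
        aq_info = connection.acquisition.get_acquisition_info(run_id=run)
    except grpc.RpcError as err:
        # raised e.g. with NOT_FOUND when the acquisition record file is missing
        logging.info('broken persistence file for run %s: %s', run, err)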

I'm not sure whether clear_protocol_history_data works for acquisitions. Also, which file should I delete in /data/persistence to clear the entire history (or at least this particular run)? I can't find a .db file there. (Sorry for the stupid question.)

Thanks !

0x55555555 commented 2 years ago

Hello,

You would need to delete the whole /data/persistence folder to wipe all cached runs and ensure the database is consistent.

clear_protocol_history_data only works with a protocol run id, so you would need to pass in the top-level id.
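A sketch for finding that top-level id from an acquisition run id: walk `list_protocol_runs()` and check each run's `acquisition_run_ids`, as returned by `protocol.get_run_info(run_id=...)`. These method and field names reflect my understanding of the protocol service and should be checked against protocol.proto; note too that `get_run_info` may itself fail for a run whose persistence files are damaged. The helper name is hypothetical.

```python
def find_protocol_run_for_acquisition(position_connection, acquisition_run_id):
    """Return the protocol run id that owns `acquisition_run_id`, or None.

    ASSUMPTION: protocol.get_run_info(run_id=...) returns a message with a
    repeated `acquisition_run_ids` field listing that protocol's acquisitions.
    """
    protocol = position_connection.protocol
    for protocol_run_id in protocol.list_protocol_runs().run_ids:
        info = protocol.get_run_info(run_id=protocol_run_id)
        if acquisition_run_id in info.acquisition_run_ids:
            return protocol_run_id
    return None  # no protocol run claims this acquisition
```

The returned id is the top-level id that clear_protocol_history_data expects.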

svennd commented 2 years ago

Hey George,

Thanks for your help. I don't want to delete the entire directory for now, and I'm not smart enough to work out how to run clear_protocol_history_data, so for now I'll just use the try/except in my code, which skips these faulty runs.

Thank you for your help!