mmecina / CCS

The UVIE Space Central Checkout System (CCS) and Test Specification Tool (TST)
Mozilla Public License 2.0

Missing TimeStamp in poolviewer when Time is Synchronized #5

Closed by pasetti 2 months ago

pasetti commented 4 months ago

I have a tmpool where I see the following behaviour:

Any idea of what might be going wrong? I tried to debug this issue but could not find the function that computes the timestamp shown in the poolviewer. Any tips on how to proceed are welcome ...

mmecina commented 4 months ago

The timestamp shown in the poolviewer Time column is created by the function cuc_time_str in ccs_function_lib.py. Either one of the other header parameters has an invalid value or an exception causes the blank timestamp.

pasetti commented 4 months ago

Either one of the other header parameters has an invalid value or an exception causes the blank timestamp.

There is indeed an error in the definition of the TSYNC_FLAG data structure in the COMETINTERCEPTOR configuration file. This probably explains the behaviour I described in my ticket. However, when I tried to fix this issue, I found that I can no longer load the tmpool with the synchronized time. Here is what I do:

What is happening?

mmecina commented 4 months ago

The warnings would suggest that there are no entries in the SQL storage associated with that filename. Have you tried loading the pool with the Force DB import option enabled in the file dialog?

pasetti commented 4 months ago

Have you tried loading the pool with the Force DB import option enabled in the file dialog?

I tried this but it does not help. I then re-ran "make confignator" and "make databases", but the only result is that now none of my tmpools can be loaded in the poolviewer! In all cases, irrespective of which tmpool I try to load, I get the same three error messages reported above. Here is an example again:

2024-02-15 15:59:23,592: cfl             WARNING  No instance of poolmanager found.
2024-02-15 15:59:37,355: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_BasicBsw
2024-02-15 15:59:37,367: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_BasicBsw
2024-02-15 15:59:37,372: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_BasicBsw
2024-02-15 15:59:53,642: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_PltfEvts
2024-02-15 15:59:53,650: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_PltfEvts
2024-02-15 15:59:53,653: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_PltfEvts

Note that the loading process worked without problems until yesterday ...

pasetti commented 4 months ago

I would like to add two pieces of information to my previous post:

Hence, as far as I can see, I should be in a position where all the 'memory' of the system has been erased and I am back where I was yesterday. And yet none of my tmpools can be imported, and I always get the same error messages reported above ...

mmecina commented 4 months ago

Can you please try the following code on one of the tmpool files?

fn='/path/to/tmpool'

# parse PUS packets from file
with open(fn, 'rb') as buf:
    pkts = cfl.extract_pus(buf)

# report packets that fail the CRC check
for p in pkts:
    if cfl.crc(p):
        print('CRC error:', p)

# create (TM/TCHeader object, payload data, CRC bytes) tuple for each packet
proc = [cfl.unpack_pus(x) for x in pkts]

# calculate timestamp for each packet
for p in proc:
    print(cfl.cuc_time_str(p[0]))

Does it fail somewhere or does the parsed data look wrong?

pasetti commented 4 months ago

I added import ccs_function_lib as cfl to the code proposed in the previous post and then ran it on one of the offending tmpools; the resulting output is attached. There are CRC errors at every HK report but, for the rest, it looks reasonable. Note that the CRC errors are a known issue in the software of our instrument, and that decoding of HK data does not work owing to inconsistencies between the MIB tables and the on-board software. Until yesterday, neither of these problems prevented the tmpool from being imported successfully.

Test.txt

mmecina commented 4 months ago

From the current commit, it looks like the culprit is a debug print in ccs_function_lib.py https://github.com/mmecina/CCS/blob/e8513dc396134efa181f4af93dab7c4a60665204/Ccs/ccs_function_lib.py#L6197

The variable timestamp is not defined in this context, which causes the loading thread to fail.

pasetti commented 4 months ago

Many thanks for finding this!!!! I must have committed the debug statements by mistake while deleting the CoCa-specific files....

By the way, I suppose that the error in the code caused an exception and resulted in the thread being aborted. I do not recall seeing an error message in the console or in the log window. I presume that this is because the failure was in a background thread. Do you know if there is a simple way to have exceptions reported even if they occur in a background thread?

pasetti commented 4 months ago

I have removed the error kindly identified by Marko in his last post of 15 Feb but I am still unable to load any tmpool and I still get the same log messages as above, namely:

2024-02-15 15:59:23,592: cfl             WARNING  No instance of poolmanager found.
2024-02-15 15:59:37,355: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_BasicBsw
2024-02-15 15:59:37,367: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_BasicBsw
2024-02-15 15:59:37,372: Poolviewer      WARNING  Could not find rows for pool /home/ap/Downloads/tmpool_EmFvSgfFeb24_BasicBsw

I probably have a configuration mistake somewhere and I suppose I could solve the problem by re-installing the CCS, but I would prefer to understand the problem because I had a similar issue in the past and because I would like to be prepared in case we have such issues in the future (e.g. if a tmpool is corrupted). What I need to know is: how do I investigate this type of issue? If there is an exception somewhere (maybe in a background thread), how do I see it?

mmecina commented 4 months ago

To have better visibility of unhandled exceptions/errors one can run the modules not as stand-alone processes, but in the CCS Editor terminal. For instance, if you suspect an error in the Poolviewer, start it by executing cfl.start_pv(console=True). This way any exception (even in threads) should be printed to the console.

pasetti commented 4 months ago

I followed up on Marko's tip and started the poolviewer from the command line. This showed that the problem was due to an exception at line 6274 in ccs_function_lib.py:

new_session.execute(DbTelemetry.__table__.insert(), pcktdicts)

For the reasons explained below, I no longer have the error message but I think it was something about the DB insert operation failing due to idx not having a default value.

I then continued investigating the problem by peppering the code with statements like: logger.warning("pcktcount: "+str(pcktcount)). I would have preferred to use the Python debugger but, for some reason, this does not work when the application is started from the CCS console.

During debugging, I made a mistake in one of my logger statements and the program crashed in the middle of the loop just before line 6266 in ccs_function_lib.py. To my astonishment, afterwards, everything worked fine: I was able to import all tmpools normally ...

I cannot quite account for this behaviour but I suspect that the following happened:

My belief that the origin of the problem lies in the database is also supported by the following finding: earlier in the day, I had run exactly the same CCS code on a different machine (i.e. with a different database), with the same MIB tables and the same tmpool, and the problem reported above did not appear.

I know that all the above sounds somewhat far-fetched but I cannot think of any other explanation ... As far as I am concerned, the ticket can be closed but I leave it assigned to Marko so that he may read what I wrote above -- just in case he has some ideas as to what could be done to prevent this happening again in the future. Otherwise, all that is left to do is to thank Marko for his support.

mmecina commented 4 months ago

Just two more remarks:

pasetti commented 4 months ago

for debugging, most modules can also be run by executing the respective Python files directly, poolview_sql.py for the Poolviewer in the Ccs subfolder, for example; then the debugger should also work properly

Thanks. This is very useful information!

concerning database corruption it might be expedient to enclose SQL insertion code in try/except statements to, e.g., rollback transactions in case of failure. I will have a look where this would make sense.

This might be a good idea, indeed. But I need to mention one point: the anomaly I encountered and described in previous posts did not give rise to any SQL exception!
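The try/except idea with rollback can be sketched as follows. This is not the CCS code: it uses the standard-library sqlite3 module and a hypothetical table tm with made-up columns, purely to show the pattern. The NOT NULL idx column without a default is chosen to mimic the insert failure mentioned earlier in this ticket. If new_session in ccs_function_lib.py is a SQLAlchemy session, as the call syntax suggests, the analogous step would be to call its rollback() method in the except branch. As noted above, this only helps when the failure actually raises an exception.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical telemetry table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tm (idx INTEGER NOT NULL, pool_name TEXT NOT NULL, raw BLOB)"
    )
    return conn


def insert_packets(conn, rows):
    """Insert all rows in one transaction; roll back everything on failure."""
    try:
        conn.executemany(
            "INSERT INTO tm (idx, pool_name, raw) VALUES (:idx, :pool_name, :raw)",
            rows,
        )
        conn.commit()
        return True
    except sqlite3.Error as exc:
        # Undo the partially executed batch so no half-imported pool remains.
        conn.rollback()
        print(f"insert failed, transaction rolled back: {exc}")
        return False


# Illustrative data only; the second batch has a row without an idx value.
GOOD = [
    {"idx": 0, "pool_name": "demo", "raw": b"\x00"},
    {"idx": 1, "pool_name": "demo", "raw": b"\x01"},
]
BAD = [
    {"idx": 2, "pool_name": "demo", "raw": b"\x02"},
    {"idx": None, "pool_name": "demo", "raw": b"\x03"},
]

if __name__ == "__main__":
    conn = make_db()
    print(insert_packets(conn, GOOD))
    print(insert_packets(conn, BAD))
    print(conn.execute("SELECT COUNT(*) FROM tm").fetchone()[0])
```

After the failed batch, the table still holds only the two rows from the first batch: the valid row that preceded the faulty one is rolled back together with it.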

By the way, our CoCa supplier has just reported that he ran into the same problem I reported in previous posts: after making an update to the configuration file, he finds that he can no longer import tmpools. I cannot say whether this is the same problem I had but the symptoms look similar ...

pasetti commented 2 months ago

I have added the debugging tips provided in this ticket to the CoCa's ReadMe file. I therefore close the ticket.