jtec / prx

MIT License
8 stars 6 forks source link

Parse Loss of Lock Indicator in RINEX OBS files #97

Closed plutonheaven closed 1 month ago

plutonheaven commented 2 months ago

This PR adds the parsing of the Loss of Lock Indicator (LLI) from the RINEX OBS file. It is added as a new column of the prx output.

Contrary to georinex, the prx parser keeps a LLI value to 0 when no CS is declared in the RINEX file.

By the way, I observed that georinex does not parse LLI for all available signals.

plutonheaven commented 2 months ago

@jtec, could you check if this modification does not slow too much the parsing of RINEX OBS files? (and post the command to test this here, if it's a few lines of code)

jtec commented 2 months ago

@plutonheaven Running https://github.com/jtec/prx/blob/eec1a6f86a72a3391b26eca119b75087008ec568/src/prx/rinex_obs/test/benchmark.py#L81 I get

On this branch

parse_lli

On main

main

So it looks like the parser got about 35% slower.

It looks like

https://github.com/jtec/prx/blob/eec1a6f86a72a3391b26eca119b75087008ec568/src/prx/rinex_obs/parser.py#L78

shares some DataFrame operations with

https://github.com/jtec/prx/blob/eec1a6f86a72a3391b26eca119b75087008ec568/src/prx/rinex_obs/parser.py#L64

so maybe there are some operations we only need to do once?

jtec commented 2 months ago

@plutonheaven We should look for ways to run these benchmarks automatically on each PR :)

jtec commented 2 months ago

This will close https://github.com/jtec/prx/issues/58!

plutonheaven commented 2 months ago

It looks like

https://github.com/jtec/prx/blob/eec1a6f86a72a3391b26eca119b75087008ec568/src/prx/rinex_obs/parser.py#L78

shares some DataFrame operations with

https://github.com/jtec/prx/blob/eec1a6f86a72a3391b26eca119b75087008ec568/src/prx/rinex_obs/parser.py#L64

so maybe there are some operations we only need to do once?

I tried to mutualize the processing by creating a single DataFrame with DataFrame.str.extract and regexp, but it resulted in 10% slower results 🤬

# creates a dataframe with 2 columns named "obs" and "lli"
df_extract = (
    df.records.str.split("|", expand=True).stack().str.extract(
        "(?P<obs>.{" + str(block_length - 2) + "})(?P<lli>.{1})")
)
df_extract.obs = df_extract.obs.str.strip().replace({"": "NaN"}).astype('float')
df_extract.lli = df_extract.lli.str.strip().replace({"": 0}).astype(int)
df_extract = df_extract.unstack()
df_extract["sv"] = df.sv
df_extract["constellation"] = df.sv.str[0]
df_extract["time"] = df.time

I'll leave it as it is for now... Might give it another try later...

plutonheaven commented 1 month ago

I don't have any leads to improve performance. I previously observed that georinex was really slowed down when parsing the LLI too, so that could make sense to lose some computational speed too. I'll merge the current version of this PR.