radiorabe / suisa_sendemeldung

ACRCloud client for SUISA reporting @ RaBe.
https://radiorabe.github.io/suisa_sendemeldung/
MIT License

KeyError #521

Open RollMediaDeveloper opened 1 month ago

RollMediaDeveloper commented 1 month ago

I tested the containerized code with the following command:

    sudo podman run --rm -ti -e BEARER_TOKEN=token -e STREAM_ID=s-xxxxxxxx ghcr.io/radiorabe/suisasendemeldung:latest suisa_sendemeldung --project-id=xxxxx --timezone=Europe/Xxxxxxxx --file --filetype=xlsx

The script loads results from ACRCloud, but fails with the following error:

      File "/usr/local/bin/suisa_sendemeldung", line 8, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/usr/local/lib/python3.11/site-packages/suisa_sendemeldung/suisa_sendemeldung.py", line 865, in main
        data = merge_duplicates(data)
               ^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/suisa_sendemeldung/suisa_sendemeldung.py", line 427, in merge_duplicates
        prev["metadata"]["played_duration"]
    KeyError: 'played_duration'

What causes this error?
hairmare commented 1 month ago

Hi @RollMediaDeveloper

Thanks for taking the time to post your issue here!

We seem to use the "custom stream" solution on ACRCloud and ours looks like this in the dashboard:

[Screenshot: our custom stream in the ACRCloud dashboard]

Are you also using the broadcast monitoring product?

I'll probably also need a JSON dump from ACRCloud to debug this further. Are you comfortable using tools like curl to export some raw data from ACRCloud?

A potential fix might be to ignore the "played_duration" field if it's not available. Let's figure out why you don't have it first and then proceed from there!
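To illustrate that potential fix, here is a minimal Python sketch of a `merge_duplicates` that treats a missing `played_duration` as 0 instead of raising `KeyError`. It is not the project's actual implementation: the entry layout (`metadata` → `music` → `acrid`) and the rule that consecutive entries with the same `acrid` are duplicates are assumptions made for illustration, and `_first_acrid` is a hypothetical helper.

```python
from __future__ import annotations

import copy
from typing import Any


def _first_acrid(metadata: dict[str, Any]) -> str | None:
    """Return the ACRCloud id of the first music match, or None if there is none."""
    music = metadata.get("music") or [{}]
    return music[0].get("acrid")


def merge_duplicates(data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge consecutive entries for the same track (hypothetical sketch).

    Assumption: two neighbouring entries are duplicates when their first music
    match has the same "acrid". Their played_duration values are summed, and a
    missing played_duration counts as 0 instead of raising KeyError.
    """
    merged: list[dict[str, Any]] = []
    for entry in data:
        entry = copy.deepcopy(entry)  # leave the caller's data untouched
        metadata = entry.setdefault("metadata", {})
        acrid = _first_acrid(metadata)
        if merged and acrid is not None:
            prev_meta = merged[-1]["metadata"]
            if acrid == _first_acrid(prev_meta):
                # dict.get() with a default avoids the KeyError from the traceback.
                prev_meta["played_duration"] = prev_meta.get(
                    "played_duration", 0
                ) + metadata.get("played_duration", 0)
                continue
        merged.append(entry)
    return merged
```

For example, an entry for track `a` with `played_duration` 60 followed by a real-time entry for `a` without the field merges into one entry with `played_duration` 60. Defaulting to 0 keeps the report running, but it silently under-reports durations for real-time results, which is why finding the root cause first still matters.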

Cheers, Lucas

RollMediaDeveloper commented 1 month ago

Yes, we are also using broadcast monitoring. Attached are the results for one day, exported with curl. I don't see the 'Played Duration' field there, but it is present when I export the results from the Console: results080724.csv

hairmare commented 1 month ago

Thanks for the dump! I think I can narrow down the issue. Our bucket looks like this:

[Screenshot: our bucket in the ACRCloud dashboard]

It uses a "non-realtime for music" profile that is configured like this:

| Config ID | Name | Length For Record | Interval | Record | Noise Level | Real-Time |
|---|---|---|---|---|---|---|
| 1 | non-realtime for music | 10 | 0 | Non-Record | Antinoise-Low | Non-Realtime |

When I look at your JSON, your bucket seems to use a real-time configuration:

{
  "data": [
    {
      "metadata": {
        "type": "real-time",

| Config ID | Name | Length For Record | Interval | Record | Noise Level | Real-Time |
|---|---|---|---|---|---|---|
| 4 | realtime | 10 | 3 | Non-Record | Antinoise-Low | Realtime |

As far as I understand it, the real-time profile is meant for building features like a "now playing" display on a website, while the "non real-time" profiles are meant for broadcast monitoring and reporting.

The "non real-time" profiles take a while to return data: we usually get results some 10 to 20 minutes after a track has played. For our purposes, that delay was an acceptable trade-off for getting more concise reporting data from ACRCloud.

Can you check whether a "non real-time" configuration makes the played_duration field appear in your JSON dumps?
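One quick way to answer that for a whole dump is to count the result types and the entries that lack `played_duration`. This sketch assumes the layout visible in the JSON excerpt above (`data` → `metadata` → `type`); `summarize_dump` and `summarize_file` are hypothetical helpers, not part of suisa_sendemeldung or of any ACRCloud tooling.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any


def summarize_dump(dump: dict[str, Any]) -> dict[str, Any]:
    """Count result types and entries without played_duration in a JSON dump.

    Assumes the layout {"data": [{"metadata": {"type": ..., ...}}, ...]}.
    """
    types: Counter[str] = Counter()
    missing = 0
    for entry in dump.get("data", []):
        metadata = entry.get("metadata", {})
        types[metadata.get("type", "<unknown>")] += 1
        if "played_duration" not in metadata:
            missing += 1
    return {"types": dict(types), "missing_played_duration": missing}


def summarize_file(path: str) -> dict[str, Any]:
    """Load a JSON dump saved to disk (e.g. with curl) and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_dump(json.load(fh))
```

If the profile switch worked, a new dump should report no "real-time" entries and `missing_played_duration` should be 0.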

RollMediaDeveloper commented 1 month ago

I changed the configuration to 'non-realtime', but I still get the same error from the script. However, the results exported with curl now include the 'played_duration' field: results150724.csv

hairmare commented 4 weeks ago

I've not been able to reproduce the error with results150724.csv.

Maybe it fails because the code also tries to download results140724.csv at the same time. The code does this when the local timezone does not match UTC, because a local day then spans two UTC dates. Can you provide a file for 2024-07-14 to help me verify this?
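To see why a second day's results are needed: local midnight in a timezone with a non-zero offset falls on a different UTC time, so one local day usually straddles two UTC dates. The hypothetical helper below (a sketch using Python's `zoneinfo`, not the code suisa_sendemeldung actually runs) lists the UTC dates a local day overlaps. Europe/Zurich is only an example, since the reporter's timezone is redacted.

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def utc_dates_for_local_day(day: date, tz_name: str) -> list[date]:
    """Return the UTC calendar dates that overlap one local calendar day."""
    tz = ZoneInfo(tz_name)
    # Local midnight at the start of this day and of the next one, converted to UTC.
    start = datetime.combine(day, time.min, tzinfo=tz).astimezone(timezone.utc)
    end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz).astimezone(
        timezone.utc
    )
    # The end bound is exclusive, so step back one microsecond to stay inside the day.
    last = (end - timedelta(microseconds=1)).date()
    dates: list[date] = []
    current = start.date()
    while current <= last:
        dates.append(current)
        current += timedelta(days=1)
    return dates
```

For 2024-07-15 in Europe/Zurich (UTC+2 in summer), local midnight is 22:00 UTC on 2024-07-14, so the function returns both 2024-07-14 and 2024-07-15; with "UTC" it returns only 2024-07-15.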

Also, do you still get the error if you try exporting a later date?

Edit: I generated an XLSX file based on results150724.csv: rollfm_2024-07-16.xlsx.