kernelci / kernelci-pipeline

Modular pipeline based on the KernelCI API
GNU Lesser General Public License v2.1
6 stars 19 forks source link

lava_callback.py: Sanitize lava log data #621

Closed nuclearcat closed 4 months ago

nuclearcat commented 4 months ago

As we use this data in reports, lets remove all non-printable characters as they confuse grafana, browsers and others.

JenySadadia commented 4 months ago

I thought we wanted to truncate NULL chars. This is replacing them with ? instead.

'log_excerpt': "+ lava-target-ip\n+ ./ssh_retry.sh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/cros/.ssh/id_rsa root@192.168.201.19 poweroff\nWarning: Permanently added '192.168.201.19' (ED25519) to the list of known hosts.\n+ sleep 30\n+ ./tast_parser.py --results\n### BEGIN STDERR DUMP ###\n### END STDERR DUMP ###\n<LAVA_SIGNAL_TESTRAISE Tast tests run failed>\nCBFS: Found @ offset f40c0 size 47ab\n????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????"
nuclearcat commented 4 months ago

Unfortunately there is way more than just null characters. For example take a look at: https://staging.kernelci.org:9000/viewer?node_id=6656ec0f8b971c830cc01c07 (This one not filtered yet)

JenySadadia commented 4 months ago

Unfortunately there is way more than just null characters. For example take a look at: https://staging.kernelci.org:9000/viewer?node_id=6656ec0f8b971c830cc01c07 (This one not filtered yet)

Yes, but do you think replacing them with ? would solve the issue? Although this fix is not related to just KCIDB, from the KCIDB integration PoV, I'll be still getting validation error.

nuclearcat commented 4 months ago

I guess kcidb want to see only printable characters, isn't it? What exactly is issue of kcidb validation is not that?

JenySadadia commented 4 months ago

As per Helen's requirement, I need to send last 10 lines of log file. One issue is you are replacing new line char so it tries to accommodate everything in the log field now. Also, with so many \x00 chars, KCIDB was throwing max length exceed error for the field. The issue persisted with ? char now.

JenySadadia commented 4 months ago

One issue is you are replacing new line char so it tries to accommodate everything in the log field now.

Check the log file https://staging.kernelci.org:9000/viewer?node_id=6657cdda2b005d102de7e3a1. Now, it just has one line with all the logs.

nuclearcat commented 4 months ago

I don't think i can truncate also length of lines, it might truncate some useful data, if some tests will generate very long lines for example. Then we should do additional filtering in kcidb bridge according to grafana requirements.

JenySadadia commented 4 months ago

Can we just drop all the \x00 chars from the log file if it won't cause any issues?

nuclearcat commented 4 months ago

As i see log is nice now on staging?

nuclearcat commented 4 months ago

Thanks, merging