Open bwbroersma opened 1 year ago
I'll find out which new content labels we need.
https://github.com/DigitalTrustCenter/sectxt/issues/65 is a blocker for this
The content still needs to be checked: every label in the https://github.com/DigitalTrustCenter/sectxt/ README needs to be in our content too.
Crappy one-liner check (formatted on 3 lines for readability :sweat_smile:):
$ diff \
<(grep -oP '"\K[a-z0-9]+_[a-z0-9_]+(?=")' sectxt/sectxt/__init__.py | sort -u) \
<(ls internet.nl_content/detail/tech/data/http-securitytxt/ | sed 's/_..\.md$//g' | sort -u)
1d0
< bom_in_file
5,6c4
< field_name
< invalid_cert
---
> expired
12c10
< invalid_uri_scheme
---
> location
26c24,25
< no_security_txt
---
> no_security_txt_404
> no_security_txt_other
31d29
< pgp_envelope
33a32,33
> requested-from
> retrieved-from
35a36
> utf8
At the very least, these are currently missing:
bom_in_file
invalid_cert
invalid_uri_scheme
pgp_envelope
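For anyone who prefers something more readable than the one-liner, here is a minimal Python sketch of the same comparison. The function names (`labels_in_source`, `labels_in_content`, `compare_labels`) are made up for illustration; the regular expressions mirror the `grep` and `sed` expressions above, and the two inputs are assumed to be the text of `sectxt/sectxt/__init__.py` and the file names in `internet.nl_content/detail/tech/data/http-securitytxt/`.

```python
from __future__ import annotations

import re

# Mirrors the grep pattern: a quoted lowercase identifier containing an underscore.
LABEL_RE = re.compile(r'"([a-z0-9]+_[a-z0-9_]+)(?=")')
# Mirrors the sed expression: strip a trailing "_xx.md" language suffix.
SUFFIX_RE = re.compile(r"_..\.md$")


def labels_in_source(source_text: str) -> set[str]:
    """Collect label-like strings from the sectxt source text."""
    return set(LABEL_RE.findall(source_text))


def labels_in_content(filenames: list[str]) -> set[str]:
    """Reduce content file names to their label, dropping the language suffix."""
    return {SUFFIX_RE.sub("", name) for name in filenames}


def compare_labels(source_text: str, filenames: list[str]) -> tuple[list[str], list[str]]:
    """Return (labels only in sectxt, labels only in the content directory)."""
    source = labels_in_source(source_text)
    content = labels_in_content(filenames)
    return sorted(source - content), sorted(content - source)
```

Like the one-liner, this is only a rough check: it picks up any quoted string with an underscore, so the result still needs a manual look.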
On manual inspection of sectxt, however, I see that `invalid_uri_scheme` and `bom_in_file` are in the `SecurityTXT` class, not in the `Parser` class that internet.nl uses. I don't see why `bom_in_file` is not checked in the `Parser` class.
Created issue upstream:
Upstream solved it in the 0.9.3 release.
Although this is in milestone v1.9, it is already included and deployed in the 'batch' release v1.8.7.
DigitalTrustCenter/sectxt released 0.9.0, which has quite a few parser improvements, especially on PGP.
The only one I'm not sure about is the stripping of the BOM (https://github.com/DigitalTrustCenter/sectxt/issues/57#issuecomment-1663592300). I interpret the RFC 9116 - File Format Description and ABNF Grammar:
RFC 5198 states:
Especially in combination with signing, maybe a :warning: warning or :information_source: notice should be shown. Although it's outside of the PGP block, a file with a BOM is no longer recognized by `file` on Linux as a PGP signed file.
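To illustrate the concern, here is a small Python sketch. It is not how sectxt or `file` implement their checks; it only assumes a detector that looks at the very first bytes of the content. The helper names are hypothetical; `codecs.BOM_UTF8` is the standard library constant for the UTF-8 BOM bytes.

```python
import codecs

# Armor header line that opens a cleartext-signed OpenPGP message.
PGP_SIGNED_HEADER = b"-----BEGIN PGP SIGNED MESSAGE-----"


def has_utf8_bom(data: bytes) -> bool:
    """True if the content starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the content without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def looks_pgp_signed(data: bytes) -> bool:
    """Naive check that only inspects the leading bytes of the content."""
    return data.startswith(PGP_SIGNED_HEADER)


sample = codecs.BOM_UTF8 + PGP_SIGNED_HEADER + b"\nHash: SHA256\n"
print(has_utf8_bom(sample))                      # BOM present
print(looks_pgp_signed(sample))                  # header is no longer at offset 0
print(looks_pgp_signed(strip_utf8_bom(sample)))  # recognized again after stripping
```

Silently stripping the BOM makes the last check pass, which is why a warning or notice may be preferable: the bytes the user published still start with the BOM.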