pogzyb / asyncwhois

Python WHOIS and RDAP utility for querying and parsing information about Domains, IPv4s, IPv6s, and AS numbers
MIT License

Uncontrolled datetime format regex for FR #103

Closed: antalgu closed this issue 1 week ago

antalgu commented 1 week ago

The WHOIS response for klein-sujka.fr with find_authoritative_server=False:

"%%\n%% This is the AFNIC Whois server.\n%%\n%% complete date format: YYYY-MM-DDThh:mm:ssZ\n%%\n%% Rights restricted by copyright.\n%% See https://www.afnic.fr/en/domain-names-and-support/everything-there-is-to-know-about-domain-names/find-a-domain-name-or-a-holder-using-whois/\n%%\n%%\n\ndomain:                        klein-sujka.fr\r\nstatus:                        ACTIVE\r\neppstatus:                     active\r\nhold:                          NO\r\nholder-c:                      ANO00-FRNIC\r\nadmin-c:                       ANO00-FRNIC\r\ntech-c:                        UIS153-FRNIC\r\nregistrar:                     IONOS SE\r\nExpiry Date:                   2025-03-06T13:41:26Z\r\ncreated:                       2015-03-06T13:41:26Z\r\nlast-update:                   2024-04-30T22:04:20.78332Z\r\nsource:                        FRNIC\r\n\r\nnserver:                       ns1080.ui-dns.biz\r\nnserver:                       ns1080.ui-dns.com\r\nnserver:                       ns1080.ui-dns.de\r\nnserver:                       ns1080.ui-dns.org\r\nsource:                        FRNIC\r\n\r\nregistrar:                     IONOS SE\r\naddress:                       Ernst-Frey Strasse 9\r\naddress:                       76135 KARLSRUHE\r\ncountry:                       DE\r\nphone:                         +49.7219137450\r\nfax-no:                        +49.72191374215\r\ne-mail:                        hostmaster@1und1.de\r\nwebsite:                       https://ionos.com\r\nanonymous:                     No\r\nregistered:                    2001-01-15T00:00:00Z\r\nsource:                        FRNIC\r\n\r\nnic-hdl:                       UIS153-FRNIC\r\ntype:                          ORGANIZATION\r\ncontact:                       1&1 Internet SARL\r\naddress:                       1&1 Internet SARL\r\naddress:                       7, place de la Gare\r\naddress:                       57200 Sarreguemines\r\ncountry:                       FR\r\nphone:                         +33.970808911\r\nfax-no:                        +33.387959974\r\ne-mail:                        hostmaster@1and1.fr\r\nregistrar:                     IONOS SE\r\nchanged:                       2024-11-07T10:11:31.916078Z\r\nanonymous:                     NO\r\nobsoleted:                     NO\r\neppstatus:                     associated\r\neppstatus:                     active\r\neligstatus:                    not identified\r\nreachstatus:                   not identified\r\nsource:                        FRNIC\r\n\r\nnic-hdl:                       ANO00-FRNIC\r\ntype:                          PERSON\r\ncontact:                       Ano Nymous\r\nregistrar:                     IONOS SE\r\nanonymous:                     YES\r\nremarks:                       -------------- WARNING --------------\r\nremarks:                       While the registrar knows him/her,\r\nremarks:                       this person chose to restrict access\r\nremarks:                       to his/her personal data. So PLEASE,\r\nremarks:                       don't send emails to Ano Nymous. This\r\nremarks:                       address is bogus and there is no hope\r\nremarks:                       of a reply.\r\nremarks:                       -------------- WARNING --------------\r\nobsoleted:                     NO\r\neppstatus:                     associated\r\neppstatus:                     active\r\neligstatus:                    not identified\r\nreachstatus:                   not identified\r\nsource:                        FRNIC\r\n\r\nnic-hdl:                       ANO00-FRNIC\r\ntype:                          PERSON\r\ncontact:                       Ano Nymous\r\nregistrar:                     IONOS SE\r\nanonymous:                     YES\r\nremarks:                       -------------- WARNING --------------\r\nremarks:                       While the registrar knows him/her,\r\nremarks:                       this person chose to restrict access\r\nremarks:                       to his/her personal data. So PLEASE,\r\nremarks:                       don't send emails to Ano Nymous. This\r\nremarks:                       address is bogus and there is no hope\r\nremarks:                       of a reply.\r\nremarks:                       -------------- WARNING --------------\r\nobsoleted:                     NO\r\neppstatus:                     associated\r\neppstatus:                     active\r\neligstatus:                    ok\r\neligdate:                      2016-12-13T00:00:00Z\r\nreachstatus:                   not identified\r\nsource:                        FRNIC\r\n\n>>> Last update of WHOIS database: 2024-11-07T10:33:57.10162Z <<<\n\r\n"

For this response, the parsed dates are empty.

This happens because the current regex, "created: (\d{4}-\d{2}-\d{2})", expects exactly one space between "created:" and the date, while the AFNIC response pads the value with many spaces. Replacing the single space with \s+ matches one or more whitespace characters and solves the problem, so the regex would become:

"created:\s+(\d{4}-\d{2}-\d{2})" (and the same for the other dates)
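The difference between the two patterns can be checked directly with Python's re module. This is a minimal sketch that uses an excerpt of the AFNIC response above. FR_SAMPLE, OLD_PATTERN and NEW_PATTERN are illustrative names, not asyncwhois internals:

```python
import re

# Excerpt of the AFNIC response quoted above: keys are padded with spaces
# to a fixed column, and each line ends in "\r\n".
FR_SAMPLE = (
    "registrar:                     IONOS SE\r\n"
    "Expiry Date:                   2025-03-06T13:41:26Z\r\n"
    "created:                       2015-03-06T13:41:26Z\r\n"
    "last-update:                   2024-04-30T22:04:20.78332Z\r\n"
)

# Current pattern: exactly one space after "created:".
OLD_PATTERN = r"created: (\d{4}-\d{2}-\d{2})"
# Proposed pattern: one or more whitespace characters after "created:".
NEW_PATTERN = r"created:\s+(\d{4}-\d{2}-\d{2})"

old_match = re.search(OLD_PATTERN, FR_SAMPLE)
new_match = re.search(NEW_PATTERN, FR_SAMPLE)

print(old_match)           # None: the padding is more than one space
print(new_match.group(1))  # 2015-03-06 (date only, the time is dropped)
```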

However, this would still capture only the date and ignore the time. I don't know whether the regex was written this way to work around earlier problems converting French dates to datetime, but if so, it could perhaps be updated to:

"created: *(.+)"

I've tested 6 other .fr domains, and they all return their dates in the format 2015-03-06T13:41:26Z, which is then correctly converted to a datetime. The same change can be made for last-update and Expiry Date.
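The sketch below shows the "created: *(.+)" approach, applied to created, last-update and Expiry Date. It assumes the captured value is converted with the standard library's datetime.strptime, and asyncwhois may convert dates differently. The names FR_PATTERNS, parse_fr_timestamp and extract_fr_dates are hypothetical. Each line of the raw AFNIC response ends in \r\n, and Python's . matches \r, so the capture includes a trailing \r that has to be stripped before parsing:

```python
import re
from datetime import datetime

FR_SAMPLE = (
    "Expiry Date:                   2025-03-06T13:41:26Z\r\n"
    "created:                       2015-03-06T13:41:26Z\r\n"
    "last-update:                   2024-04-30T22:04:20.78332Z\r\n"
)

# Capture everything after the key and any spaces. "." also matches "\r",
# so the "\r" of the "\r\n" line ending ends up in the capture.
FR_PATTERNS = {
    "created": r"created: *(.+)",
    "updated": r"last-update: *(.+)",
    "expires": r"Expiry Date: *(.+)",
}


def parse_fr_timestamp(value: str) -> datetime:
    """Parse e.g. 2015-03-06T13:41:26Z, with or without fractional seconds."""
    value = value.strip()  # drop the trailing "\r"
    for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S.%f%z"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")


def extract_fr_dates(text: str) -> dict:
    """Return the timezone-aware datetimes found for each known key."""
    dates = {}
    for name, pattern in FR_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            dates[name] = parse_fr_timestamp(match.group(1))
    return dates


dates = extract_fr_dates(FR_SAMPLE)
print(dates["created"])  # 2015-03-06 13:41:26+00:00
```

strptime accepts a literal Z for %z from Python 3.7 onward. The second format covers values like last-update that carry fractional seconds.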

On the same topic, I've noticed that the KR regex also captures only the date and omits the time. The same change might apply there, but I don't have time to test it.

pogzyb commented 1 week ago

I made the changes for FR, and it now seems to capture the H:M:S information. I couldn't reproduce the same for KR, but made the change anyway; it looks like KR responses (with find_authoritative_server=False) include only the date. For example, here's the output for google.kr:

query : google.kr

# KOREAN(UTF8)

도메인이름                  : google.kr
등록인                      : 구글코리아유한회사
등록인 주소                 : 서울시 강남구 역삼동 737 강남파이낸스센터 22층
등록인 우편번호             : 135984
책임자                      : Domain Administrator
책임자 전자우편             : dns-admin@google.com
책임자 전화번호             : 82.25319000
등록일                      : 2007. 03. 02.
최근 정보 변경일            : 2010. 10. 04.
사용 종료일                 : 2025. 03. 02.
정보공개여부                : Y
등록대행자                  : (주)후이즈(http://whois.co.kr)
DNSSEC                      : 미서명

1차 네임서버 정보
   호스트이름               : ns1.google.com

2차 네임서버 정보
   호스트이름               : ns2.google.com

네임서버 이름이 .kr이 아닌 경우는 IP주소가 보이지 않습니다.

# ENGLISH

Domain Name                 : google.kr
Registrant                  : Google Korea, LLC
Registrant Address          : 22nd Floor Gangnam Finance Center, 737 Yeoksam-dong Kangnam-ku Seoul
Registrant Zip Code         : 135984
Administrative Contact(AC)  : Domain Administrator
AC E-Mail                   : dns-admin@google.com
AC Phone Number             : 82.25319000
Registered Date             : 2007. 03. 02.
Last Updated Date           : 2010. 10. 04.
Expiration Date             : 2025. 03. 02.
Publishes                   : Y
Authorized Agency           : Whois Corp.(http://whois.co.kr)
DNSSEC                      : unsigned

Primary Name Server
   Host Name                : ns1.google.com

Secondary Name Server
   Host Name                : ns2.google.com

- KISA/KRNIC WHOIS Service -
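The same idea can be sketched for KR under the same assumptions. The patterns below are hypothetical and keyed on the English labels of the google.kr response above. They tolerate the padding before the colon and parse a date only, because the response carries no time component. asyncwhois's real KR patterns may be keyed differently, and KR_PATTERNS and extract_kr_dates are illustrative names:

```python
import re
from datetime import datetime

# Excerpt of the English half of the KISA/KRNIC response quoted above.
KR_SAMPLE = """\
Registered Date             : 2007. 03. 02.
Last Updated Date           : 2010. 10. 04.
Expiration Date             : 2025. 03. 02.
"""

# Allow any run of whitespace around the colon and capture the rest of the line.
KR_PATTERNS = {
    "created": r"Registered Date\s*:\s*(.+)",
    "updated": r"Last Updated Date\s*:\s*(.+)",
    "expires": r"Expiration Date\s*:\s*(.+)",
}


def extract_kr_dates(text: str) -> dict:
    """Return a date (no time) for each known key found in the response."""
    dates = {}
    for name, pattern in KR_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            # Values look like "2007. 03. 02.", so only a date can be recovered.
            parsed = datetime.strptime(match.group(1).strip(), "%Y. %m. %d.")
            dates[name] = parsed.date()
    return dates


print(extract_kr_dates(KR_SAMPLE))
```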
antalgu commented 1 week ago

Nice, thanks for the fast response!