webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

fields param replaces fl param on CDX Server API; documentation needs review #542

Open igobranco opened 4 years ago

igobranco commented 4 years ago

Describe the bug

The fl param has no effect on the CDXJ API: every entry is returned with all of its fields.

Steps to reproduce the bug

With version 2.4.0-rc5, query the CDXJ API with the fl param set. The output is not filtered.

Expected behavior

Currently, a request to https://m.preprod.arquivo.pt/wayback/cdx?output=json&url=fccn.pt&fl=timestamp&limit=3 returns every field:

{"urlkey": "pt,fccn)/", "timestamp": "19961013145650", "status": "200", "url": "http://www.fccn.pt/", "filename": "AWP-Roteiro-20090510220155-00000.arc.gz", "length": "0", "mime": "text/html", "offset": "45198", "digest": "OWMAVER7CCNJWL2E5ZURDDKGCHWS7JJO", "source": "$root:gigantic_index_1_v2.cdxj", "source-coll": "$root"}
{"urlkey": "pt,fccn)/", "timestamp": "19971210202137", "status": "200", "url": "http://www.fccn.pt/", "filename": "PT-HISTORICAL-1997-GROUP-ABP-20100830000000-00000.arc.gz", "length": "0", "mime": "text/html", "offset": "11878742", "digest": "ZDBF3G73EW3UK6GIWTLDCIDKCAPCBFJ2", "source": "$root:gigantic_index_1_v2.cdxj", "source-coll": "$root"}
{"urlkey": "pt,fccn)/", "timestamp": "19971210202137", "url": "http://www.fccn.pt:80/", "mime": "text/html", "status": "200", "digest": "ZDBF3G73EW3UK6GIWTLDCIDKCAPCBFJ2", "length": "1084", "offset": "11878742", "filename": "PT-HISTORICAL-1997-GROUP-ABP-20100830000000-00000.arc.gz", "source": "$root:IA.cdxj", "source-coll": "$root"}

The expected result is:

{"timestamp": "19961013145650"}
{"timestamp": "19971210202137"}
{"timestamp": "19971210202137"}

igobranco commented 4 years ago

After reviewing the code, I found that the filtering works with the 'fields' parameter instead of 'fl'.

https://github.com/webrecorder/pywb/blob/92e459bda52a2b03f33a4b0b8094ed424248d2a5/pywb/warcserver/index/query.py#L87

https://github.com/webrecorder/pywb/blob/92e459bda52a2b03f33a4b0b8094ed424248d2a5/pywb/warcserver/index/cdxops.py#L46

Nevertheless, the documentation needs a review: cdxserver_api
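
For reference, a minimal sketch of building the same query with 'fields' in place of 'fl'. The endpoint and the other parameters are taken from the request above. build_cdx_url is a hypothetical helper, and joining several field names with commas is an assumption about the expected format.

```python
from urllib.parse import urlencode


def build_cdx_url(base, url, fields=None, limit=None, output="json"):
    """Build a CDX query URL that uses 'fields' rather than 'fl'.

    Hypothetical helper for illustration; not part of pywb.
    """
    params = [("output", output), ("url", url)]
    if fields:
        # Assumption: multiple field names are passed as a comma-separated list.
        params.append(("fields", ",".join(fields)))
    if limit is not None:
        params.append(("limit", str(limit)))
    # safe="," keeps the comma separator readable instead of encoding it as %2C.
    return base + "?" + urlencode(params, safe=",")


query = build_cdx_url(
    "https://m.preprod.arquivo.pt/wayback/cdx",
    "fccn.pt",
    fields=["timestamp"],
    limit=3,
)
print(query)
```

The resulting URL differs from the one in the original report only in the name of the parameter.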