hapi-server / verifier-nodejs

HAPI Server Verifier using node.js and JSON Schema
MIT License

Consider regular expression for --dataset DATASETID #66

Open jbfaden opened 4 months ago

jbfaden commented 4 months ago

It would be useful for me to be able to run suites of tests on parts of a server, maybe by specifying a regular expression for the --dataset argument. For example, the new CDAWeb HAPI server has many Barrel mission CDF files that are all very similar, and it would be nice to exclude them from the test (with ^(?!BAR).+, for example).

berniegsfc commented 4 months ago

I agree this is useful and the cdasws supports regex for many "dataset search" query parameters. But implementing it is difficult if you defend against ReDoS. I had to protect the cdasws from ReDoS years ago. It never caused a problem until after a recent upgrade when many (simple) requests involving client-supplied regex values were rejected as being "too complex". I never understood what happened but it hasn't happened again (even though the same regex request is sent 100s/1000s of times/day). Sometimes I wish I never implemented regex support nearly 20 years ago.

berniegsfc commented 4 months ago

I didn't look closely at the project for this issue. I thought you were suggesting adding regex support to the hapi specification. ReDoS is less of a concern for the verifier. So, ignore my previous comment.

jbfaden commented 4 months ago

That's good to know, Bernie. I was thinking of it in terms of running it locally, but I can see that with the server mode this would be a vulnerability. So maybe a constrained set of regular expressions, or maybe wildcards (--exclude=BAR*)?
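
A wildcard option along those lines could be sketched roughly as follows. This is only an illustration: --exclude is the proposal above and not an existing flag, `wildcardToRegExp` and `excludeDatasets` are hypothetical helpers that are not part of the verifier, and the ids are sample values.

```javascript
// Hypothetical helper: convert a simple wildcard pattern such as 'BAR*'
// into an anchored RegExp. Only '*' is treated as a wildcard; every other
// character is escaped and matched literally, which limits what
// client-supplied input can express.
function wildcardToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + escaped + '$');
}

// Hypothetical --exclude handling: drop the ids that match the wildcard.
function excludeDatasets(datasetIds, pattern) {
  const re = wildcardToRegExp(pattern);
  return datasetIds.filter((id) => !re.test(id));
}

// Sample ids for illustration only.
const wildcardSampleIds = ['BAR_1A_L2_EPHM', 'DE1_1MIN_RIMS', 'DSCOVR_H0_MAG'];
console.log(excludeDatasets(wildcardSampleIds, 'BAR*'));
// prints [ 'DE1_1MIN_RIMS', 'DSCOVR_H0_MAG' ]
```

Escaping everything except '*' keeps the accepted syntax small, which is the point of a constrained alternative to full regular expressions.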

(The validator has spent over an hour and it is still going through BAR*... I hope this isn't causing any problems for you, Bernie.)

jbfaden commented 4 months ago

I see the --help option shows that "^regex" is supported, so it's really more of a documentation issue. I'd like to try:

node ~/temp/verifier-nodejs/verify.js --url http://localhost:8080/HapiServer/hapi --dataset='^BAR.*'

but this doesn't seem to work. Also, how would I use a literal caret in the regex? Is it just:

node ~/temp/verifier-nodejs/verify.js --url http://localhost:8080/HapiServer/hapi --dataset='^^BAR.*'

rweigel commented 4 months ago

Not sure what the issue is. These work:

node verify.js --url 'http://hapi-server.org/servers/TestData2.0/hapi' --dataset='^dataset[1-2]'
node verify.js --url 'http://hapi-server.org/servers/TestData2.0/hapi' --dataset='^dataset.*'

To debug, put console.log(datasets); process.exit() in place of line 1420.

I'll add some examples to the docs in the future.

Regarding escaping, the code is very simple: https://github.com/hapi-server/verifier-nodejs/blob/master/tests.js#L1399. So I'd experiment in Chrome Debugger, for example, re = new RegExp('^a\\^bc'); re.test('a^bc') to figure out the regular expression to pass on the command line.
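
As a sketch of that experiment, the snippet below can be pasted into Node or a browser console. It assumes the --dataset value reaches `new RegExp(value)` unchanged, which I have not checked against the linked line; the ids `BAR_X` and `DE1_X` are made up for the test.

```javascript
// Assumption: the --dataset value is handed to `new RegExp(value)` as-is.
// Inside a single-quoted shell argument, a backslash is passed through
// literally, so --dataset='^a\^bc' arrives as the 6-character string ^a\^bc.
const fromCommandLine = String.raw`^a\^bc`;

// In JavaScript source, the same string needs the backslash doubled.
const fromSource = '^a\\^bc';
console.log(fromCommandLine === fromSource); // true

const literalCaret = new RegExp(fromCommandLine);
console.log(literalCaret.test('a^bc')); // true: \^ matches a literal '^'
console.log(literalCaret.test('abc'));  // false

// A doubled, unescaped caret is just two start-of-string anchors, so
// '^^BAR.*' behaves like '^BAR.*' and does not negate the match.
const doubledCaret = new RegExp('^^BAR.*');
console.log(doubledCaret.test('BAR_X')); // true
console.log(doubledCaret.test('DE1_X')); // false
```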

I'd write a script for more complex use cases that generates a sequence of node verify.js commands.
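
A minimal sketch of such a generator script is below. The function name `buildCommands`, the hard-coded id list, and the filter are all illustrative; in practice the ids would probably be read from the server's catalog response.

```javascript
// Sketch: build one `node verify.js` command per dataset id that passes a
// custom filter. The id list is hard-coded here for illustration.
const serverUrl = 'http://hapi-server.org/servers/TestData2.0/hapi';
const candidateIds = ['dataset1', 'dataset2', 'dataset3'];

function buildCommands(url, datasetIds, keep) {
  return datasetIds
    .filter(keep)
    // Ids are wrapped in single quotes for the shell; an id that itself
    // contains a single quote would need extra escaping.
    .map((id) => `node verify.js --url '${url}' --dataset='${id}'`);
}

// Example filter: skip one dataset and keep the rest.
const commands = buildCommands(serverUrl, candidateIds, (id) => id !== 'dataset2');
console.log(commands.join('\n'));
```

The printed lines can be redirected to a file and run with a shell, which keeps any filtering logic in ordinary JavaScript instead of in a single regular expression.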

jbfaden commented 4 months ago

Should this mean test any dataset that starts with D:

node ~/temp/verifier-nodejs/verify.js --url https://cdaweb.gsfc.nasa.gov/hapi --dataset='^D.*' 

I was expecting it to do the D's. Is this not right?

rweigel commented 4 months ago

I would think so. Are you using the latest version? Try git pull; npm install.

But given that you saw the regex option in the help, it seems you are using a version with the regex feature. If I put `console.log(datasets); process.exit()` in place of line 1420 and run your command, I see

[
  { id: 'DE1_1MIN_RIMS' },
  { id: 'DE1_6SEC_MAGAGMS' },
  { id: 'DE1_PWI_LFC-SPECTRA' },
  { id: 'DE1_PWI_OR-AT' },
  { id: 'DE1_PWI_SFC-SPECTRA' },
  { id: 'DE2_62MS_VEFIMAGB@0' },
  { id: 'DE2_62MS_VEFIMAGB@1' },
  { id: 'DE2_AC500MS_VEFI' },
  { id: 'DE2_DCA500MS_VEFI' },
  { id: 'DE2_DUCT16MS_RPA@0' },
  { id: 'DE2_DUCT16MS_RPA@1' },
  { id: 'DE2_DUCT16MS_RPA@2' },
  { id: 'DE2_DUCT16MS_RPA@3' },
  { id: 'DE2_DUCT16MS_RPA@4' },
  { id: 'DE2_DUCT16MS_RPA@5' },
  { id: 'DE2_DUCT16MS_RPA@6' },
  { id: 'DE2_DUCT16MS_RPA@7' },
  { id: 'DE2_ION2S_RPA' },
  { id: 'DE2_NEUTRAL1S_NACS' },
  { id: 'DE2_NEUTRAL8S_FPI' },
  { id: 'DE2_PLASMA500MS_LANG' },
  { id: 'DE2_UA16S_ALL' },
  { id: 'DE2_VION250MS_IDM@0' },
  { id: 'DE2_VION250MS_IDM@1' },
  { id: 'DE2_VION250MS_IDM@2' },
  { id: 'DE2_WIND2S_WATS' },
  { id: 'DE_UV_SAI' },
  { id: 'DE_VS_EICS' },
  { id: 'DMSP-F13_SSJ_PRECIPITATING-ELECTRONS-IONS' },
  { id: 'DMSP-F16_SSIES-3_THERMAL-PLASMA' },
  { id: 'DMSP-F16_SSJ_PRECIPITATING-ELECTRONS-IONS' },
  { id: 'DMSP-F16_SSM_MAGNETOMETER' },
  { id: 'DMSP-F17_SSIES-3_THERMAL-PLASMA' },
  { id: 'DMSP-F17_SSJ_PRECIPITATING-ELECTRONS-IONS' },
  { id: 'DMSP-F17_SSM_MAGNETOMETER' },
  { id: 'DMSP-F18_SSIES-3_THERMAL-PLASMA' },
  { id: 'DMSP-F18_SSJ_PRECIPITATING-ELECTRONS-IONS' },
  { id: 'DMSP-F18_SSM_MAGNETOMETER' },
  { id: 'DN_K0_GBAY' },
  { id: 'DN_K0_HANK' },
  { id: 'DN_K0_ICEW' },
  { id: 'DN_K0_KAPU' },
  { id: 'DN_K0_PACE' },
  { id: 'DN_K0_PYKK' },
  { id: 'DN_K0_SASK' },
  { id: 'DSCOVR_AT_DEF' },
  { id: 'DSCOVR_AT_PRE' },
  { id: 'DSCOVR_H0_MAG' },
  { id: 'DSCOVR_H1_FC' },
  { id: 'DSCOVR_ORBIT_PRE' },
  { id: 'DYNAMO-2_DESA_NX02A-ESA-FLUX' }
]

jbfaden commented 4 months ago

I should have started by pulling the latest code. This is working for me now. Do you know if I can do --dataset='^^BAR.*' to exclude all the BAR ones? (It doesn't seem to work for me.)

rweigel commented 4 months ago

Probably. I'd try the hints at https://stackoverflow.com/questions/1538512/how-can-i-invert-a-regular-expression-in-javascript and test in https://regex101.com/ (make sure to select JavaScript on the left). Perhaps ^(?!BAR)(.*).
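
A quick local check of that pattern, as a sketch. It assumes the verifier keeps the ids for which the regular expression's `test` returns true; the ids are sample values.

```javascript
// Negative lookahead: match any id that does NOT start with 'BAR'.
// The pattern still begins with '^', as the "^regex" form in --help requires.
const excludeBar = new RegExp('^(?!BAR)(.*)');

// Sample ids for illustration only.
const lookaheadSampleIds = ['BAR_1A_L2_EPHM', 'DE1_1MIN_RIMS', 'DSCOVR_H0_MAG'];

// Assumption: ids are kept when the regular expression matches.
const kept = lookaheadSampleIds.filter((id) => excludeBar.test(id));
console.log(kept); // prints [ 'DE1_1MIN_RIMS', 'DSCOVR_H0_MAG' ]
```

On the command line this would be passed in single quotes, e.g. --dataset='^(?!BAR)(.*)', so the shell does not interpret the parentheses or the exclamation mark.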