webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0

Misleading error message #598

Closed rgaudin closed 5 months ago

rgaudin commented 5 months ago

This is for browsertrix-crawler 1.1.3

When specifying an incorrect (how?) --include param, the crawl refuses to start and complains about an Invalid seed:

❯ docker run -it ghcr.io/openzim/zimit:1.6.3 crawl --allowHashUrls --userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5783.199 Safari/537.36 Edg/110.0.1641.55" --useSitemap="https://jguitar.com/sitemap.xml" --url="https://jguitar.com/" --failOnFailedSeed

This works; adding the following huge --include param does not:

--include="jguitar\.com\/(chordsearch\?chordsearch=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)?(m|dim|%2B|sus2|sus4|7|m7|M7|mM7|dim7|%2B7|%2BM7|6|m6|6add9)?&labels=(finger|letter|tone)|chord?root=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)&chord=(Major|Minor|Diminished|Augmented|Suspended+2nd|Suspended+4th|Major+Flat+5th|Minor+Sharp+5th|Minor+Double+Flat+5th|Suspended+4th+Sharp+5th|Suspended+2nd+Flat+5th|Suspended+2nd+Sharp+5th|7th|Minor+7th|Major+7th|Minor+Major+7th|Diminished+7th|Augmented+7th|Augmented+Major+7th|7th+Flat+5th|Major+7th+Flat+5th|Minor+7th+Flat+5th|Minor+Major+7th+Flat+5th|Minor+Major+7th+Double+Flat+5th|Minor+7th+Sharp+5th|Minor+Major+7th+Sharp+5th|7th+Flat+9th|6th|Minor+6th|6th+Flat+5th|6th+Add+9th|Minor+6th+Add+9th|9th|Minor+9th|Major+9th|Minor+Major+9th|9th+Flat+5th|Augmented+9th|9th+Suspended+4th|7th+Sharp+9th|7th+Sharp+9th+Flat+5th|Augmented+Major+9th|11th|Minor+11th|Major+11th|Minor+Major+11th|Major+Sharp+11th|13th|Minor+13th|Major+13th|Minor+Major+13th|7th+Suspended+2nd|Major+7th+Suspended+2nd|7th+Suspended+4th|Major+7th+Suspended+4th|7th+Suspended+2nd+Sharp+5th|7th+Suspended+4th+Sharp+5th|Major+7th+Suspended+4th+Sharp+5th|Suspended+2nd+Suspended+4th|7th+Suspended+2nd+Suspended+4th|Major+7th+Suspended+2nd+Suspended+4th|5th|Major+Add+9th)&bass=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)&labels=(finger|letter|tone)&gaps=(0|1|2)&fingers=(2|3|4|5|6)&notes=(sharps|flats)(&page=(2|3|4))?|arpeggio?root=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)&chord=(Major|Minor|Diminished|Augmented|Suspended+2nd|Suspended+4th|Major+Flat+5th|Minor+Sharp+5th|Minor+Double+Flat+5th|Suspended+4th+Sharp+5th|Suspended+2nd+Flat+5th|Suspended+2nd+Sharp+5th|7th|Minor+7th|Major+7th|Minor+Major+7th|Diminished+7th|Augmented+7th|Augmented+Major+7th|7th+Flat+5th|Major+7th+Flat+5th|Minor+7th+Flat+5th|Minor+Major+7th+Flat+5th|Minor+Major+7th+Double+Flat+5th|Minor+7th+Sharp+5th|Minor+Major+7th+Sharp+5th|7th+Flat+9th|6th|Minor+6th|6th+Flat+5th|6th+Add+9th|Minor+6th+Add+9th
|9th|Minor+9th|Major+9th|Minor+Major+9th|9th+Flat+5th|Augmented+9th|9th+Suspended+4th|7th+Sharp+9th|7th+Sharp+9th+Flat+5th|Augmented+Major+9th|11th|Minor+11th|Major+11th|Minor+Major+11th|Major+Sharp+11th|13th|Minor+13th|Major+13th|Minor+Major+13th|7th+Suspended+2nd|Major+7th+Suspended+2nd|7th+Suspended+4th|Major+7th+Suspended+4th|7th+Suspended+2nd+Sharp+5th|7th+Suspended+4th+Sharp+5th|Major+7th+Suspended+4th+Sharp+5th|Suspended+2nd+Suspended+4th|7th+Suspended+2nd+Suspended+4th|Major+7th+Suspended+2nd+Suspended+4th|5th|Major+Add+9th)&fret=(1[0-8]|[1-9])&labels=(none|letter|tone)&notes=(sharps|flats)|chordname|chordlisting?chord=(Major|Minor|Diminished|Augmented|Suspended+2nd|Suspended+4th|Major+Flat+5th|Minor+Sharp+5th|Minor+Double+Flat+5th|Suspended+4th+Sharp+5th|Suspended+2nd+Flat+5th|Suspended+2nd+Sharp+5th|7th|Minor+7th|Major+7th|Minor+Major+7th|Diminished+7th|Augmented+7th|Augmented+Major+7th|7th+Flat+5th|Major+7th+Flat+5th|Minor+7th+Flat+5th|Minor+Major+7th+Flat+5th|Minor+Major+7th+Double+Flat+5th|Minor+7th+Sharp+5th|Minor+Major+7th+Sharp+5th|7th+Flat+9th|6th|Minor+6th|6th+Flat+5th|6th+Add+9th|Minor+6th+Add+9th|9th|Minor+9th|Major+9th|Minor+Major+9th|9th+Flat+5th|Augmented+9th|9th+Suspended+4th|7th+Sharp+9th|7th+Sharp+9th+Flat+5th|Augmented+Major+9th|11th|Minor+11th|Major+11th|Minor+Major+11th|Major+Sharp+11th|13th|Minor+13th|Major+13th|Minor+Major+13th|7th+Suspended+2nd|Major+7th+Suspended+2nd|7th+Suspended+4th|Major+7th+Suspended+4th|7th+Suspended+2nd+Sharp+5th|7th+Suspended+4th+Sharp+5th|Major+7th+Suspended+4th+Sharp+5th|Suspended+2nd+Suspended+4th|7th+Suspended+2nd+Suspended+4th|Major+7th+Suspended+2nd+Suspended+4th|5th|Major+Add+9th)|scale?root=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)&scale=(Ionian|Dorian|Phrygian|Lydian|Mixolydian|Aeolian|Locrian|Melodic+Minor|Phrygian+%236|Lydian+Augmented|Lydian+Dominant|Fifth+Mode|Locrian+%232|Altered|Whole+Tone|Diminished+Whole+Half|Diminished+Half+Whole|Major+Pentatonic|Minor+Pentatonic|Suspended+Pentatonic|Domi
nant+Pentatonic|Traditional+Japanese+in+sen|Blues|Bebop+Major|Bebop+Minor|Bebop+Dominant|Bebop+Melodic+Minor|Harmonic+Major|Harmonic+Minor|Double+Harmonic+Major|Hungarian+Gypsy|Hungarian+Major|Phrygian+Dominant|Neapolitan+Minor|Neapolitan+Major|Enigmatic|Eight-tone+Spanish|Balinese+Pelog|Oriental|Iwato|Yo|Prometheus|Symmetrical|Major+Locrian|Chromatic|Augmented|Lydian+Minor)&fret=(1[0-8]|[1-9])&labels=(none|letter|tone)&notes=(sharps|flats)|scaledictionary.jsp|scalelisting?scale=(Ionian|Dorian|Phrygian|Lydian|Mixolydian|Aeolian|Locrian|Melodic+Minor|Phrygian+%236|Lydian+Augmented|Lydian+Dominant|Fifth+Mode|Locrian+%232|Altered|Whole+Tone|Diminished+Whole+Half|Diminished+Half+Whole|Major+Pentatonic|Minor+Pentatonic|Suspended+Pentatonic|Dominant+Pentatonic|Traditional+Japanese+in+sen|Blues|Bebop+Major|Bebop+Minor|Bebop+Dominant|Bebop+Melodic+Minor|Harmonic+Major|Harmonic+Minor|Double+Harmonic+Major|Hungarian+Gypsy|Hungarian+Major|Phrygian+Dominant|Neapolitan+Minor|Neapolitan+Major|Enigmatic|Eight-tone+Spanish|Balinese+Pelog|Oriental|Iwato|Yo|Prometheus|Symmetrical|Major+Locrian|Chromatic|Augmented|Lydian+Minor)|harmonizer|harmonizer\/chord2scale|harmonizer\/chord2scale?root=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)&(chord=|chord=m)(?:&chordlist=(?<dup>(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)(m|dim|%2B|sus2|sus4|Mb5|m%235|mbb5|sus4%235|sus2b5|sus2%235|7|m7|M7|mM7|dim7|%2B7|%2BM7|7b5|M7b5|m7b5|mM7b5|mM7bb5|m7%235|mM7%235|7b9|6|m6|6b5|6add9|m6add9|9|m9|M9|mM9|9b5|%2B9|9sus4|7%239|7%239b5|%2BM9|11|m11|M11|mM11|M%2311|13|m13|M13|mM13|7sus2|M7sus2|7sus4|M7sus4|7sus2%235|7sus4%235|M7sus4%235|sus2sus4|7sus2sus4|M7sus2sus4|5|add9)?\+)(?2024-05-01 2024-05-07&chordlist=\k<dup>)){0,300} |harmonizer\/chord2scale?(?:chordlist=(?<dup>C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B) 
(%2|m%2|dim%2|%2B%2|sus2%2|sus4%2|Mb5%2|m%235%2|mbb5%2|sus4%235%2|sus2b5%2|sus2%235%2|7%2|m7%2|M7%2|mM7%2|dim7%2|%2B7%2|%2BM7%2|7b5%2|M7b5%2|m7b5%2|mM7b5%2|mM7bb5%2|m7%235%2|mM7%235%2|7b9%2|6%2|m6%2|6b5%2|6add9%2|m6add9%2|9%2|m9%2|M9%2|mM9%2|9b5%2|%2B9%2|9sus4%2|7%239%2|7%239b5%2|%2BM9%2|11%2|m11%2|M11%2|mM11%2|M%2311%2|13%2|m13%2|M13%2|mM13%2|7sus2%2|M7sus2%2|7sus4%2|M7sus4%2|7sus2%235%2|7sus4%235%2|M7sus4%235%2|sus2sus4%2|7sus2sus4%2|M7sus2sus4%2|5%2|add9)?)(?2024-05-01 2024-05-07&chordlist=\k<dup>)){0,300}|harmonizer\/scale2chord|harmonizer\/scale2chord\?root=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)&scale=(Ionian|Dorian|Phrygian|Lydian|Mixolydian|Aeolian|Locrian|Melodic+Minor|Phrygian+%236|Lydian+Augmented|Lydian+Dominant|Fifth+Mode|Locrian+%232|Altered|Whole+Tone|Diminished+Whole+Half|Diminished+Half+Whole|Major+Pentatonic|Minor+Pentatonic|Suspended+Pentatonic|Dominant+Pentatonic|Traditional+Japanese+in+sen|Blues|Bebop+Major|Bebop+Minor|Bebop+Dominant|Bebop+Melodic+Minor|Harmonic+Major|Harmonic+Minor|Double+Harmonic+Major|Hungarian+Gypsy|Hungarian+Major|Phrygian+Dominant|Neapolitan+Minor|Neapolitan+Major|Enigmatic|Eight-tone+Spanish|Balinese+Pelog|Oriental|Iwato|Yo|Prometheus|Symmetrical|Major+Locrian|Chromatic|Augmented|Lydian+Minor)&scalelist=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)+(Ionian|Dorian|Phrygian|Lydian|Mixolydian|Aeolian|Locrian|Melodic+Minor|Phrygian+%236|Lydian+Augmented|Lydian+Dominant|Fifth+Mode|Locrian+%232|Altered|Whole+Tone|Diminished+Whole+Half|Diminished+Half+Whole|Major+Pentatonic|Minor+Pentatonic|Suspended+Pentatonic|Dominant+Pentatonic|Traditional+Japanese+in+sen|Blues|Bebop+Major|Bebop+Minor|Bebop+Dominant|Bebop+Melodic+Minor|Harmonic+Major|Harmonic+Minor|Double+Harmonic+Major|Hungarian+Gypsy|Hungarian+Major|Phrygian+Dominant|Neapolitan+Minor|Neapolitan+Major|Enigmatic|Eight-tone+Spanish|Balinese+Pelog|Oriental|Iwato|Yo|Prometheus|Symmetrical|Major+Locrian|Chromatic|Augmented|Lydian+Minor)&scalelist=(C|C#|Db|D|D#|Eb|E|F|F#|Gb|
G|G#|Ab|A|A#|Bb|B)+(Ionian|Dorian|Phrygian|Lydian|Mixolydian|Aeolian|Locrian|Melodic+Minor|Phrygian+%236|Lydian+Augmented|Lydian+Dominant|Fifth+Mode|Locrian+%232|Altered|Whole+Tone|Diminished+Whole+Half|Diminished+Half+Whole|Major+Pentatonic|Minor+Pentatonic|Suspended+Pentatonic|Dominant+Pentatonic|Traditional+Japanese+in+sen|Blues|Bebop+Major|Bebop+Minor|Bebop+Dominant|Bebop+Melodic+Minor|Harmonic+Major|Harmonic+Minor|Double+Harmonic+Major|Hungarian+Gypsy|Hungarian+Major|Phrygian+Dominant|Neapolitan+Minor|Neapolitan+Major|Enigmatic|Eight-tone+Spanish|Balinese+Pelog|Oriental|Iwato|Yo|Prometheus|Symmetrical|Major+Locrian|Chromatic|Augmented|Lydian+Minor)|https://jguitar.com/instrument?instrument=(Guitar|Bass|mandolin|Ukulele|custom)&tuning=&strings=&frets=(?:&hand=left)?&capo=(0-22])&fretSpan=[3-8]|https://jguitar.com/instrument?instrument=custom&tuning=&strings=[2-8]&frets=[6-32]&capo=[0-6]&fretSpan=[3-8]|instrument|tuning|scalelisting?scale=(Ionian|Dorian|Phrygian|Lydian|Mixolydian|Aeolian|Locrian|Melodic+Minor|Phrygian+%236|Lydian+Augmented|Lydian+Dominant|Fifth+Mode|Locrian+%23|Altered|Whole+Tone|Diminished+Whole+Half|Diminished+Half+Whole|Major+Pentatonic|Minor+Pentatonic|Suspended+Pentatonic|Dominant+Pentatonic|Traditional+Japanese+in+sen|Blues|Bebop+Major|Bebop+Minor|Bebop+Dominant|Bebop+Melodic+Minor|Harmonic+Major|Harmonic+Minor|Double+Harmonic+Major|Hungarian+Gypsy|Hungarian+Major|Phrygian+Dominant|Neapolitan+Minor|Neapolitan+Major|Enigmatic|Eight-tone+Spanish|Balinese+Pelog|Oriental|Iwato|Yo|Prometheus|Symmetrical|Major+Locrian|Chromatic|Augmented|Lydian+Minor) 
|scale\/(C|C#|Db|D|D#|Eb|E|F|F#|Gb|G|G#|Ab|A|A#|Bb|B)\/(Ionian|Dorian|Phrygian|Lydian|Mixolydian|Aeolian|Locrian|Melodic+Minor|Phrygian+%236|Lydian+Augmented|Lydian+Dominant|Fifth+Mode|Locrian+%232|Altered|Whole+Tone|Diminished+Whole+Half|Diminished+Half+Whole|Major+Pentatonic|Minor+Pentatonic|Suspended+Pentatonic|Dominant+Pentatonic|Traditional+Japanese+in+sen|Blues|Bebop+Major|Bebop+Minor|Bebop+Dominant|Bebop+Melodic+Minor|Harmonic+Major|Harmonic+Minor|Double+Harmonic+Major|Hungarian+Gypsy|Hungarian+Major|Phrygian+Dominant|Neapolitan+Minor|Neapolitan+Major|Enigmatic|Eight-tone+Spanish|Balinese+Pelog|Oriental|Iwato|Yo|Prometheus|Symmetrical|Major+Locrian|Chromatic|Augmented|Lydian+Minor)|tabmap|rhymingdictionary)"

This results in:

{"timestamp":"2024-06-12T10:53:36.520Z","logLevel":"fatal","context":"general","message":"Invalid seed specified, aborting crawl. Quitting","details":{"url":"https://jguitar.com/"}}

I would expect a different error message, as the URL mentioned in the log is valid. From what I understand, it's the combination of the seed URL, include and exclude that makes the seed invalid. I haven't looked in detail, but I suppose the include pattern ends up not including the seed itself.

ikreymer commented 5 months ago

That very long (!) regex doesn't appear to be parsable by JS. The include scope is part of the seed, so a regex that fails to parse makes the whole seed invalid. We can add another error message that says the regex is invalid.
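
A minimal sketch of what such a check could look like, assuming each include pattern is compiled with the standard JS `RegExp` constructor. The helper names `regexError` and `describeSeedProblem` and the message wording are hypothetical illustrations, not the crawler's actual code:

```typescript
// Hypothetical helpers for illustration; not part of browsertrix-crawler's API.

// Returns null if the pattern compiles, otherwise the engine's error message.
function regexError(pattern: string): string | null {
  try {
    new RegExp(pattern);
    return null;
  } catch (e) {
    // The RegExp constructor throws a SyntaxError for unparsable patterns.
    return e instanceof Error ? e.message : String(e);
  }
}

// Checks every include pattern for a seed and reports the first one that
// fails to parse, so the regex (rather than the URL) is named in the message.
function describeSeedProblem(url: string, include: string[]): string | null {
  for (const pattern of include) {
    const reason = regexError(pattern);
    if (reason !== null) {
      return `Invalid --include regex for seed ${url}: ${reason}`;
    }
  }
  return null;
}

// Usage: a well-formed pattern passes, an unterminated group is reported.
console.log(
  describeSeedProblem("https://jguitar.com/", [
    "jguitar\\.com\\/(tabmap|rhymingdictionary)",
  ]),
);
console.log(describeSeedProblem("https://jguitar.com/", ["(unclosed"]));
```

The exact text after the colon comes from the JS engine and varies between runtimes, so it is best treated as diagnostic detail rather than something to match on.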

rgaudin commented 5 months ago

Would be useful 👍