Closed by ikreymer 2 years ago.
Thoughts on adding a `name` field to easily label and differentiate configs?
Yes, definitely a good idea. Just free-form text that can be searched by, right?
@ikreymer initial pass at mockup: https://app.mockplus.com/run/rp/TgN5xi_FPJvV/2DkJibt0vw?cps=expand&rps=expand&nav=1&ha=1&la=1&fc=0&out=0&rt=1
Q's:
- Should `seed` be user-facing? Or should the UI display seeds as something like "URL group"/"Page groups"?
- … `config` object? IMO the former is more user-friendly if we expect users to download JSON configs and then upload or copy-paste them in the future.

Per discussion on call:
- "Seed URL" is the user-facing term (TODO: link glossary)
- Group seed configs in the UI

@ikreymer minor issue with the POST crawlconfigs endpoint: leaving out the trailing `/` throws a `Method Not Allowed` error on the server.
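Until the server accepts both forms, a client can sidestep the error by making sure the request URL always ends with `/` before POSTing. This is a minimal sketch under assumptions: `crawlconfigs_url` and `build_create_request` are hypothetical helpers, the base URL is a placeholder rather than the real API path, and the request is only built, not sent.

```python
import json
import urllib.request


def crawlconfigs_url(base_url: str) -> str:
    """Hypothetical helper: return the endpoint URL with a trailing slash.

    Omitting the trailing '/' returned Method Not Allowed, so normalize
    the URL on the client side before making the request.
    """
    return base_url if base_url.endswith("/") else base_url + "/"


def build_create_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the crawl config JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        crawlconfigs_url(base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    req = build_create_request("https://example.com/api/crawlconfigs", {"name": "Example Name"})
    print(req.get_method(), req.full_url)
```

A server-side fix (accepting the route with and without the slash) would make this normalization unnecessary.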
Simplified the config for now: just send a URL list for `seeds`; everything else has been moved to the outer scope, e.g.:
```json
{
  "schedule": null,
  "runNow": true,
  "config": {
    "seeds": [
      "https://webrecorder.net/"
    ],
    "scopeType": "prefix"
  },
  "name": "Example Name",
  "colls": [],
  "crawlTimeout": "0",
  "parallel": 1
}
```
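The simplified payload above could be assembled from a plain list of seed URLs with a small helper. `build_crawlconfig` and its keyword defaults are hypothetical; the field names and default values are copied from the example JSON, not taken from the API definition.

```python
import json
from typing import Any, Dict, List, Optional


def build_crawlconfig(
    name: str,
    seeds: List[str],
    scope_type: str = "prefix",
    run_now: bool = True,
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the simplified payload shape from the example."""
    return {
        "schedule": schedule,
        "runNow": run_now,
        # Only the crawler-level settings live inside "config".
        "config": {
            "seeds": list(seeds),
            "scopeType": scope_type,
        },
        "name": name,
        "colls": [],
        "crawlTimeout": "0",  # kept as a string, as in the example
        "parallel": 1,
    }


if __name__ == "__main__":
    payload = build_crawlconfig("Example Name", ["https://webrecorder.net/"])
    print(json.dumps(payload, indent=2))
```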
All done! Crawl scaling and tags to be separate issues.
This screen will produce a JSON object that is then passed to the crawl config creation API endpoint.
The format includes a top-level dictionary with Browsertrix Cloud-specific options, and a `config` dictionary, which corresponds to the Browsertrix Crawler config. The format is:
The key properties to include are:
The actual crawl configuration, the `config` property, is what is passed to browsertrix-crawler. It can be either a:

For the seed list, the input might be:
The supported properties in the "simplified view" will likely continue to evolve, but there will also be an advanced view for pasting a custom config.
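For the advanced view, one way to handle a pasted custom config is to parse it as JSON, check that it is an object, and nest it under `config`, leaving the Browsertrix Cloud-specific options at the top level. This is a minimal sketch: `wrap_custom_config` is hypothetical, and requiring a non-empty `seeds` list is an assumed check, not the actual validation rule.

```python
import json
from typing import Any, Dict


def wrap_custom_config(pasted: str, name: str, run_now: bool = True) -> Dict[str, Any]:
    """Hypothetical: turn text pasted into the advanced view into a full payload."""
    try:
        crawler_config = json.loads(pasted)
    except json.JSONDecodeError as exc:
        raise ValueError(f"config is not valid JSON: {exc}") from exc
    if not isinstance(crawler_config, dict):
        raise ValueError("config must be a JSON object")
    # Assumed minimal check: the crawler needs at least one seed.
    seeds = crawler_config.get("seeds")
    if not isinstance(seeds, list) or not seeds:
        raise ValueError("config must include a non-empty 'seeds' list")
    # Cloud-specific options stay at the top level; the pasted object is
    # passed through unchanged as the crawler config.
    return {
        "name": name,
        "runNow": run_now,
        "schedule": None,
        "config": crawler_config,
    }


if __name__ == "__main__":
    text = '{"seeds": ["https://webrecorder.net/"], "scopeType": "prefix"}'
    print(json.dumps(wrap_custom_config(text, "Example Name"), indent=2))
```

Passing the pasted object through unchanged keeps the advanced view usable for crawler options the simplified view does not expose yet.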