noyainrain / flatdir

Web aggregator of flat ads from different real estate companies.
MIT License

Enable text pattern for fields #7

Closed: noyainrain closed this issue 1 year ago.

noyainrain commented 1 year ago

Some real estate companies output rather verbose field values, e.g. a location that includes the street address or a room count with flavor text. Make it possible to filter a field with a regular expression, given as `path:pattern`, e.g. `p/span[1]:[^,]*`, which keeps only the text before the first comma.
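A minimal Python sketch of the proposed filter. It assumes the path is evaluated with the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath), that the first colon separates path and pattern, and that the first match of the pattern becomes the field value. `apply_field` and the sample ad are hypothetical, not flatdir's actual code.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def apply_field(element: ET.Element, field: str) -> Optional[str]:
    """Extract the value of a hypothetical field given as ``path`` or ``path:pattern``.

    Sketch only: the first colon is assumed to separate path and pattern,
    so the path itself must not contain a colon.
    """
    path, sep, pattern = field.partition(":")
    target = element.find(path)
    if target is None:
        return None
    # Concatenate all text inside the target element, including nested tags
    text = "".join(target.itertext()).strip()
    if not sep:
        return text
    # Search (rather than fully match) the text; the first match is the value
    match = re.search(pattern, text)
    # Assumption: an element whose text does not match counts as a missing field
    return match.group(0) if match else None


# Hypothetical ad markup with a verbose location and room count
ad = ET.fromstring(
    "<article>"
    "<p><span>Berlin-Mitte, Hauptstr. 1</span>"
    "<span>2 rooms, freshly renovated</span></p>"
    "</article>"
)
print(apply_field(ad, "p/span[1]:[^,]*"))  # Berlin-Mitte
print(apply_field(ad, r"p/span[2]:\d+"))  # 2
print(apply_field(ad, "p/span[2]"))  # 2 rooms, freshly renovated
```

ElementTree cannot select attributes, so a field like `a/@href` is out of scope for this sketch; a full XPath engine would be needed for that.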

Draft of the configuration documentation:

## Real estate company.
##
## A *field* describes the extraction of data from a document and has the form
## `path:pattern:default`. *path* locates the target element. *pattern* is an optional regular
## expression to search the element for.
##
## Planned for the future: *default* is an optional value used if the field is missing.
#[company:example.org]
## URL field of an ad
#url_path = a/@href
## Title field of an ad
#title_path = a/h2
## Location field of an ad
#location_path = p/span[1]:[^,]*
## Rooms field of an ad
#rooms_path = p/span[2]
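A sketch of how such a section could be loaded, assuming the commented-out example above is uncommented and read with Python's `configparser`. The function name `load_companies` and the returned structure are hypothetical and only illustrate splitting each `*_path` value into path and pattern.

```python
import configparser
import re
from typing import Dict, Optional, Tuple

# The draft's example section with the leading "#" removed
CONFIG = """\
[company:example.org]
url_path = a/@href
title_path = a/h2
location_path = p/span[1]:[^,]*
rooms_path = p/span[2]
"""

# (path, compiled pattern or None) for one field
Field = Tuple[str, Optional[re.Pattern]]


def load_companies(text: str) -> Dict[str, Dict[str, Field]]:
    """Map each company host to its fields, e.g. "location" -> (path, pattern)."""
    # Disable interpolation so a "%" inside a regular expression is kept as is
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    companies = {}
    for section in parser.sections():
        kind, _, host = section.partition(":")
        if kind != "company":
            continue
        fields = {}
        for key, value in parser.items(section):
            if not key.endswith("_path"):
                continue
            # Assumption: the first colon separates path and pattern
            path, sep, pattern = value.partition(":")
            # Compiling here reports an invalid pattern when the config is loaded
            fields[key[: -len("_path")]] = (path, re.compile(pattern) if sep else None)
        companies[host] = fields
    return companies


companies = load_companies(CONFIG)
print(companies["example.org"]["location"])
```

Splitting on the first colon keeps colons inside the pattern intact, but it would break XPath axes such as `following-sibling::span`. The planned *default* part would need a further rule, since a pattern may itself contain colons.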