In uri_trainer, @article_scrape_patterns doesn't need to be instantiated up front, so memoizing the scrape patterns for the trainer makes sense.
raw_html is referred to as payload elsewhere in the code I traced through. Which name do we want?
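For reference, the memoization mentioned above could look roughly like this. It is a minimal sketch, not the actual class: TrainerSketch, its constructor argument, and the shape of the stored pattern are all assumptions for illustration; only @article_scrape_patterns comes from the code.

```ruby
# Hypothetical stand-in for the trainer; only @article_scrape_patterns
# is a name taken from the real code.
class TrainerSketch
  def initialize(uri)
    @uri = uri
    # @article_scrape_patterns is deliberately not set here.
  end

  # Built on first access, then the same hash is reused on every call.
  def article_scrape_patterns
    @article_scrape_patterns ||= {}
  end
end

trainer = TrainerSketch.new("https://example.com/article")
trainer.article_scrape_patterns[:title] = { method: :css, pattern: "h1" }
```

Because the reader method owns the initialization, nothing else in the class has to remember to set the instance variable first.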
URI Trainer
Contains all the log statements
train iterates through the list of data types and fetches the pattern for each from a private method
Saves the resulting patterns
pattern used to live in "presets selector", but that class didn't seem necessary: its method was really just getting a pattern by calling into the presets code. So I split it up, with pattern living here and calling preset
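The flow described in this section can be sketched as follows. This is an illustration under assumptions, not the real implementation: UriTrainerSketch, DATA_TYPES, and the injected pattern_source callable (standing in for the call into preset) are hypothetical names.

```ruby
# Hypothetical sketch of the train loop; names other than train and
# @article_scrape_patterns are made up for illustration.
class UriTrainerSketch
  DATA_TYPES = %i[title description].freeze # assumed list of data types

  # pattern_source stands in for the preset call: any object responding
  # to call(data_type, payload) and returning a pattern or nil.
  def initialize(payload, pattern_source)
    @payload = payload
    @pattern_source = pattern_source
  end

  def article_scrape_patterns
    @article_scrape_patterns ||= {}
  end

  # Walks every data type, asks the private method for a pattern,
  # and saves whatever was not skipped.
  def train
    DATA_TYPES.each do |data_type|
      chosen = pattern(data_type)
      next if chosen.nil? # nil means the data type was skipped

      article_scrape_patterns[data_type] = chosen
    end
    article_scrape_patterns
  end

  private

  def pattern(data_type)
    @pattern_source.call(data_type, @payload)
  end
end

source = lambda do |data_type, _payload|
  data_type == :title ? { method: :css, pattern: "h1" } : nil
end
patterns = UriTrainerSketch.new("<html></html>", source).train
```

Keeping pattern private means train is the only public entry point, which matches the idea that the trainer owns the logging and the saving.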
Preset
preset.select guides you through choosing from options. It returns nil if you skip, and otherwise returns the corresponding selected_option
options simply returns the list of possible options, e.g.
1) og_descriptions: example
2) I will provide a pattern using xpath
3) I will provide a pattern using css
4) skip
transform_results returns the results of the xpath/css patterns in a form that options can use
Realized that results, which was originally used in this if/else:
if preset_results.empty?
CLI.log("No presets were found for #{target_data_type}. Skipping to next.")
doesn't actually need to be public, because it really just checks whether data_type_presets is present and not empty. So now it's private
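To make the Preset behaviour above concrete, here is a rough sketch. It assumes a constructor that takes data_type_presets as a hash plus injectable input/output streams; PresetSketch, SKIP, and those parameters are hypothetical, and the real class may differ.

```ruby
require "stringio"

# Hypothetical sketch of Preset; only select, options, results,
# selected_option and data_type_presets are names from the notes above.
class PresetSketch
  SKIP = "skip"

  def initialize(data_type_presets, input: $stdin, output: $stdout)
    @data_type_presets = data_type_presets # e.g. { "og_descriptions" => "example" }
    @input = input
    @output = output
  end

  # Presets first (if any), then the two manual choices, then skip.
  def options
    preset_lines = results ? @data_type_presets.map { |name, example| "#{name}: #{example}" } : []
    preset_lines + ["I will provide a pattern using xpath",
                    "I will provide a pattern using css",
                    SKIP]
  end

  # Prints the numbered options, reads a choice, and returns nil for
  # skip (or an out-of-range answer), otherwise the selected option.
  def select
    options.each_with_index { |option, i| @output.puts("#{i + 1}) #{option}") }
    choice = @input.gets.to_i
    return nil unless choice.between?(1, options.length)

    selected_option = options[choice - 1]
    selected_option == SKIP ? nil : selected_option
  end

  private

  # Only checks that data_type_presets is present and not empty.
  def results
    !(@data_type_presets.nil? || @data_type_presets.empty?)
  end
end

preset = PresetSketch.new({ "og_descriptions" => "example" },
                          input: StringIO.new("1\n"), output: StringIO.new)
picked = preset.select
```

Passing the streams in is only there so the sketch can be exercised without a terminal; the real code logs through CLI.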