ulixee / hero

The web browser built for scraping
MIT License

PageStateGenerator #37

Closed blakebyrnes closed 2 years ago

blakebyrnes commented 2 years ago

This feature includes:
1) A new TimeTravel module
2) Replay is moved into this project and renamed "TimetravelPlayer"
3) The internals for rebuilding a page from a SessionDb are now called MirrorPage/Network/Context. These are shared across TimetravelPlayer, DetachedTab and PageStateGenerator
4) DomRebuilder creates a VirtualTree for DomNodes and keeps track of stats for how often classes and ids are reused. These stats are intended for shortest-path query selectors, which have not been implemented yet
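
As a rough illustration of the reuse stats described for DomRebuilder, the sketch below walks a small tree and counts how often each id and class appears. The `VirtualNode` shape and the `collectReuseStats` and `uniqueKeys` helpers are hypothetical names made up for this example and are not taken from the Hero codebase.

```typescript
// Hypothetical node shape for illustration only; not Hero's actual VirtualTree types.
interface VirtualNode {
  tagName: string;
  id?: string;
  classes: string[];
  children: VirtualNode[];
}

interface ReuseStats {
  idCounts: Map<string, number>;
  classCounts: Map<string, number>;
}

// Walk the tree depth-first and count how often each id and class appears.
function collectReuseStats(root: VirtualNode): ReuseStats {
  const stats: ReuseStats = { idCounts: new Map(), classCounts: new Map() };
  const stack: VirtualNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as VirtualNode;
    if (node.id) {
      stats.idCounts.set(node.id, (stats.idCounts.get(node.id) ?? 0) + 1);
    }
    for (const cls of node.classes) {
      stats.classCounts.set(cls, (stats.classCounts.get(cls) ?? 0) + 1);
    }
    stack.push(...node.children);
  }
  return stats;
}

// Values seen exactly once are candidates for short, unambiguous selectors.
function uniqueKeys(counts: Map<string, number>): string[] {
  return Array.from(counts.entries())
    .filter(([, count]) => count === 1)
    .map(([key]) => key)
    .sort();
}

// Example tree: the "item" class is reused, the "header" class and both ids are not.
const exampleTree: VirtualNode = {
  tagName: 'BODY',
  classes: [],
  children: [
    { tagName: 'DIV', id: 'top', classes: ['header'], children: [] },
    {
      tagName: 'UL',
      id: 'list',
      classes: [],
      children: [
        { tagName: 'LI', classes: ['item'], children: [] },
        { tagName: 'LI', classes: ['item'], children: [] },
      ],
    },
  ],
};

const exampleStats = collectReuseStats(exampleTree);
```

A selector generator could prefer the keys returned by `uniqueKeys`, since an id or class that occurs once identifies its element without a longer path.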

Currently in PageStateGenerator:
1) Detection of Added/Removed/Updated elements and text nodes
2) XPath by unique id
3) XPath by unique full path
4) XPath by text, for fields with fewer than 200 characters of text (longer text would pull in giant scripts)
5) Test each state in a MirrorPage
6) Generate unique state assertions and use the given timeRanges
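
The three XPath strategies listed above (unique id, full path, short text) could be sketched roughly as follows. This is a minimal illustration under assumed data shapes, not the PageStateGenerator implementation: `ElementInfo`, `fullPath` and `candidateXPaths` are hypothetical names, and values containing double quotes are simply skipped rather than escaped.

```typescript
// Hypothetical element description for illustration; not a Hero type.
interface ElementInfo {
  tagName: string;
  id?: string;
  text?: string;
  parent?: ElementInfo;
  // 1-based position among siblings that share the same tag name.
  indexAmongSameTag: number;
}

// Text at or above this length is skipped (e.g. large inline scripts).
const MAX_TEXT_LENGTH = 200;

// Build an absolute path such as /html[1]/body[1]/div[2].
function fullPath(el: ElementInfo): string {
  const parts: string[] = [];
  for (let node: ElementInfo | undefined = el; node; node = node.parent) {
    parts.unshift(`${node.tagName.toLowerCase()}[${node.indexAmongSameTag}]`);
  }
  return '/' + parts.join('/');
}

// Return candidate XPaths in order: unique id, full path, short text.
function candidateXPaths(el: ElementInfo, idCounts: Map<string, number>): string[] {
  const candidates: string[] = [];
  // Simplification: skip ids containing double quotes instead of escaping them.
  if (el.id && idCounts.get(el.id) === 1 && !el.id.includes('"')) {
    candidates.push(`//*[@id="${el.id}"]`);
  }
  candidates.push(fullPath(el));
  const text = el.text ?? '';
  // Simplification: skip text containing double quotes instead of escaping it.
  if (text.length > 0 && text.length < MAX_TEXT_LENGTH && !text.includes('"')) {
    candidates.push(`//${el.tagName.toLowerCase()}[text()="${text}"]`);
  }
  return candidates;
}

// Example: <html><body><div id="main">Hello</div></body></html>
const htmlEl: ElementInfo = { tagName: 'HTML', indexAmongSameTag: 1 };
const bodyEl: ElementInfo = { tagName: 'BODY', indexAmongSameTag: 1, parent: htmlEl };
const mainEl: ElementInfo = {
  tagName: 'DIV',
  id: 'main',
  text: 'Hello',
  indexAmongSameTag: 1,
  parent: bodyEl,
};
const exampleXPaths = candidateXPaths(mainEl, new Map([['main', 1]]));
```

Each candidate would still need to be evaluated against the recorded states to confirm it matches exactly one element, which is the role the list assigns to testing each state in a MirrorPage.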

Not in PageStateGenerator yet (as of this PR):
1) Empty waitForPageState() flow client -> core
2) LoadFromFile feature to read generated asserts and use them in a real waitForPageState
3) Analyze Resources
4) Create closest path queries