sagelliv closed 8 years ago
Extraction should be run after a user updates field names during annotation. Make sure that the spider is reloaded on subsequent extraction requests. Some pages shouldn't have data extracted, so it may be worth using the clustering to check whether the current page is close in structure to any of the samples before repeatedly attempting extraction, so that users aren't left wondering why it is trying to extract data. Check that extracted links are being sent.
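As a rough illustration of the structure check suggested above, here is a minimal sketch that compares the set of tag paths in the current page against each sample using Jaccard similarity. This is not Portia's clustering; the function names (`tag_paths`, `structural_similarity`, `should_extract`) and the 0.5 threshold are assumptions made only for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TagPathCollector(HTMLParser):
    """Collects the set of tag paths (e.g. 'html/body/div') seen in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the set of tag paths found in an HTML string."""
    collector = _TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structural_similarity(html_a, html_b):
    """Jaccard similarity of the two pages' tag-path sets, in [0, 1]."""
    a, b = tag_paths(html_a), tag_paths(html_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def should_extract(page_html, sample_htmls, threshold=0.5):
    """Hypothetical gate: extract only if the page resembles some sample."""
    return any(structural_similarity(page_html, sample) >= threshold
               for sample in sample_htmls)
```

With a gate like this, a page that resembles none of the samples could skip extraction entirely instead of leaving the user waiting. The threshold would need tuning against real pages.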
A spinner appears while data is being extracted. Whenever the Portia UI transitions into a route whose name contains the string 'spider', extraction is triggered. If a page has no samples, the extraction returns without results. If the extraction is unsuccessful (i.e. a sample exists but the extraction yields 0 results), the spinner does not disappear until ~4 minutes have passed. We are willing to wait up to 4 minutes for the extraction to return a positive number of items, since Splash mutations of the HTML may render a structure that gives us a positive extraction.
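The behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in this change: the names `should_trigger_extraction` and `poll_extraction`, the `extract` callable, and the 2-second retry interval are assumptions; only the 'spider' route check and the ~4 minute (240 second) limit come from the description.

```python
import time


def should_trigger_extraction(route_name):
    """Extraction is triggered for any route whose name contains 'spider'."""
    return "spider" in route_name


def poll_extraction(extract, has_samples, timeout=240.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Retry `extract` until it yields items or `timeout` seconds elapse.

    `extract` is a hypothetical callable returning a list of items.
    Returns the extracted items, or an empty list if the page has no
    samples or nothing was extracted before the deadline.
    """
    if not has_samples:
        # No samples for this page: return immediately without results.
        return []
    deadline = clock() + timeout
    while True:
        items = extract()
        if items:
            return items
        if clock() >= deadline:
            # Give up; this is the point where the spinner would be hidden.
            return []
        # The page may still be mutating, so wait and try again.
        sleep(interval)
```

The `clock` and `sleep` parameters are injectable so the loop can be exercised without actually waiting four minutes.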