ssutee / metageta

Automatically exported from code.google.com/p/metageta

Enable Walk to detect additions to Crawled results #15

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Enable the walk process to check existing crawl results (by referencing the
existing .xls) so that new and modified datasets in the target directory are
detected.

Results could either be appended to the existing crawl result or written to a
new .xls, so that changes are preserved, depending on the user's requirements,
e.g. via a checkbox offering 'append existing' or 'create new'.

Original issue reported on code.google.com by simonaol...@gmail.com on 21 Feb 2010 at 11:10

GoogleCodeExporter commented 9 years ago
Implemented functionality to update Excel (.xls) spreadsheets.
This issue was updated by revision r222.

Original comment by pinner.luke@gmail.com on 9 Mar 2010 at 5:20

GoogleCodeExporter commented 9 years ago
Now to work that into runcrawler.py and come up with a brilliant scheme to
determine whether a dataset has changed...

Original comment by pinner.luke@gmail.com on 9 Mar 2010 at 5:30

GoogleCodeExporter commented 9 years ago
Added a simple implementation that only checks the file modification date.
It preserves fields that have been manually modified or added to the spreadsheet,
and updates overviews if required.
Datasets from the original crawl results that can no longer be found are flagged
as "deleted" in the spreadsheet. These are not transformed to XML.
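
For illustration only, the modification-date check described above could be sketched roughly as follows. This is not the code from the revision; `classify_datasets`, the `previous` mapping (path to the modification time recorded in the earlier crawl) and the four category names are hypothetical, and reading or writing the .xls itself is left out.

```python
import os


def classify_datasets(previous, root):
    """Compare an earlier crawl against the files currently under `root`.

    `previous` is a hypothetical mapping of file path -> modification time
    (seconds since the epoch) as recorded in the earlier crawl results.
    Returns a dict of sorted path lists keyed by 'new', 'modified',
    'unchanged' and 'deleted'.
    """
    # Walk the target directory and record the current modification times.
    current = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            current[path] = os.path.getmtime(path)

    result = {"new": [], "modified": [], "unchanged": [], "deleted": []}

    for path, mtime in current.items():
        if path not in previous:
            result["new"].append(path)        # not seen in the earlier crawl
        elif mtime > previous[path]:
            result["modified"].append(path)   # touched since the earlier crawl
        else:
            result["unchanged"].append(path)  # keep the existing row as-is

    # Anything recorded earlier but no longer on disk is flagged as deleted.
    for path in previous:
        if path not in current:
            result["deleted"].append(path)

    for paths in result.values():
        paths.sort()
    return result
```

Under these assumptions, only the "new" and "modified" paths would need to be crawled again, rows for "unchanged" paths would be left alone so that manual edits survive, and "deleted" paths would be flagged and skipped when generating XML. Relying on the modification date alone will miss changes that do not update the timestamp.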

I haven't implemented a checkbox to 'append existing' or 'create new'.

Original comment by pinner.luke@gmail.com on 31 Mar 2010 at 4:21