Hi Lewis, I've made a pass at fixing up issues related to robots.txt
parsing, and I think the code is in good shape for use in Nutch (is Nutch
currently using this code?), as we now support patterns, the longest-match
heuristic, and the allow-before-disallow heuristic.
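For anyone following along, here is a minimal Java sketch of how patterns and the two heuristics combine. It is not the crawler-commons implementation: the `Rule` class, the `matches`/`isAllowed` names, and measuring pattern length as the raw string length are assumptions for illustration. Patterns support `*` wildcards and a trailing `$` end anchor, the longest matching pattern decides, and an Allow wins a tie against a Disallow of equal length.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RobotRulesSketch {

    // One Allow or Disallow line from a robots.txt group (hypothetical type).
    public static final class Rule {
        final String pattern;
        final boolean allow;

        public Rule(String pattern, boolean allow) {
            this.pattern = pattern;
            this.allow = allow;
        }
    }

    // '*' matches any run of characters and a trailing '$' anchors the end
    // of the path; without '$' the pattern is a prefix match.
    static boolean matches(String pattern, String path) {
        boolean anchoredEnd = pattern.endsWith("$");
        String body = anchoredEnd ? pattern.substring(0, pattern.length() - 1) : pattern;
        StringBuilder regex = new StringBuilder();
        String[] parts = body.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        if (!anchoredEnd) {
            regex.append(".*");
        }
        return Pattern.matches(regex.toString(), path);
    }

    // Longest matching pattern wins; on a length tie, Allow beats Disallow.
    public static boolean isAllowed(List<Rule> rules, String path) {
        Rule best = null;
        for (Rule r : rules) {
            // An empty pattern (e.g. "Disallow:") restricts nothing, so skip it.
            if (r.pattern.isEmpty() || !matches(r.pattern, path)) {
                continue;
            }
            int len = r.pattern.length();
            if (best == null
                    || len > best.pattern.length()
                    || (len == best.pattern.length() && r.allow && !best.allow)) {
                best = r;
            }
        }
        // No matching rule means the path is allowed.
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("/private", false),
                new Rule("/private/public", true),
                new Rule("/*.pdf$", false),
                new Rule("/folder", false),
                new Rule("/folder", true));
        System.out.println(isAllowed(rules, "/private/secret.html"));       // false
        System.out.println(isAllowed(rules, "/private/public/index.html")); // true
        System.out.println(isAllowed(rules, "/docs/report.pdf"));           // false
        System.out.println(isAllowed(rules, "/folder/a.html"));             // true
    }
}
```

Because Allow wins ties in this sketch, a group containing both `Allow: /folder` and `Disallow: /folder` leaves `/folder/a.html` crawlable regardless of the order the lines appear in.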
So a release might be in order, but I'm wondering if you want to take a pass at
fixing deprecations first.
Original comment by kkrugler...@transpac.com
on 14 Mar 2014 at 12:07
@Chris, yes, I will submit a patch for the deprecations and then we can push a release.
Thanks for nudging this one.
P.S. Yes, we are using the robots.txt parsing code in Nutch. It works a treat :)
Original comment by lewis.mc...@gmail.com
on 16 Mar 2014 at 1:33
I am working on this patch. It also covers issue CC-8, so I am making the
upgrade to HttpClient as per Fuad's patch.
I have some failing tests locally after the upgrade and after addressing other
javac warnings, so I'll work on this again tomorrow when I get a chance and
submit a patch.
Original comment by lewis.mc...@gmail.com
on 16 Mar 2014 at 7:36
OK folks, here is a patch for this issue.
It is a rather confusing patch as it contains a few things, so to break it
down, it comprises the following:
* Slight code reformatting in pom.xml, removal of unused Hadoop log property,
removal of unused ant-eclipse-jvm1.2 plugin configuration
* Integration of issue CC-8, which upgrades our HttpClient API usage to
v4.2.6. Having reviewed this patch, I feel that we have retained as much of
the existing functionality as possible; although moving to the new API looks
like a lot of change, it is not as bad as it initially seems.
* Removal of all unused imports across the codebase
* Suppression of all javac warnings with the appropriate entries across the
entire codebase
* Change of CrawlerCommons.getVersion() to a static method.
* Reformatting of ALL files mentioned above for better readability in an IDE.
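As an illustration of the warning-suppression item in the list above, here is a minimal sketch of scoping `@SuppressWarnings` to the smallest element that triggers a javac warning. The class and method names are hypothetical and not taken from the crawler-commons codebase, and the patch itself may have used different warning tokens or placements.

```java
import java.util.ArrayList;
import java.util.List;

public class SuppressWarningsSketch {

    // Hypothetical pre-generics helper that builds and returns a raw List.
    // The raw-type declarations and the unchecked add() would both warn,
    // so the suppression covers just this method.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List legacyPaths() {
        List raw = new ArrayList();
        raw.add("/robots.txt");
        return raw;
    }

    // The cast from the raw List is unchecked; it is safe here only because
    // legacyPaths() is known to add Strings, so the warning is silenced
    // locally instead of for the whole class.
    @SuppressWarnings("unchecked")
    static List<String> paths() {
        return (List<String>) legacyPaths();
    }

    public static void main(String[] args) {
        System.out.println(paths()); // [/robots.txt]
    }
}
```

Keeping the annotation on a single method (or a single local variable declaration) means genuine new warnings elsewhere in the class still show up in the build.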
The patch attachment can be applied to trunk and passes all tests.
No new tests have been introduced to the codebase in this patch.
Original comment by lewis.mc...@gmail.com
on 16 Mar 2014 at 9:33
Merged as of r118.
The formatting changes are open for discussion - some people really prefer 4
spaces over 2 :)
Original comment by kkrugler...@transpac.com
on 17 Mar 2014 at 12:38
Guys, please remember to update CHANGES.txt prior to committing something. This
will make it easier to track changes from one version to the other.
Original comment by digitalpebble
on 19 Mar 2014 at 10:58
Hi Julien
+1
Update made to CHANGES.txt as of revision 119 in trunk.
Thanks
Original comment by lewis.mc...@gmail.com
on 19 Mar 2014 at 7:15
Original issue reported on code.google.com by
lewis.mc...@gmail.com
on 28 Jan 2013 at 2:52